dataset
This module concerns the data as observed. DataSet joins information from a Transient (list of true data) and a Survey (what has been observed when) to generate realistic lightcurve observations.
- class skysurvey.dataset.DataSet(data, targets=None, survey=None)
Bases: object
A class for managing and generating realistic transient light curves given true data and survey observing logs.
This class provides methods to load, manipulate, and visualize light curve data based on target and survey information.
The classmethod DataSet.from_targets_and_survey() should be favored for loading the dataset.
- Parameters:
data (pandas.DataFrame) – Multi-index dataframe corresponding to the concatenation of all targets observations.
targets (skysurvey.Target or child of, optional) – Target data corresponding to the true target parameters (as given by nature).
survey (skysurvey.Survey or child of, optional) – Survey that has been used to generate the dataset (if known).
See also
from_targets_and_survey(): Loads a dataset (observed data) given targets and survey.
read_parquet(): Loads a stored dataset.
- classmethod from_targets_and_survey(targets, survey, incl_error=True, phase_range=[-50, 200], progress_bar=False, seed=None, discard_bands=True)
Loads a dataset (observed data) given targets and a survey.
This first matches the targets (given targets.data[['ra','dec']]) with the survey to find which target has been observed in which field. It then simulates the targets' lightcurves given the observing data (survey.data).
- Parameters:
targets (skysurvey.Target, list, skysurvey.TargetCollection) – Target data corresponding to the true target parameters (as given by nature). Could be a list.
survey (skysurvey.Survey (or child of)) – Sky observations (what was observed, when, and under which conditions).
incl_error (bool, optional) – Include error in the lightcurve. If False, the flux is the true model flux.
phase_range (list, None, optional) – Rest-frame phase range to be used for simulating the lightcurves. If None, no cut is applied on time range for the logs.
progress_bar (bool, optional) – Whether to display a progress bar for the generation of targets (uses tqdm).
seed (None, int, Generator, RandomState, optional) – = ignored if incl_error=False = (docstring adapted from np.random.default_rng) If None, a fresh seed will be pulled. If an int, it will be passed to SeedSequence to derive the initial BitGenerator state. Additionally, when passed a (Bit)Generator, it will be returned unaltered. When passed a legacy RandomState instance it will be coerced to a Generator.
discard_bands (bool, optional) – If True, discards the bands that include wavelengths for which the (observer-frame) target SED is not defined. This prevents the code from crashing due to an error from sncosmo.
- Returns:
instance of a DataSet loaded from the given targets.
- Return type:
dataset
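The interplay between incl_error and seed can be illustrated with plain NumPy. The sketch below is not skysurvey's implementation: it assumes, for illustration only, that the observed flux is the true flux plus Gaussian noise of width fluxerr, and the helper name scatter_flux is hypothetical. Only np.random.default_rng and its handling of None, int and Generator inputs come from NumPy itself.

```python
import numpy as np


def scatter_flux(flux_true, fluxerr, incl_error=True, seed=None):
    """Hypothetical helper: return observed fluxes for given true fluxes.

    Assumption (not taken from skysurvey): the noise is Gaussian with
    standard deviation `fluxerr`. When incl_error is False the true
    flux is returned unchanged and `seed` is ignored.
    """
    flux_true = np.asarray(flux_true, dtype=float)
    if not incl_error:
        return flux_true.copy()
    # default_rng accepts None, an int, or an existing Generator.
    rng = np.random.default_rng(seed)
    return flux_true + rng.normal(loc=0.0, scale=fluxerr, size=flux_true.shape)


flux_true = np.array([100.0, 250.0, 80.0])
fluxerr = np.array([5.0, 5.0, 5.0])

# The same integer seed gives the same noise realisation.
obs_a = scatter_flux(flux_true, fluxerr, seed=42)
obs_b = scatter_flux(flux_true, fluxerr, seed=42)

# Without errors, the "observed" flux is the true model flux.
obs_noiseless = scatter_flux(flux_true, fluxerr, incl_error=False, seed=42)
```

Passing an explicit integer seed is what makes two simulated datasets comparable point by point; with seed=None each call draws fresh noise.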
- classmethod read_parquet(parquetfile, survey=None, targets=None, **kwargs)
Loads a stored dataset.
Only the observation data can be loaded this way, not the survey nor the targets (truth).
- Parameters:
parquetfile (str) – path to the parquet file containing the dataset (pandas.DataFrame)
survey (skysurvey.Survey (or child of), None) – survey that has been used to generate the dataset (if you know it)
targets (skysurvey.Target (or child of), None) – target data corresponding to the true target parameters (as given by nature)
**kwargs – goes to pandas.read_parquet
- Returns:
a DataSet with the data loaded, but possibly without survey or targets
- Return type:
class instance
See also
from_targets_and_survey(): loads a dataset (observed data) given targets and survey
- classmethod read_from_directory(dirname, **kwargs)
Loads a directory containing the dataset, the survey and the targets.
= Not Implemented Yet =
- Parameters:
dirname (str) – path to the directory.
- Return type:
class instance
See also
from_targets_and_survey(): loads a dataset (observed data) given targets and survey
read_parquet(): loads a stored dataset
- set_data(data)
Lightcurve data as observed by the survey.
= It is unlikely you need to use that directly. =
- Parameters:
data (pandas.DataFrame) – multi-index dataframe ((id, observation index)) corresponding to the concatenation of all targets' observations
- Return type:
None
See also
read_parquet(): loads a stored dataset
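The (id, observation index) layout described above can be produced with pandas.concat and named index levels. The column names (mjd, band, flux, fluxerr), the level names and the per-target frames below are illustrative assumptions, not the exact schema skysurvey produces.

```python
import pandas as pd

# Hypothetical per-target observations; columns are assumed for illustration.
lc_target_3 = pd.DataFrame({
    "mjd": [59000.1, 59003.2],
    "band": ["ztfg", "ztfr"],
    "flux": [120.0, 310.0],
    "fluxerr": [10.0, 12.0],
})
lc_target_7 = pd.DataFrame({
    "mjd": [59001.4, 59002.0, 59010.5],
    "band": ["ztfr", "ztfg", "ztfr"],
    "flux": [15.0, 40.0, 95.0],
    "fluxerr": [9.0, 11.0, 10.0],
})

# Concatenate with the target id as the outer index level; the inner
# level is the per-target observation index.
data = pd.concat({3: lc_target_3, 7: lc_target_7}, names=["id", "obs"])

# All observations of target 7, and the ids of the observed targets.
lc_7 = data.loc[7]
observed_ids = data.index.get_level_values("id").unique()
```

Keeping the target id as the outer level means a single target's lightcurve is a plain .loc lookup, and per-target statistics are a groupby on that level.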
- set_targets(targets)
Set the targets.
= It is unlikely you need to use that directly. =
- Parameters:
targets (skysurvey.Target (or child of), None) – target data corresponding to the true target parameters (as given by nature)
- Return type:
None
See also
from_targets_and_survey(): loads a dataset (observed data) given targets and survey
- set_survey(survey)
Set the survey.
= It is unlikely you need to use that directly. =
- Parameters:
survey (skysurvey.Survey (or child of), None) – survey that has been used to generate the dataset (if you know it)
- Return type:
None
See also
from_targets_and_survey(): loads a dataset (observed data) given targets and survey
- get_data(add_phase=False, phase_range=None, index=None, redshift_key='z', detection=None, zp=None, join_bandday=False, join_stats='first')
Access the data, with optional additions and selections (phase, target index, detection, zero point, band-day joining).
- Parameters:
add_phase (bool) – whether to add the phase information ‘phase_obs’ (obs-frame) and ‘phase’ (rest-frame) to the dataframe, assuming the input target’s t0 and redshift.
phase_range (array) – min and max phases to be returned. Applied on phase (rest-frame). Setting this sets add_phase to True.
index (pandas.Index, list, None) – select the index (targets id) you want.
redshift_key (str) – name of the redshift column in dset.targets.data. = ignored if add_phase is False =
detection (bool, None) –
whether to limit the output to (non-)detected points only. This follows the bool/None format:
detection=None: no selection
detection=False: only non-detected points
detection=True: only detected points
zp (float) – get the simulated data in the given zp system
join_bandday (bool) – if there are multiple observations per band and day (int of mjd) for a given target, whether these should be joined (see join_stats).
join_stats (str) – if join_bandday is True, how multiple observations should be combined (e.g., first).
- Return type:
pandas.DataFrame
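The add_phase and phase_range options rely on converting observation times into phases. The sketch below assumes the usual convention, phase_obs = mjd - t0 and phase = phase_obs / (1 + z), together with a hypothetical mjd column and made-up values for t0 and z. It illustrates the selection only and is not the library's code.

```python
import pandas as pd

# Hypothetical observations of one target (column name assumed).
lc = pd.DataFrame({"mjd": [58990.0, 59010.0, 59060.0, 59400.0]})
t0, z = 59000.0, 0.5  # assumed true reference time (mjd) and redshift

# Observer-frame phase, then rest-frame phase (time-dilation corrected).
lc["phase_obs"] = lc["mjd"] - t0
lc["phase"] = lc["phase_obs"] / (1 + z)

# Keep only points whose rest-frame phase lies within phase_range.
phase_range = [-50, 200]
in_range = lc[lc["phase"].between(*phase_range)]
```

Because the cut applies to the rest-frame phase, the same phase_range spans a longer observer-frame window for higher-redshift targets.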
- get_ndetection(phase_range=None, per_band=False, join_bandday=False)
Get the number of detections for each lightcurve.
Computes the number of datapoints with (flux/fluxerr) > detlimit.
- Parameters:
phase_range (array) – rest-frame phase range to be considered.
per_band (bool) – whether the computation should be made per band. If True, the count is per target and per band.
join_bandday (bool) – if there are multiple observations per band and day (int of mjd) for a given target, whether these should be joined (see join_stats in get_data()).
- Returns:
the number of detected points per target (and per band if per_band=True)
- Return type:
pandas.Series
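The counting described above can be sketched with a pandas groupby. The threshold detlimit = 5, the column and level names, and the toy values are assumptions made for illustration; the library's own threshold is not stated here.

```python
import pandas as pd

# Toy multi-index data: (target id, observation index); columns assumed.
data = pd.DataFrame(
    {
        "band": ["g", "r", "g", "r", "r"],
        "flux": [100.0, 20.0, 60.0, 8.0, 90.0],
        "fluxerr": [10.0, 10.0, 10.0, 10.0, 10.0],
    },
    index=pd.MultiIndex.from_tuples(
        [(3, 0), (3, 1), (7, 0), (7, 1), (7, 2)], names=["id", "obs"]
    ),
)

detlimit = 5  # assumed signal-to-noise threshold
detected = (data["flux"] / data["fluxerr"]) > detlimit

# Number of detected points per target.
ndet = detected.groupby(level="id").sum()

# Number of detected points per target and per band.
target_id = data.index.get_level_values("id")
ndet_per_band = detected.groupby([target_id, data["band"]]).sum()
```

Summing the boolean mask keeps targets (or target-band pairs) with zero detections in the result, which a filter-then-count approach would drop.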
- get_target_lightcurve(index, detection=None, phase_range=None)
Get the observations of the given target.
= short cut to self.get_data(index=index) =
- Parameters:
index (int, optional) – The index of the target whose light curve is to be taken. If None, a random index is chosen.
detection (bool, None) –
whether to limit the output to (non-)detected points only. This follows the bool/None format:
detection=None: no selection
detection=False: only non-detected points
detection=True: only detected points
phase_range (array) – min and max phases to be returned. Applied on phase (rest-frame). Setting this sets add_phase to True.
- Returns:
the lightcurve
- Return type:
pandas.DataFrame
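The bool/None convention of the detection argument can be written as a small three-way selection. The helper name select_detection, the flux and fluxerr columns, and the detlimit threshold are all hypothetical; this is a sketch of the convention, not the method's actual body.

```python
import pandas as pd


def select_detection(lc, detection=None, detlimit=5):
    """Hypothetical helper mirroring the bool/None convention.

    detection=None: no selection; True: detected points only;
    False: non-detected points only. The detlimit threshold on
    flux/fluxerr is an assumption.
    """
    if detection is None:
        return lc
    is_detected = (lc["flux"] / lc["fluxerr"]) > detlimit
    return lc[is_detected] if detection else lc[~is_detected]


lc = pd.DataFrame({"flux": [100.0, 20.0, 60.0], "fluxerr": [10.0, 10.0, 10.0]})
n_all = len(select_detection(lc))
n_det = len(select_detection(lc, detection=True))
n_nondet = len(select_detection(lc, detection=False))
```

Testing `detection is None` first matters: a plain truthiness check would treat None and False alike and lose the "no selection" case.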
- show_target_lightcurve(ax=None, fig=None, index=None, zp=25, lc_prop={}, bands=None, show_truth=True, format_time=True, t0_format='mjd', phase_window=None, **kwargs)
Plot the light curve of a target.
If index is None, a random index will be used. If bands is None, the target’s observed band will be used.
- Parameters:
ax (matplotlib.axes.Axes, optional) – The axes on which to plot the light curve. If None, a new figure and axes will be created.
fig (matplotlib.figure.Figure, optional) – The figure on which to plot the light curve. If None, a new figure will be created.
index (int, optional) – The index of the target whose light curve is to be plotted. If None, a random index is chosen.
zp (float, optional) – Zero point magnitude for flux conversion. Default is 25.
lc_prop (dict, optional) – Additional properties to pass to the light curve plotting function (kwargs).
bands (list of str, optional) – The bands to plot. If None, all observed bands for the target will be used.
show_truth (bool, optional) – Whether to show the true light curve. Default is True.
format_time (bool, optional) – Whether to format the time axis as dates. Default is True.
t0_format (str, optional) – The format of the reference time. Default is “mjd”.
phase_window (array-like, optional) – The phase window to plot. If None, the entire light curve will be plotted.
**kwargs (dict) – Additional keyword arguments to pass to the plotting functions.
- Returns:
The figure object containing the light curve plot.
- Return type:
matplotlib.figure.Figure
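The zp argument sets the zero point in which fluxes are expressed. Assuming the standard relation between flux and magnitude, m = -2.5 log10(f) + zp, moving a flux from one zero point to another is a single multiplicative factor. The helper names below are hypothetical.

```python
import math


def rescale_flux(flux, zp_in, zp_out=25.0):
    """Hypothetical helper: express `flux` (given in zp_in) in zp_out.

    Uses m = -2.5 * log10(flux) + zp, so the magnitude is unchanged
    while the flux scales by 10 ** (0.4 * (zp_out - zp_in)).
    """
    return flux * 10 ** (0.4 * (zp_out - zp_in))


def flux_to_mag(flux, zp):
    """Magnitude of a positive flux expressed in the given zero point."""
    return -2.5 * math.log10(flux) + zp


flux_zp30 = 1000.0
flux_zp25 = rescale_flux(flux_zp30, zp_in=30.0, zp_out=25.0)
```

Putting every observation on a common zero point before plotting makes fluxes from different bands and epochs directly comparable on one axis.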
- property data
Lightcurve data as observed by the survey.
- property targets
Target data corresponding to the true target parameters.
- property survey
Survey that has been used to generate the dataset.
- property obs_index
Index of the observed target.