dataset

This module handles the data as observed. A DataSet combines a Transient (the list of true data) with a Survey (what was observed, and when) to generate realistic lightcurve observations.

class skysurvey.dataset.DataSet(data, targets=None, survey=None)

Bases: object

A class for generating and managing realistic transient light curves given true data and survey observing logs.

This class provides methods to load, manipulate, and visualize light curve data based on target and survey information.

The classmethod DataSet.from_targets_and_survey() should be favored for loading the dataset.

Parameters:
  • data (pandas.DataFrame) – Multi-index dataframe corresponding to the concatenation of all targets' observations.

  • targets (skysurvey.Target or child of, optional) – Target data corresponding to the true target parameters (as given by nature).

  • survey (skysurvey.Survey or child of, optional) – Survey that has been used to generate the dataset (if known).
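As an illustration of the layout described for data, the sketch below concatenates per-target observation tables into one multi-index dataframe keyed by target id and observation index. The column names (mjd, band, flux, fluxerr, zp), the level names and the toy values are assumptions made for this example; they are not a specification of what skysurvey stores.

```python
import pandas as pd

# Toy per-target observation tables (column names are illustrative assumptions).
obs_target_0 = pd.DataFrame(
    {
        "mjd": [59000.1, 59003.2, 59006.3],
        "band": ["g", "r", "g"],
        "flux": [120.0, 340.0, 15.0],
        "fluxerr": [20.0, 25.0, 18.0],
        "zp": [25.0, 25.0, 25.0],
    }
)
obs_target_7 = pd.DataFrame(
    {
        "mjd": [59010.4, 59012.5],
        "band": ["r", "r"],
        "flux": [60.0, 75.0],
        "fluxerr": [15.0, 15.0],
        "zp": [25.0, 25.0],
    }
)

# The keys become the outer index level (target id); the inner level is the
# per-target observation index.
data = pd.concat(
    [obs_target_0, obs_target_7], keys=[0, 7], names=["id", "observation"]
)

# All observations of target 7, indexed by observation number.
target_7 = data.loc[7]
```

With this layout, selecting one target is a plain lookup on the outer index level.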

See also

from_targets_and_survey()

Loads a dataset (observed data) given targets and survey.

read_parquet()

Loads a stored dataset.

classmethod from_targets_and_survey(targets, survey, incl_error=True, phase_range=[-50, 200], progress_bar=False, seed=None, discard_bands=True)

Loads a dataset (observed data) given targets and a survey.

This first matches the targets (using targets.data[['ra','dec']]) with the survey to find which targets have been observed in which fields. It then simulates the targets' lightcurves given the observing data (survey.data).

Parameters:
  • targets (skysurvey.Target, list, skysurvey.TargetCollection) – Target data corresponding to the true target parameters (as given by nature). May be a list of targets.

  • survey (skysurvey.Survey (or child of)) – Sky observations (what was observed, when, and under which conditions).

  • incl_error (bool, optional) – Include error in the lightcurve. If False, the flux is the true model flux.

  • phase_range (list, None, optional) – Rest-frame phase range to be used for simulating the lightcurves. If None, no cut is applied on time range for the logs.

  • progress_bar (bool, optional) – Whether to display a progress bar tracking the generation over targets (uses tqdm).

  • seed (None, int, Generator, RandomState, optional) – Seed for the random number generator; ignored if incl_error=False. (Description adapted from np.random.default_rng.) If None, a fresh seed will be pulled. If an int, it will be passed to SeedSequence to derive the initial BitGenerator state. Additionally, when passed a (Bit)Generator, it will be returned unaltered. When passed a legacy RandomState instance, it will be coerced to a Generator.

  • discard_bands (bool, optional) – If True, discards the bands that include wavelengths for which the (observer-frame) target SED is not defined. This prevents the code from crashing on an sncosmo error.

Returns:

instance of a DataSet loaded from the given targets.

Return type:

dataset
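To make the interplay of incl_error and seed concrete, here is a minimal sketch of how a seeded generator yields reproducible noisy fluxes. The helper scatter_flux is hypothetical, and the Gaussian scatter with width fluxerr is an assumption for illustration; it is not taken from the skysurvey implementation. Only np.random.default_rng is a real library call.

```python
import numpy as np


def scatter_flux(flux_true, fluxerr, incl_error=True, seed=None):
    """Hypothetical helper: return 'observed' fluxes from true fluxes."""
    flux_true = np.asarray(flux_true, dtype=float)
    if not incl_error:
        # Without errors, the observed flux is the true model flux
        # and the seed is ignored.
        return flux_true.copy()
    # default_rng accepts None, an int, or an existing Generator.
    rng = np.random.default_rng(seed)
    # Assumption: Gaussian noise with standard deviation fluxerr.
    return rng.normal(loc=flux_true, scale=fluxerr)


flux_true = [100.0, 250.0, 80.0]
fluxerr = [10.0, 12.0, 9.0]

# The same integer seed gives the same noise realization.
noisy_a = scatter_flux(flux_true, fluxerr, seed=42)
noisy_b = scatter_flux(flux_true, fluxerr, seed=42)

# incl_error=False returns the true fluxes whatever the seed.
noiseless = scatter_flux(flux_true, fluxerr, incl_error=False, seed=42)

# An existing Generator is passed through unaltered.
generator = np.random.default_rng(1)
same_generator = np.random.default_rng(generator)
```

Passing a Generator rather than an int lets several calls share one random stream, so successive draws differ while the overall run stays reproducible.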

classmethod read_parquet(parquetfile, survey=None, targets=None, **kwargs)

Loads a stored dataset.

Only the observation data can be loaded this way; the survey and the targets (truth) cannot.

Parameters:
  • parquetfile (str) – path to the parquet file containing the dataset (pandas.DataFrame)

  • survey (skysurvey.Survey (or child of), None) – survey that has been used to generate the dataset (if you know it)

  • targets (skysurvey.Target (or child of), None) – target data corresponding to the true target parameters (as given by nature)

  • **kwargs – passed to pandas.read_parquet

Returns:

instance with the dataset loaded, but possibly without survey or targets

Return type:

class instance

See also

from_targets_and_survey()

loads a dataset (observed data) given targets and survey

classmethod read_from_directory(dirname, **kwargs)

Loads a directory containing the dataset, the survey and the targets.

= Not Implemented Yet =

Parameters:

dirname (str) – path to the directory.

Return type:

class instance

See also

from_targets_and_survey()

loads a dataset (observed data) given targets and survey

read_parquet()

loads a stored dataset

set_data(data)

Lightcurve data as observed by the survey.

= It is unlikely you need to use that directly. =

Parameters:

data (pandas.DataFrame) – multi-index dataframe ((id, observation index)) corresponding to the concatenation of all targets' observations

Return type:

None

See also

read_parquet()

loads a stored dataset

set_targets(targets)

Set the targets.

= It is unlikely you need to use that directly. =

Parameters:

targets (skysurvey.Target (or child of), None) – target data corresponding to the true target parameters (as given by nature)

Return type:

None

See also

from_targets_and_survey()

loads a dataset (observed data) given targets and survey

set_survey(survey)

Set the survey.

= It is unlikely you need to use that directly. =

Parameters:

survey (skysurvey.Survey (or child of), None) – survey that has been used to generate the dataset (if you know it)

Return type:

None

See also

from_targets_and_survey()

loads a dataset (observed data) given targets and survey

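get_data(add_phase=False, phase_range=None, index=None, redshift_key='z', detection=None, zp=None, join_bandday=False, join_stats='first')

Access the data, optionally adding phase information, selecting targets or (non)detections, converting to a given zero point, and joining same-day observations.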

Parameters:
  • add_phase (bool) – whether to add the phase information, ‘phase_obs’ (observer-frame) and ‘phase’ (rest-frame), to the dataframe, assuming the input targets’ t0 and redshift.

  • phase_range (array) – min and max phases to be returned. Applied on phase (rest-frame). Setting this sets add_phase to True.

  • index (pandas.Index, list, None) – select the index (targets id) you want.

  • redshift_key (string) –

    name of the redshift column in the dset.targets.data.

    = ignored if add_phase is False =

  • detection (bool, None) –

    whether to limit the data to detected or non-detected points only. This follows the bool/None format:

    • detection=None: no selection

    • detection=False: only non-detected points

    • detection=True: only detected points

  • zp (float) – return the simulated data in the given zero-point (zp) system.

  • join_bandday (bool) – if there are multiple observations per band and day (integer part of the mjd) for a given target, whether to join them (see join_stats).

  • join_stats (str) – if join_bandday is True, how multiple observations should be combined (e.g., first).

Return type:

pandas.DataFrame
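The phase and zero-point options can be pictured with a small pandas sketch. It assumes that the observer-frame phase is mjd - t0, that the rest-frame phase divides it by (1 + z), and that fluxes are rescaled between zero points by the usual factor 10^(-0.4 (zp_old - zp_new)). The helpers add_phase and to_zp, the column names and the toy values are hypothetical; they are not the get_data() implementation.

```python
import pandas as pd

# Toy observations for two targets (column names are assumptions).
index = pd.MultiIndex.from_tuples(
    [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1)], names=["id", "observation"]
)
data = pd.DataFrame(
    {
        "mjd": [59000.0, 59010.0, 59030.0, 59100.0, 59120.0],
        "flux": [100.0, 400.0, 50.0, 10.0, 300.0],
        "fluxerr": [10.0] * 5,
        "zp": [30.0] * 5,
    },
    index=index,
)
# True target parameters, one row per target id.
targets = pd.DataFrame(
    {"t0": [59005.0, 59110.0], "z": [0.0, 0.25]},
    index=pd.Index([0, 1], name="id"),
)


def add_phase(data, targets, redshift_key="z"):
    """Add 'phase_obs' (observer-frame) and 'phase' (rest-frame) columns."""
    ids = data.index.get_level_values("id")
    # Broadcast each target's t0 and redshift onto its observations.
    t0 = targets["t0"].reindex(ids).to_numpy()
    redshift = targets[redshift_key].reindex(ids).to_numpy()
    out = data.copy()
    out["phase_obs"] = out["mjd"] - t0
    out["phase"] = out["phase_obs"] / (1 + redshift)
    return out


def to_zp(data, zp):
    """Rescale flux and fluxerr from each row's zp to a common zp."""
    coef = 10 ** (-0.4 * (data["zp"] - zp))
    out = data.copy()
    out["flux"] = data["flux"] * coef
    out["fluxerr"] = data["fluxerr"] * coef
    out["zp"] = zp
    return out


with_phase = add_phase(data, targets)

# Keep only observations whose rest-frame phase lies in the range.
phase_min, phase_max = -6, 10
selected = with_phase[with_phase["phase"].between(phase_min, phase_max)]

# Express fluxes in a zp=25 system (a factor 0.01 from zp=30).
rescaled = to_zp(data, zp=25.0)
```

Because the cut is applied on the rest-frame phase, the same phase_range corresponds to a longer observer-frame time window for higher-redshift targets.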

get_ndetection(phase_range=None, per_band=False, join_bandday=False)

Get the number of detections for each lightcurve.

Computes the number of datapoints with (flux/fluxerr) > detlimit.

Parameters:
  • phase_range (array) – rest-frame phase range to be considered.

  • per_band (bool) – whether to count detections per band. If True, the result is per target and per band.

  • join_bandday (bool) – if there are multiple observations per band and day (integer part of the mjd) for a given target, whether to join them (see join_stats in get_data()).

Returns:

the number of detected points per target (and per band if per_band=True)

Return type:

pandas.Series
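The counting rule above can be sketched with a pandas groupby. The function count_detections is hypothetical, the threshold detlimit=5.0 is an assumed value (the source does not state it), and the column names are illustrative. Joining keeps the first observation per target, band and day, mirroring the 'first' option mentioned for get_data().

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [(0, 0), (0, 1), (0, 2), (0, 3), (1, 0), (1, 1)],
    names=["id", "observation"],
)
data = pd.DataFrame(
    {
        "mjd": [59000.1, 59000.4, 59001.2, 59002.3, 59000.2, 59004.5],
        "band": ["g", "g", "r", "g", "r", "r"],
        "flux": [80.0, 90.0, 20.0, 120.0, 10.0, 200.0],
        "fluxerr": [10.0] * 6,
    },
    index=index,
)


def count_detections(data, detlimit=5.0, per_band=False, join_bandday=False):
    """Count datapoints with flux/fluxerr > detlimit (assumed threshold)."""
    flat = data.reset_index()
    if join_bandday:
        # One row per target, band and day (integer part of the mjd),
        # keeping the first observation of each group.
        flat["day"] = flat["mjd"].astype(int)
        flat = flat.groupby(["id", "band", "day"], as_index=False).first()
    detected = flat[(flat["flux"] / flat["fluxerr"]) > detlimit]
    keys = ["id", "band"] if per_band else "id"
    # Targets without any detection are absent from the result.
    return detected.groupby(keys).size()


ndet = count_detections(data)
ndet_joined = count_detections(data, join_bandday=True)
ndet_per_band = count_detections(data, per_band=True)
```

In this toy data, target 0 has two g-band detections on the same day, so joining reduces its count from 3 to 2.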

get_target_lightcurve(index, detection=None, phase_range=None)

Get the observations of the given target.

= short cut to self.get_data(index=index) =

Parameters:
  • index (int, optional) – The index of the target whose light curve is to be taken. If None, a random index is chosen.

  • detection (bool, None) –

    whether to limit the data to detected or non-detected points only. This follows the bool/None format:

    • detection=None: no selection

    • detection=False: only non-detected points

    • detection=True: only detected points

  • phase_range (array) – min and max phases to be returned. Applied on phase (rest-frame). Setting this sets add_phase to True.

Returns:

the lightcurve

Return type:

pandas.DataFrame
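A minimal sketch of the bool/None detection convention, applied to a single target pulled out of a multi-index dataframe. The helper select_detection, the detlimit=5.0 threshold and the column names are assumptions for illustration; the flux/fluxerr criterion is borrowed from the get_ndetection() description.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [(3, 0), (3, 1), (3, 2), (8, 0)], names=["id", "observation"]
)
data = pd.DataFrame(
    {
        "band": ["g", "r", "g", "r"],
        "flux": [5.0, 90.0, 150.0, 60.0],
        "fluxerr": [10.0, 10.0, 10.0, 10.0],
    },
    index=index,
)


def select_detection(lightcurve, detection=None, detlimit=5.0):
    """Apply the None / True / False detection selection."""
    if detection is None:
        # No selection.
        return lightcurve
    is_detected = (lightcurve["flux"] / lightcurve["fluxerr"]) > detlimit
    # True keeps detected points only, False keeps non-detected points only.
    return lightcurve[is_detected] if detection else lightcurve[~is_detected]


# All observations of target id 3; the outer index level is dropped.
lightcurve = data.loc[3]

everything = select_detection(lightcurve)
detected = select_detection(lightcurve, detection=True)
non_detected = select_detection(lightcurve, detection=False)
```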

show_target_lightcurve(ax=None, fig=None, index=None, zp=25, lc_prop={}, bands=None, show_truth=True, format_time=True, t0_format='mjd', phase_window=None, **kwargs)

Plot the light curve of a target.

If index is None, a random index will be used. If bands is None, the target’s observed bands will be used.

Parameters:
  • ax (matplotlib.axes.Axes, optional) – The axes on which to plot the light curve. If None, a new figure and axes will be created.

  • fig (matplotlib.figure.Figure, optional) – The figure on which to plot the light curve. If None, a new figure will be created.

  • index (int, optional) – The index of the target whose light curve is to be plotted. If None, a random index is chosen.

  • zp (float, optional) – Zero point magnitude for flux conversion. Default is 25.

  • lc_prop (dict, optional) – Additional properties to pass to the light curve plotting function (kwargs).

  • bands (list of str, optional) – The bands to plot. If None, all observed bands for the target will be used.

  • show_truth (bool, optional) – Whether to show the true light curve. Default is True.

  • format_time (bool, optional) – Whether to format the time axis as dates. Default is True.

  • t0_format (str, optional) – The format of the reference time. Default is “mjd”.

  • phase_window (array-like, optional) – The phase window to plot. If None, the entire light curve will be plotted.

  • **kwargs (dict) – Additional keyword arguments to pass to the plotting functions.

Returns:

The figure object containing the light curve plot.

Return type:

matplotlib.figure.Figure

property data

Lightcurve data as observed by the survey.

property targets

Target data corresponding to the true target parameters.

property survey

Survey that has been used to generate the dataset.

property obs_index

Index of the observed target.