Realistic use of ZTF

Realistic use of ZTF#

Here we show how to use skysurvey to simulate a ZTF DR2-like sample of Type Ia Supernovae. We reproduce the ZTF DR2 observing logs and apply its selection effects to a simulated population of SNe Ia, then compare the redshift distributions of the simulated and observed samples.

We first load the logs from the ZTF survey for the DR2:

import skysurvey

# load the ZTF logs
ztf = skysurvey.ZTF.from_logs()

We then apply the correction coefficients on the g,r and i ZTF filters (from sncosmo) determined in the DR2 release papers (Rigault et al.(2025b) (2025A&A…694A…2R) and Amenouche et al.(2025) (2025A&A…694A…3A)). The original sky noise is stored in skynoise_orig and the corrected values (scaled per each bands with the corresponding factors) are stored back into skynoise.

import pandas

# apply the Rigault+2025(LC) / Amenouche+2025 correction
coefs = {"ztfg": 1.23, 
         "ztfr":1.17, 
         "ztfi": 1.2}

ztf.data["skynoise_orig"] = ztf.data["skynoise"].copy()
s = ztf.data.groupby("band", group_keys=True)["skynoise_orig"]
skynoise = pandas.concat([s.get_group(f_)*coefs[f_] for f_ in coefs.keys()])
ztf.data["skynoise"] = skynoise

The dataframe now contains both skynoise_orig and skynoise:

ztf.data

	mjd	band	fieldid	fieldra	fielddec	rcid	maglimit	zp	gain	expid	infobits	skynoise	skynoise_orig
0	58288.171875	ztfi	375	3.787257	-0.171915	16	19.650887	25.644886	6.2	53417042	0	59.953015	49.960846
1	58288.171875	ztfi	375	3.787257	-0.171915	17	19.612961	25.651960	6.2	53417042	0	62.490079	52.075066
2	58288.171875	ztfi	375	3.787257	-0.171915	18	19.686562	25.683561	6.2	53417042	0	60.118913	50.099094
3	58288.171875	ztfi	375	3.787257	-0.171915	19	19.632156	25.650158	6.2	53417042	0	61.293141	51.077618
4	58288.171875	ztfi	375	3.787257	-0.171915	20	19.573307	25.693308	6.2	53417042	0	67.330458	56.108715
...	...	...	...	...	...	...	...	...	...	...	...	...	...
32808490	59273.558594	ztfr	542	5.338955	0.205076	43	17.924723	26.281721	6.1	151955745	33554440	515.246712	440.381805
32808491	59273.558594	ztfr	542	5.338955	0.205076	44	18.117277	26.246277	6.1	151955745	0	417.653116	356.968475
32808492	59273.558594	ztfr	542	5.338955	0.205076	45	18.143761	26.199760	6.1	151955745	33554432	390.495248	333.756622
32808493	59273.558594	ztfr	542	5.338955	0.205076	46	18.018276	26.206276	6.2	151955745	33554440	440.976813	376.903259
32808494	59273.558594	ztfr	542	5.338955	0.205076	41	17.939053	26.295052	6.2	151955745	8	514.772650	439.976624

32808495 rows × 13 columns

We then import the Milky Way dustmaps, the SFD (Schlegel, Finkbeiner & Davis (1998)), used for the ZTF survey:

import dustmaps.sfd
dustmaps.sfd.fetch()

And simulate SN Ia, including the MW extinction, from the beginning to the end of the ZTF DR2, at the maximum redshift observed.

# load SN Ia data as given by nature
snia = skysurvey.SNeIa.from_draw(tstart="2018-03-01", tstop="2020-12-21", 
                                 zmax=0.18,
                                 effect=skysurvey.Effect.from_name('mw'), 
                                 mwebv=  {"which":"sfd"} # forces map used in ZTF
                                )

We simulate them with the constant following SNe Ia rate, derived from Perley et al.(2020) (2020ApJ…904…35P):

snia.rate

23500.0

A redshift-dependant rate can also be used (see “Change the rate of a target). This gives us, in this example, a sample of 122 861 SNe Ia:

len(snia.data)

With the targets, and the survey, we can obtain the dataset by generating the SNe Ia lightcurves:

# Generate the LC
dset = skysurvey.DataSet.from_targets_and_survey(snia, ztf, progress_bar=True)

100%|██████████| 90855/90855 [12:38<00:00, 119.80it/s]

Not all the 122 866 simulated SNe Ia will have a light curve generated. from_targets_and_survey first performs a geometric match between each SN’s sky position and the ZTF observed fields. SNe Ia that fall outside the ZTF footprint during are simply never observed and are excluded from the dataset. That is why ~25% SNe Ia are lost to sky coverage, so that the sample has now 90 855 SNe Ia with associated light curves.

Moreover, not all the light curves here are relevant for cosmological analyses. We apply the “good sampling” light curve quality cuts determined in Rigault et al.(2025b) and Rigault et al.(2025a) (2025A&A…694A…1R) for the ZTF DR2, which, among other things, define the phase range over which SALT2 fits are well constrained for ZTF DR2:

Full range:
- within the −10 to +40 days rest-frame phase range
- at least seven 5σ flux detections
- in at least 2 bands
- same-night same-band detections are ignored, such that, for example, four detections on the same night in g only account for 1 phase
Pre-max:
- at least two 5σ flux detections
- in at least 2 bands
Post-max:
- at least two 5σ flux detections
- in at least 2 bands

And apply them to our sample:

# Apply the DR2 LC "good sampling" cuts
# full range
data = dset.get_ndetection(phase_range=[-10, 40], per_band=True, join_bandday=True)
minfilters = data.groupby(level=0).size()>= 2
ndetect = data.groupby(level=(0)).sum() >= 7
ok = (minfilters & ndetect)
print(ok.mean())

# pre-max
data = dset.get_ndetection(phase_range=[-10, 0], per_band=True, join_bandday=True)
minfilters = data.groupby(level=0).size()>= 2
ndetect = data.groupby(level=(0)).sum() >= 2
ok = ok & minfilters & ndetect
print(ok.mean())

# post-max
data = dset.get_ndetection(phase_range=[0, 40], per_band=True, join_bandday=True)
minfilters = data.groupby(level=0).size()>= 2
ndetect = data.groupby(level=(0)).sum() >= 2
ok = ok & minfilters & ndetect
print(ok.mean())
print(f"we are left with: {len(ok[ok])/len(snia.data)} Type Ia Supernovae.") 

index_goodlc = ok[ok].index

5590596467684344
4558700184523952
4306118047401088
we are left with: 0.14625471060792278 Type Ia Supernovae.

get_ndetection only returns SNe Ia with at least one detection in the requested phase range, so the fractions are computed over the detected subsample. Of the SNe Ia with generated light curves and at least one detection:

~55% of them pass the cuts on the full range
~45% of them pass the cuts on the pre-max
~43% of them pass the cuts on the post-max

Ultimately, roughly 14% (~17 201) SNe Ia of the original sample pass all the cuts.

Spectroscopic typing is required to confirm a transient as a Type Ia Supernova. In ZTF, this relies on spectra from the SED Machine (SEDM), whose limiting magnitude (~18.5) is another selection effect on the sample. We model this spectroscopic selection (hereafter spectroscopic completeness) as a survival sigmoid function of the SN Ia’s observed peak magnitude:

\[P_\text{spec}(m) = 1 - \mathcal{S}(m;\, m_0, s) = \frac{1}{1 + e^{s(m - m_0)}}\]

where \(m_0\) is the magnitude threshold and \(s\) controls the steepness of the drop-off (see Fig. 5 of Rigault et al. 2025a).

This function defines the probability that a photometrically detected SN Ia was actually targeted for spectroscopic follow-up. (see Rigault et al.(2025a))

import numpy as np
from scipy.special import expit 

def get_spectro_completeness(mag, loc=18.8, scale=4.5): 
    """ Survival sigmoid function.
    
    1- scipy.special.expit( (mag -loc)*scale)
    """
    mag = np.atleast_1d(mag)
    return 1-expit( (mag-loc) * scale)

We start from the simulated SNe Ia that passed the light curve quality cuts.

# get data of SNe Ia with good sampling
data_good_lc = snia.data.loc[index_goodlc].copy()

In addition to the magnitude-dependent spectroscopic completeness, ZTF did not systematically obtain spectra for transients behind the Milky Way plane, where dust extinction is too high. We remove simulated SNe Ia with \(A_V = R_V \times E(B-V) \geq 1\), with \(A_V = 3.5\).

# remove case that were not followed for spectroscopy because of MW
data_used = data_good_lc[(data_good_lc["mwebv"] * 3.1 < 1)].copy()

And so, for all the SNe Ia that have a good light curve, we compute the probability of receiving spectroscopic follow-up based on their observed peak magnitude, using the get_spectro_completeness function. Note that slightly different parameters (\(m_0 = 18.55\), \(s = 3.8\)) are used here, reflecting a refined calibration for this analysis.

# measure the probability to get a spectrum for this target.
data_used["pobs_spec"] = get_spectro_completeness(data_used["magobs"], loc=18.55 , scale=3.8)

Now that each simulated SN Ia has an associated spectroscopic probability \(P_\text{spec}\), we want to transfom the simulated sample into the sample we should’ve observed. We bin the sample in narrow redshift shells (\(\Delta z = 0.01\)) and compute the mean spectroscopic probability \(\langle P_\text{spec} \rangle\) per shell. Within each shell, the magnitude distribution is approximately uniform, so the mean spectroscopic probability \(f_\text{obs} = \langle P_\text{spec} \rangle\) correctly reflects the completeness at that distance.

# Bin per redshift and measure the fraction of SNe Ia that get a spectrum per bin.
zbins = np.arange(0, data_used["z"].max(), step=0.01)
data_used["zbins"] = pandas.cut(data_used["z"], zbins)
pobs = data_used.groupby("zbins", observed=False)["pobs_spec"].mean()

Working bin by bin of redshift is important because at low redshift (\(z \lesssim 0.06\), the volume-limited sample for the ZTF DR2, determined in Rigault et al.(2025a) and Amenouche et al.(2025)), SNe Ia peak between magnitudes 15 and 18.5 (see Figure 5 in Rigault et al.(2025a)), so \(f_\text{obs} \approx 1\). At higher redshift, SNe Ia become fainter on average and \(f_\text{obs}\) drops accordingly, as show in the figure below.

import matplotlib.pyplot as plt

pobs_numeric = pobs.copy()
pobs_numeric.index = pobs.index.map(lambda x: x.mid)

fig, ax = plt.subplots()
ax.plot(pobs_numeric.index, pobs_numeric.values, label=r"$f_\text{spec}$")
ax.axvline(0.06, color="gray", ls="--", label="z = 0.06")
ax.axhline(1.0, color="gray", ls=":", alpha=0.5)
ax.set_xlabel("Redshift")
ax.set_ylabel(r"$f_\text{spec}$")
ax.legend()

<matplotlib.legend.Legend at 0x1fe39df40>

../_images/f8b6693872ac204beddb89766509b92582c22d99d0ec8444e643451640d3dbc7.png

For each shell, a fraction \(f_\text{obs} \) of SNe Ia is drawn, weighted by their individual \(P_\text{spec}\) ( SNe with higher spectroscopic probability are preferentially selected).

# Loop over the redshift bin and get a fraction of the sample 
# corresponding to the fraction SN Ia that have a spectrum in that bin
data_obs = []
for zbin in pobs.index:
    data_of_zbin = data_used[data_used["zbins"] == zbin]
    fobs = data_of_zbin["pobs_spec"].mean()
    data_obs.append(data_of_zbin.sample(frac=fobs, weights="pobs_spec", replace=False))

The retained SNe Ia from all shells are then concatenated into the final sample data_obs, the number of SNe Ia we expect to observe.

# join this redshift binned data back together
data_obs = pandas.concat(data_obs)

print(f"We expect {len(data_obs)} well sampled Type Ia Supernovae.")

We expect 2815 well sampled Type Ia Supernovae.

A simpler equivalent to the redshift-binning method is a direct Bernoulli trial per SNIa, each one kept independently with probability \(P_\text{spec}\):

rng = np.random.default_rng()
data_obs_2 = data_used[rng.uniform(size=len(data_used)) < data_used["pobs_spec"]]
print(f"Simpler method: {len(data_obs_2)} SNe Ia")
print(f"Binned method: {len(data_obs)} SNe Ia")

Simpler method: 2809 SNe Ia
Binned method: 2815 SNe Ia

Both methods are statistically equivalent. The binned approach is preferred here as it more explicitly mirrors the redshift-dependent completeness structure of the real survey.

Data#

We load the real ZTF DR2 data from ztfcosmo with the same light curve quality cuts from the DR2 papers.

import ztfcosmo
datadr2_full = ztfcosmo.get_data(good_coverage=True) 

print(f"We have observed {len(datadr2_full)} well sampled Type Ia Supernovae.")

We have observed 2977 well sampled Type Ia Supernovae.

Plots#

We compare the redshift distributions of the simulated sample data_obs and the real ZTF DR2 data. Good agreement validates skysurvey’s simulations. Error bars show Poisson uncertainty (\(\sqrt{N}\) per bin).

import matplotlib.pyplot as plt

fig, ax = plt.subplots()

prop = dict(bins=50, range=[0, 0.2], histtype="step")

ax.hist(data_obs["z"], label="skysurvey-goodlc", linewidth=0.5, **prop)
ax.hist(datadr2_full["redshift"], label="ztf-goodlc", fill=False, **prop)


# added error bar
npoints, hbins = np.histogram(data_obs["z"], bins=50, range=[0, 0.2])
midbins = np.mean([hbins[:-1], hbins[1:]], axis=0)

ax.errorbar(midbins, npoints,  yerr=np.sqrt(npoints),
            marker="None", ls="None", ms=1,
           color="tab:blue")

ax.legend()

<matplotlib.legend.Legend at 0x208314ce0>

../_images/0398224eb5ac8b811c608f5faf3eb13e881b40951e2f712aafa1947f93acc4fc.png

Realistic use of ZTF

Contents

Realistic use of ZTF#

Data#

Plots#