\section{Metrics}
There are many options for evaluating the output of the survey strategy experiments, ranging from high-level science-oriented metrics to basic metrics that measure simple changes in survey characteristics. One of the primary goals of the LSST Metrics Analysis Framework (MAF) package was to make it easier for both the project and community members to write metrics to evaluate these outputs. This effort has had significant successes: metrics have been written that cover the primary requirements of the SRD, the DESC working groups have made good progress in writing metrics for their evaluation of the simulations, and the Solar System collaboration has contributed substantial metrics. In other areas, it has been more difficult for the community to engage with and contribute directly to MAF; for some of these areas we have been able to help get metrics running, but there remain areas lacking definitive metrics. Many of these gaps relate directly to time domain studies, a critical area for the LSST (we do have simple periodic variable detection and period determination metrics, as well as SNIa, TDE and microlensing metrics, and the ability to quickly generalize these for other kinds of transient lightcurves). We acknowledge this issue and look forward to working with the community to address it, along with the larger concern of potentially important, but missing, metrics.
Here we briefly summarize some of the top-level science-related metrics. Thousands of metrics are run as part of standard MAF analysis (many of which relate to simple analysis of observation metadata such as seeing, airmass, and sky brightness); for broad comparisons between simulations we pick a very limited subset of these metrics, intended to discover or highlight differences between the simulated survey strategies or to cover major areas of science. All of these metrics are available in github repos in either \href{https://github.com/lsst/sims_maf/}{\tt sims\_maf} or \href{https://github.com/LSST-nonproject/sims_maf_contrib}{\tt sims\_maf\_contrib}. We also describe the `radar plots' (the 2-d plots holding representations of these metric results across multiple simulations) at the end of this section.
\subsection{SRD Metrics}
The SRD metrics are designed to cover the primary science requirements laid out in the SRD; the most relevant of these relate to the number of visits per pointing across the WFD region, the area of the WFD region, the parallax and proper motion errors, and the number of rapid revisits (on timescales between 40 seconds and 30 minutes) per point on the sky. While we check all of these metrics for all runs, the most sensitive to changes in the survey strategy is the number of visits across the WFD, tracked in the fO metric, since we are often attempting to distribute visits into other parts of the sky for other science.
The fO metric calculates the total number of visits per point on the sky, then calculates how much area is covered by how many visits. This can be summarized along two axes: the amount of area that receives at least 825 visits per pointing (`fO Area'), or the median (or minimum) number of visits received by the most frequently visited 18,000 square degrees (`fONv MedianNvisits' or `fONv MinNvisits'). The first version, fO Area, tends to be somewhat unstable: the survey hardly ever observes more than 18,000 square degrees to at least 825 visits, because we do not program in larger WFD areas, but if the number of visits across the WFD area falls below 825, the resulting fO Area value falls rapidly (because we cover the sky uniformly). While fO Area is useful to check, a more useful number is fONv MedianNvisits or fONv MinNvisits. The value of fONv MedianNvisits tells us how many visits the typical field in the top 18,000 square degrees receives; fONv MinNvisits tells us the minimum number of visits received anywhere in those top 18,000 square degrees. Typically, fONv MedianNvisits scales more smoothly with the fraction of visits devoted to the WFD, and is likely a good proxy for science metrics that depend on having a reasonably large number of visits over the entire WFD.
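To make these summaries concrete, the following is a minimal sketch of how the fO values could be computed from a HEALPix map of the number of visits per point on the sky. The function and variable names (and the direct use of \texttt{healpy}) are illustrative; the actual MAF implementation differs in detail.
\begin{verbatim}
import numpy as np
import healpy as hp

def fo_summary(nvisits_map, nside=64, nvisits_benchmark=825,
               area_benchmark=18000.0):
    """Illustrative fO summary values from a HEALPix map of visit counts."""
    pix_area = hp.nside2pixarea(nside, degrees=True)   # sq deg per pixel
    nvis = np.asarray(nvisits_map, dtype=float)
    nvis = nvis[np.isfinite(nvis) & (nvis > 0)]        # observed pixels only

    # fO Area: total area receiving at least the benchmark number of visits.
    fo_area = np.sum(nvis >= nvisits_benchmark) * pix_area

    # fONv: median / minimum visits over the best-covered 18,000 sq deg.
    npix_benchmark = int(np.ceil(area_benchmark / pix_area))
    best = np.sort(nvis)[::-1][:npix_benchmark]
    return fo_area, np.median(best), np.min(best)
\end{verbatim}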
The radar plots use fONv MedianNvisits, the Median Parallax Error, and the Median Proper Motion Error. The astrometry metrics both assume an $r$=20 magnitude star with a flat SED. When plotted in the radar plots, we compare the reciprocal of the astrometry uncertainties so that larger values on the radar plot can always be interpreted as ``better''.
\subsection{Solar System Science Metrics}
Solar System science metrics include \href{https://github.com/lsst/sims_maf/blob/master/python/lsst/sims/maf/metrics/moMetrics.py#L215}{discovery metrics} (with various discovery criteria, such as detections on 3 nights, with pairs of visits, within a 15 night window) and characterization metrics (i.e., how many colors can we measure for objects, and can we determine a light curve or even a shape measurement from the lightcurve), contributed by both the project and the science collaborations. The most important metric for solar system objects is discovery; finding the objects is the first priority. Characterization metrics are secondary metrics. For each of these metrics, we generate input observations using an appropriate solar system population: Potentially Hazardous Asteroids (PHAs) and Near Earth Objects (NEOs) based on a model by \citet{2018Icar..312..181G}, Main Belt Asteroids (MBAs) and Jovian Trojans based on the S3M model from \citet{2011PASP..123..423G}, and TransNeptunian Objects (TNOs) based on the L7 model from the Canada France Ecliptic Plane Survey (CFEPS) \citep{2009AJ....137.4917K, 2011AJ....142..131P}. These populations move at varying rates and cover varying amounts of the sky. NEOs move over much of the sky during the lifetime of the survey, so they are less sensitive to footprint variations, but they tend to have much more strongly varying brightnesses and thus are sensitive to the number and timing of visits (they must be observed when they are bright). TNOs move very slowly, not more than a few fields of view over the lifetime of the survey, so they are quite sensitive to footprint; however, they are relatively consistent in their brightness, so they are less sensitive to the overall number of visits at a particular point on the sky once a threshold has been met.
For each of these populations, we calculate the completeness of the population discovered by LSST at the end of 10 years (not including previous surveys), using the current Moving Object Pipeline baseline criteria (detections on 3 nights, with pairs of visits, within a 15-night window), as a function of absolute magnitude $H$ (a proxy for the size of the object). We then take the completeness at an $H$ value near peak completeness and at an $H$ value relatively close to 50\% completeness in the baseline; these completeness values are the summary metrics we track to compare the various runs here.
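As an illustration of the baseline discovery criterion, the following is a simplified sketch of checking whether a single object's detections satisfy ``pairs of visits on 3 nights within a 15-night window''. It ignores trailing losses, signal-to-noise cuts, and intra-night pair spacing, and the function and argument names are illustrative rather than the actual MAF implementation.
\begin{verbatim}
import numpy as np

def discovered(obs_nights, window=15, n_nights_required=3, min_per_night=2):
    """Pairs of detections on >= 3 distinct nights within a 15-night window."""
    nights, counts = np.unique(np.asarray(obs_nights, dtype=int),
                               return_counts=True)
    # Keep only nights that contain a "pair" (at least two detections).
    pair_nights = nights[counts >= min_per_night]
    if len(pair_nights) < n_nights_required:
        return False
    # Check every run of n_nights_required pair-nights for a short enough span.
    for i in range(len(pair_nights) - n_nights_required + 1):
        if pair_nights[i + n_nights_required - 1] - pair_nights[i] < window:
            return True
    return False
\end{verbatim}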
The radar plots use the completeness for bright ($H$=16) NEOs, faint ($H$=22) NEOs, and bright ($H$=4) TNOs.
\subsection{Number of Stars}
We use a simulated MW stellar catalog from Galfast \citep{2008ApJ...673..864J, 2018ascl.soft10001J} along with the survey coadded depth to estimate the number of stars that would be detected at the 5$\sigma$\ level in $i$. Comparison with the TRILEGAL galaxy model \citep{2005A&A...436..895G, 2012ASSP...26..165G} gives similar results.
The number of stars is primarily sensitive to the footprint definition, and decreases dramatically (by about a factor of 2) when there is no coverage of the galactic plane. An extended survey footprint (such as increased coverage toward the north) also increases the number of stars. Because we are only computing the metric in the $i$\ filter, the metric is also sensitive to the depth in $i$, and thus we can see some variation if, e.g., a simulation pushes more $i$\ observations to twilight time.
The radar plots use the total number of stars over the entire footprint down to the coadded limiting magnitude. We do not include a crowding correction.
\subsection{Tidal Disruption Events (TDE)}
\begin{figure}
\epsscale{0.5}
\plotone{plots/tde_lc}
\epsscale{1}
\caption{Simulated TDE lightcurve shapes.}\label{fig:tdelc}
\end{figure}
We use TDE lightcurves from the community to generate a sample of TDE events distributed uniformly on the sky and uniformly over the 10 year survey. Figure~\ref{fig:tdelc} shows the lightcurve shapes. When analyzing a detected light curve, we test three criteria:
\begin{itemize}
\item{If it is detected twice pre-peak in any filters}
\item{If there is one detection pre-peak, and detections in at least 3 filters within 10 days of peak}
\item{If there is one detection pre-peak, one detection in $u$ and any other band near peak, and $u$ plus any other filter post-peak.}
\end{itemize}
When requiring both a color in any filter and $u$ band measurements during the TDE event, this metric is exceedingly sensitive to the number and cadence of $u$ band visits, with the number of detected TDEs scaling linearly with the number of $u$ band visits and preferring visits spread more uniformly over time. In other configurations, when requiring observations pre-peak or just a color in any filters, it is primarily sensitive to the frequency of observations and whether pairs are obtained in the same or mixed filters.
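As a concrete illustration of how these criteria might be evaluated on a single simulated event, here is a minimal sketch. The 10-day window used to define ``near peak'' in the third criterion, and the function and argument names, are assumptions for illustration; the actual metric implementation may differ.
\begin{verbatim}
import numpy as np

def tde_criteria(t_det, filt_det, t_peak, near_peak_window=10.0):
    """Evaluate the three TDE light-curve criteria for one event."""
    t = np.asarray(t_det, dtype=float)   # detection times
    f = np.asarray(filt_det)             # detection filters, e.g. 'u', 'g', ...

    pre = t < t_peak
    near = np.abs(t - t_peak) <= near_peak_window
    post = t > t_peak

    # 1. Detected at least twice pre-peak, in any filters.
    two_prepeak = pre.sum() >= 2

    # 2. One pre-peak detection and detections in >= 3 filters near peak.
    prepeak_plus_color = (pre.sum() >= 1) and (len(set(f[near])) >= 3)

    # 3. One pre-peak detection, u plus another band near peak,
    #    and u plus another band post-peak ("some color plus u").
    u_color_near = np.any(f[near] == 'u') and (len(set(f[near])) >= 2)
    u_color_post = np.any(f[post] == 'u') and (len(set(f[post])) >= 2)
    some_color_plus_u = (pre.sum() >= 1) and u_color_near and u_color_post

    return two_prepeak, prepeak_plus_color, some_color_plus_u
\end{verbatim}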
The radar plot uses the output of the TDE `some color plus $u$ band' metric.
\subsection{Fast Microlensing}
We use microlensing light curves contributed from the community. For all the events, we assume an $r$=22 magnitude star with a flat SED is being magnified.
We calculate both a Fast (crossing times of 1-10 days) and a Slow (crossing times of 100-1,500 days) microlensing metric. The events are distributed on the sky proportionally to the stellar density squared, as measured from the TRILEGAL galaxy model. Due to this spatial distribution, both the Slow and Fast microlensing metrics are primarily sensitive to the survey footprint. Footprints without galactic plane coverage cut the number of detected microlensing events by approximately 75\%, while footprints with heavy galactic plane coverage can increase the number of events by a factor of 2 or more. The Fast microlensing metric is also sensitive to the number and cadence of $u$ band visits, preferring $u$ band visits spread more uniformly over time.
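For reference, the standard point-source point-lens (Paczy\'nski) magnification curve underlying such events can be written down in a few lines. This sketch only illustrates the light-curve shape being searched for; the community-contributed light curves used by the metric may include additional effects, and the function name and parameters here are assumptions.
\begin{verbatim}
import numpy as np

def paczynski_lightcurve(t, t0, tE, u0, m_source=22.0):
    """Point-lens microlensing light curve in magnitudes.

    t  : observation times (days)
    t0 : time of peak magnification
    tE : Einstein-radius crossing time (days); 'fast' events are ~1-10 days
    u0 : impact parameter in Einstein radii
    m_source : unlensed source magnitude (an r=22 star is assumed here)
    """
    u = np.sqrt(u0**2 + ((np.asarray(t, dtype=float) - t0) / tE)**2)
    amp = (u**2 + 2.0) / (u * np.sqrt(u**2 + 4.0))   # magnification A(u)
    return m_source - 2.5 * np.log10(amp)            # magnified magnitude

# e.g. a 5-day event peaking on night 100 with impact parameter 0.1:
# mags = paczynski_lightcurve(np.arange(90, 111, 0.5), t0=100.0, tE=5.0, u0=0.1)
\end{verbatim}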
The radar plot uses the Fast microlensing metric. We find the slow microlensing events are so slow they are detected at a very high rate regardless of survey strategy.
\subsection{Number of Galaxies}
The expected number of galaxies across the entire survey footprint is calculated using \href{https://github.com/LSST-nonproject/sims_maf_contrib/blob/master/mafContrib/LSSObsStrategy/galaxyCountsMetric_extended.py#L26}{GalaxyCountsMetric\_extended}, from \href{https://github.com/LSST-nonproject/sims_maf_contrib}{\tt sims\_maf\_contrib}. The number of galaxies is estimated from the coadded depth using redshift-bin-specific power laws based on mock catalogs; the construction and normalization of these power laws is described in \citet{2016ApJ...829...50A}. The overall number of galaxies tends to increase with increased depth, and more so when more of the survey footprint is placed in low dust extinction areas. The number of galaxies also increases when the survey filter distribution is redder, rather than bluer.
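The general shape of this calculation can be sketched as follows: sum, over the footprint, a cumulative galaxy count evaluated at each pixel's coadded depth. The single power law and its coefficients below are placeholders purely for illustration; the actual metric uses redshift-bin-specific power laws calibrated on mock catalogs \citep{2016ApJ...829...50A}.
\begin{verbatim}
import numpy as np
import healpy as hp

def total_galaxy_count(coadd_depth_map, nside=64,
                       log_n0=4.0, slope=0.4, m_ref=25.0):
    """Illustrative total galaxy count from an i-band coadded depth map.

    Assumes a single hypothetical cumulative power law,
        N(< m) = 10**(log_n0 + slope * (m - m_ref))  galaxies per sq deg,
    in place of the redshift-bin-specific power laws in the real metric.
    """
    pix_area = hp.nside2pixarea(nside, degrees=True)   # sq deg per pixel
    depth = np.asarray(coadd_depth_map, dtype=float)
    good = np.isfinite(depth)
    n_per_sqdeg = 10.0**(log_n0 + slope * (depth[good] - m_ref))
    return np.sum(n_per_sqdeg) * pix_area
\end{verbatim}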
The radar plots use the total number of galaxies down to the coadded limiting magnitude over the entire survey footprint.
\subsection{DESC WFD Metrics}
The DESC has contributed several metrics evaluating the performance of the WFD for various areas of relevant science. Many of these metrics are built on calculating the subset of the survey footprint that has coverage in all 6 filters, dust extinction below a specified level ($E(B-V) < 0.2$), and coadded $i$ band depth greater than a specified threshold ($i > 25.9$ at 10 years), calculated using \href{https://github.com/lsst/sims_maf/blob/master/python/lsst/sims/maf/metrics/weakLensingSystematicsMetric.py#L8}{ExgalM5\_with\_cuts}. This represents the extragalactic science footprint.
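A minimal sketch of this footprint selection, assuming HEALPix maps of the $i$ band coadded depth and of $E(B-V)$, is shown below. The real ExgalM5\_with\_cuts also requires coverage in all six filters and accounts for dust extinction in the depth itself; the function and argument names here are illustrative.
\begin{verbatim}
import numpy as np
import healpy as hp

def extragalactic_footprint(coadd_i_map, ebv_map, nside=64,
                            ebv_max=0.2, i_depth_min=25.9):
    """Illustrative DESC extragalactic footprint selection."""
    mask = ((np.asarray(ebv_map, dtype=float) < ebv_max) &
            (np.asarray(coadd_i_map, dtype=float) > i_depth_min))
    area = np.sum(mask) * hp.nside2pixarea(nside, degrees=True)
    return mask, area   # boolean pixel mask and its area in sq deg
\end{verbatim}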
\subsubsection{Static Science}
Over this extragalactic footprint the following metrics are calculated for general `static science'.
\begin{itemize}
\item Median coadded depth in $i$ band
\item Standard deviation of the coadded depth in $i$ band
\item The area of the selected footprint
\item A \href{https://github.com/lsst/sims_maf/blob/master/python/lsst/sims/maf/metrics/summaryMetrics.py#L231}{3$\times$2 point Figure of Merit} emulator
\end{itemize}
These metrics are very sensitive to footprint coverage and depth; they also favor uniformity in the coadded depth, which minimizes corrections during later analysis.
The radar plot uses the 3$\times$2 point FoM.
\subsubsection{Weak Lensing}
The same footprint is used to calculate the number of visits per point in the footprint (\href{https://github.com/lsst/sims_maf/blob/master/python/lsst/sims/maf/metrics/weakLensingSystematicsMetric.py#L59}{WeakLensingNvisits}); this is used as an approximate metric evaluating weak lensing systematics. This metric is sensitive to footprint coverage and depth.
The radar plot uses the mean number of visits across the extragalactic footprint.
\subsubsection{Large Scale Structure}
The number of galaxies within this same footprint is used as a metric to approximate large scale structure results (DepthLimitedNumGalaxies), using the same GalaxyCountsMetric\_extended as above, but limiting the result to the selected footprint.
\subsubsection{SNe Ia}
We use SNe Ia light curves from the PLAsTiCC challenge. SNe are distributed uniformly on the sky.
For each supernova, we check:
\begin{enumerate}
\item{Is the supernova detected in any filter?}
\item{Is there a color detected (detected in 2 filters within 0.5 days)?}
\item{Is it possible to measure the rise slope (detect an increase of 0.3 mags in a filter pre-peak)?}
\item{Is the light curve ``well sampled'' (if the light curve duration is divided into tenths, are there detections in 5 unique bins)?}
\end{enumerate}
There are several versions of this metric, using different criteria for observations. We call the supernova ``Detected'' if it meets criterion 1, ``Pre-peak'' if it meets criteria 2 and 3, and ``Well-sampled'' if it meets criterion 4.
The simple Detected metric is sensitive to a combination of survey footprint and number of visits, preferring more area as long as a minimum number of visits, spaced uniformly over time, is available. The Pre-peak metric, which may be more useful for identifying events early enough to trigger follow-up, is most sensitive to whether visits are obtained in mixed filter pairs, with a lesser preference for visits being spaced more evenly over time (such as in non-rolling cadences). The Well-sampled metric is primarily sensitive to the cadence of visits, preferring visits to be spaced evenly over time.
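As a rough illustration, these four criteria could be checked on a single simulated light curve along the following lines. The treatment of the ``rise'' as a simple spread in pre-peak magnitudes, and the function and argument names, are simplifying assumptions rather than the actual metric implementation.
\begin{verbatim}
import numpy as np

def snia_flags(t_det, filt_det, mag_det, t_peak, t_start, t_end):
    """Illustrative checks of the four SN Ia criteria for one event."""
    t = np.asarray(t_det, dtype=float)
    f = np.asarray(filt_det)
    m = np.asarray(mag_det, dtype=float)

    detected = t.size >= 1                               # criterion 1

    # criterion 2: detections in two filters within 0.5 days of each other
    color = any(len(set(f[np.abs(t - ti) <= 0.5])) >= 2 for ti in t)

    # criterion 3: a change of >= 0.3 mag in a single filter before peak
    rise = False
    for band in set(f):
        pre = (f == band) & (t < t_peak)
        if pre.sum() >= 2 and (m[pre].max() - m[pre].min()) >= 0.3:
            rise = True

    # criterion 4: split the duration into tenths; detections in >= 5 bins
    bins = np.floor(10.0 * (t - t_start) / (t_end - t_start)).astype(int)
    well_sampled = len(set(np.clip(bins, 0, 9))) >= 5

    return detected, color, rise, well_sampled
\end{verbatim}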
The radar plot uses the metric which demands criteria 2 and 3 from above. Thus, we are mostly measuring how well we are producing SNe alerts that can act as triggers for others to follow up. The DESC group has developed metrics for measuring how well SNe are observed by Rubin alone, and we will be incorporating these into MAF soon.
\subsection{Radar Plots}
To help compare multiple science and SRD metrics across runs, we make use of radar plots: 2-dimensional plots containing metric results along various radially distributed axes. In each radar plot, we typically normalize values to a baseline run and plot the fractional change in metric values in the radial direction. For metrics that are measured over the entire sky (e.g., Parallax, Proper Motion, Weak Lensing), we use the median. For the parallax and proper motion metrics, the inverses of the errors are compared. When there are particularly large changes in metrics, we generate a pair of radar plots with different radial ranges to make comparisons easier.
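A minimal sketch of this normalization step, assuming hypothetical dictionaries of summary metric values keyed by metric name, is shown below; the actual plotting code in MAF is more involved.
\begin{verbatim}
def radar_values(run_metrics, baseline_metrics,
                 invert=('ParallaxError', 'ProperMotionError')):
    """Normalize metric values against a baseline run for a radar plot.

    Error-like metrics are inverted so that larger is always 'better';
    a radial value of 1.0 corresponds to the baseline run.
    """
    vals = {}
    for name, value in run_metrics.items():
        v, b = value, baseline_metrics[name]
        if name in invert:          # compare 1/error for astrometric errors
            v, b = 1.0 / v, 1.0 / b
        vals[name] = v / b          # value relative to the baseline run
    return vals
\end{verbatim}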
In some cases we make radar plots of the median coadded depth in each filter. For coadded depth, we plot magnitude difference in the radial direction, with larger values indicating deeper depths.
Almost all of the metrics in the radar plots show highly statistically significant changes as the survey strategy changes. It is worth remembering that the simulations themselves have some level of uncertainty, as changes in the real-life weather or status of the observatory will lead to changes in the observing history and thus changes in the scheduler choices made to maintain the overall survey strategy guidelines. For some metrics with particularly small measured values, these small differences between runs can result in large metric differences, effectively making the metric results somewhat noisy. The TDE metric is one such metric; out of 10,000 simulated TDE lightcurves, the baseline run only observes about 200 with visits that meet the `some color plus $u$' criteria, implying that variations of up to about 7\% in the TDE metric value can be expected even if the simulations are statistically similar.
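The quoted 7\% is consistent with simple Poisson counting statistics on the number of passing events:
\begin{equation}
\frac{\sigma_N}{N} \approx \frac{\sqrt{N}}{N} = \frac{1}{\sqrt{200}} \approx 0.07 .
\end{equation}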