Major update to the package and most package functions with lots of breaking changes.
- new and updated Readme and vignette
- the proposed scoring workflow was reworked. Functions were changed so they can easily be piped and have simplified arguments and outputs.
- the function `eval_forecasts()` was replaced by a function `score()` with a much reduced set of function arguments.
- functionality to summarise scores and to add relative skill scores was moved to a function `summarise_scores()`
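  A minimal sketch of the new piped workflow (`example_quantile` ships with the package; the grouping columns are illustrative):

  ```r
  library(scoringutils)
  library(magrittr)

  example_quantile %>%
    score() %>%
    summarise_scores(by = c("model", "target_type"))
  ```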
- new function `check_forecasts()` to analyse input data before scoring
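  For example (a minimal sketch using the bundled example data):

  ```r
  library(scoringutils)

  # runs sanity checks and reports potential problems before scoring
  check <- check_forecasts(example_quantile)
  check
  ```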
- new function `correlation()` to compute correlations between different metrics
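  A sketch of the intended use (whether `correlation()` expects raw or summarised scores is an assumption here):

  ```r
  library(scoringutils)
  library(magrittr)

  scores <- example_quantile %>%
    score() %>%
    summarise_scores()

  # correlation between the different metric columns
  correlation(scores)
  ```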
- new function `add_coverage()` to add coverage for specific central prediction intervals
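  A sketch (the `by` and `ranges` arguments shown are assumptions):

  ```r
  library(scoringutils)
  library(magrittr)

  example_quantile %>%
    score() %>%
    add_coverage(by = c("model", "target_type"), ranges = c(50, 90)) %>%
    summarise_scores(by = c("model", "target_type"))
  ```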
- new function `avail_forecasts()` to visualise the number of available forecasts
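  For example (a sketch; the grouping columns are illustrative):

  ```r
  library(scoringutils)

  # number of available forecasts per model and target type
  avail_forecasts(example_quantile, by = c("model", "target_type"))
  ```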
- new function `find_duplicates()` to find duplicate forecasts, which cause an error
- all plotting functions were renamed to begin with `plot_`. Arguments were simplified
- the function `pit()` now works based on data.frames. The old `pit` function was renamed to `pit_sample()`. PIT p-values were removed entirely.
- the function `plot_pit()` now works directly with input as produced by `pit()`
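  A minimal sketch of the data.frame-based PIT workflow (using the bundled `example_continuous` data):

  ```r
  library(scoringutils)
  library(magrittr)

  example_continuous %>%
    pit(by = "model") %>%
    plot_pit()
  ```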
- many data-handling functions were removed and input types for `score()` were restricted to sample-based, quantile-based or binary forecasts.
- the function `brier_score()` now returns all Brier scores, rather than taking the mean before returning an output.
- `crps`, `dss` and `logs` were renamed to `crps_sample()`, `dss_sample()`, and `logs_sample()`
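  For example, for sample-based forecasts (a sketch with simulated data):

  ```r
  library(scoringutils)

  # 10 observations, 100 predictive samples each (rows = observations)
  true_values <- rpois(10, lambda = 5)
  predictions <- replicate(100, rpois(10, lambda = 5))
  crps_sample(true_values, predictions)
  ```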
- Testing was expanded
- minor bugs were fixed, for example a bug in the `sample_to_quantile()` function (epiforecasts#223)
- package data is now based on forecasts submitted to the European Forecast Hub (https://covid19forecasthub.eu/).
- all example data files were renamed to begin with `example_`
- a new data set, `summary_metrics`, was included that contains a summary of the metrics implemented in `scoringutils`
- The 'sharpness' component of the weighted interval score was renamed to 'dispersion'. This makes it clearer what the component represents and keeps the terminology consistent with how it is used elsewhere.
- added a function `check_forecasts()` that runs some basic checks on the input data and provides feedback
- minor bug fixes (previously, `interval_score` needed to be among the selected metrics)
- all data.tables are now returned as `table[]` rather than as `table`, such that they don't have to be called twice to display the contents.
- added a function, `pairwise_comparison()`, that runs pairwise comparisons between models on the output of `eval_forecasts()`
- added functionality to compute relative skill within `eval_forecasts()`
- added a function to visualise pairwise comparisons
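  A sketch of the pairwise-comparison workflow (shown with the current function names, where `score()` replaced `eval_forecasts()`; the `by` argument and the plotting call are assumptions):

  ```r
  library(scoringutils)

  scores <- score(example_quantile)
  pairwise <- pairwise_comparison(scores, by = "model")
  plot_pairwise_comparison(pairwise)
  ```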
- The WIS definition change introduced in version 0.1.5 was partly corrected such that the difference in weighting is only introduced when summarising over scores from different interval ranges
- "sharpness" was renamed to 'mad' in the output of [score()] for sample-based forecasts.
- `eval_forecasts()` can now handle a separate forecast and truth data set as input
- `eval_forecasts()` now supports scoring point forecasts alongside quantiles in a quantile-based format. Currently the only metric used is the absolute error
- Many functions, especially `eval_forecasts()`, got a major rewrite. While functionality should be unchanged, the code should now be easier to maintain
- Some of the data-handling functions were renamed, but the old names are still supported for now.
- changed the default definition of the weighted interval score. Previously, the median prediction was counted twice, but it is now only counted once. If you want to go back to the old behaviour, you can call the `interval_score()` function with the argument `count_median_twice = TRUE`.
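  For example (made-up numbers for two observations and a single 90% central interval):

  ```r
  library(scoringutils)

  interval_score(
    true_values = c(5, 7),
    lower = c(2, 3),
    upper = c(8, 9),
    interval_range = 90,
    weigh = TRUE
  )
  ```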
- we added basic plotting functionality to visualise scores. You can now easily obtain diagnostic plots based on scores as produced by `score`.
  - `correlation_plot` shows the correlation between metrics
  - `plot_ranges` shows the contribution of different prediction intervals to some chosen metric
  - `plot_heatmap` visualises scores as a heatmap
  - `plot_score_table` shows a coloured summary table of scores
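  A minimal sketch of the table plot (the exact arguments, e.g. `y = "model"`, are assumptions based on the current interface):

  ```r
  library(scoringutils)
  library(magrittr)

  scores <- example_quantile %>%
    score() %>%
    summarise_scores(by = "model")

  # coloured summary table of scores, one row per model
  plot_score_table(scores, y = "model")
  ```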
- renamed "calibration" to "coverage"
- renamed "true_values" to "true_value" in data.frames
- renamed "predictions" to "prediction" in data.frames
- renamed "is_overprediction" to "overprediction"
- renamed "is_underprediction" to "underprediction"
- the `by` argument in `score` now has a slightly changed meaning. It now denotes the lowest possible grouping unit, i.e. the unit of one observation, and needs to be specified explicitly. The default is now `NULL`. The reason for this change is that most metrics need scoring on the observation level, and this is the most consistent implementation of that principle. The `pit` function now receives its grouping from `summarise_by`. In a similar spirit, `summarise_by` has to be specified explicitly and e.g. no longer assumes that you want 'range' to be included.
- for the interval score, `weigh = TRUE` is now the default option.
- (potentially planned) rename true_values to true_value and predictions to prediction.
- updated quantile evaluation metrics in `score`. Bias as well as calibration now take all quantiles into account
- included an option to summarise scores according to a `summarise_by` argument in `score`. The summary can return the mean, the standard deviation, as well as an arbitrary set of quantiles.
- `score` can now return pit histograms.
- switched to ggplot2 for plotting
- all scores in `score` were consistently renamed to lower case: `Interval_score` is now `interval_score`, `CRPS` is now `crps`, etc.
- included support for grouping scores according to a vector of column names in `score`
- included support for passing down arguments to lower-level functions in `score`
- included support for three new metrics to score quantiles with `score`: bias, sharpness and calibration
- example data now has a horizon column to illustrate the use of grouping
- documentation updated to explain the above listed changes
- included support for long as well as wide input formats for quantile forecasts that are scored with `score`
- updated documentation for the `score` function
- added badges to the Readme