Conformal Tights is a Python package that exports:
- a scikit-learn meta-estimator that adds conformal prediction of coherent quantiles and intervals to any scikit-learn regressor
- a Darts forecaster that adds conformally calibrated probabilistic time series forecasting to any scikit-learn regressor
- 🍬 Sklearn meta-estimator: add conformal prediction of quantiles and intervals to any scikit-learn regressor
- 🔮 Darts forecaster: add conformally calibrated probabilistic forecasting to any scikit-learn regressor
- 🌡️ Conformally calibrated: accurate quantiles, and intervals with reliable coverage
- 🚦 Coherent quantiles: quantiles increase monotonically instead of crossing each other
- 👖 Tight quantiles: selects the lowest dispersion that provides the desired coverage
- 🎁 Data efficient: requires only a small number of calibration examples to fit
- 🐼 Pandas support: optionally predict on DataFrames and receive DataFrame output
pip install conformal-tights
Conformal Tights exports a meta-estimator called ConformalCoherentQuantileRegressor
that you can use to equip any scikit-learn regressor with a predict_quantiles
method that predicts conformally calibrated quantiles. Example usage:
from conformal_tights import ConformalCoherentQuantileRegressor
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor
# Fetch dataset and split in train and test
X, y = fetch_openml("ames_housing", version=1, return_X_y=True, as_frame=True, parser="auto")
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=42)
# Create a regressor, equip it with conformal prediction, and fit on the train set
my_regressor = XGBRegressor(objective="reg:absoluteerror")
conformal_predictor = ConformalCoherentQuantileRegressor(estimator=my_regressor)
conformal_predictor.fit(X_train, y_train)
# Predict with the underlying regressor
ŷ_test = conformal_predictor.predict(X_test)
# Predict quantiles with the conformal predictor
ŷ_test_quantiles = conformal_predictor.predict_quantiles(
X_test, quantiles=(0.025, 0.05, 0.1, 0.5, 0.9, 0.95, 0.975)
)
When the input data is a pandas DataFrame, the output is also a pandas DataFrame. For example, printing the head of ŷ_test_quantiles
yields:
house_id | 0.025 | 0.05 | 0.1 | 0.5 | 0.9 | 0.95 | 0.975 |
---|---|---|---|---|---|---|---|
1357 | 114743.7 | 120917.9 | 131752.6 | 156708.2 | 175907.8 | 187996.1 | 205443.4 |
2367 | 67382.7 | 80191.7 | 86871.8 | 105807.1 | 118465.3 | 127581.2 | 142419.1 |
2822 | 119068.0 | 131864.8 | 138541.6 | 159447.7 | 179227.2 | 197337.0 | 214134.1 |
2126 | 93885.8 | 100040.7 | 111345.5 | 134292.7 | 150557.1 | 164595.8 | 182524.1 |
1544 | 68959.8 | 81648.8 | 88364.1 | 108298.3 | 122329.6 | 132421.1 | 147225.6 |
Let's visualize the predicted quantiles on the test set:
Expand to see the code that generated the graph above
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
%config InlineBackend.figure_format = "retina"
plt.rc("font", family="DejaVu Sans", size=10)
plt.figure(figsize=(8, 4.5))
idx = ŷ_test_quantiles[0.5].sample(50, random_state=42).sort_values().index
x = list(range(1, len(idx) + 1))
x_ticks = [1, *list(range(5, len(idx) + 1, 5))]
for j in range(3):
coverage = round(100 * (ŷ_test_quantiles.columns[-(j + 1)] - ŷ_test_quantiles.columns[j]))
plt.bar(
x,
ŷ_test_quantiles.loc[idx].iloc[:, -(j + 1)] - ŷ_test_quantiles.loc[idx].iloc[:, j],
bottom=ŷ_test_quantiles.loc[idx].iloc[:, j],
color=["#b3d9ff", "#86bfff", "#4da6ff"][j],
label=f"{coverage}% Prediction interval",
)
plt.plot(
x,
y_test.loc[idx],
"s",
label="Actual (test)",
markeredgecolor="#e74c3c",
markeredgewidth=1.414,
markerfacecolor="none",
markersize=4,
)
plt.plot(x, ŷ_test.loc[idx], "s", color="blue", label="Predicted (test)", markersize=2)
plt.xlabel("House")
plt.xticks(x_ticks, x_ticks)
plt.gca().yaxis.set_major_formatter(ticker.FuncFormatter(lambda x, _: f"${x/1000:,.0f}k"))
plt.gca().tick_params(axis="both", labelsize=10)
plt.gca().spines["top"].set_visible(False)
plt.gca().spines["right"].set_visible(False)
plt.grid(False)
plt.grid(axis="y")
plt.legend(loc="upper left", title="House price", title_fontproperties={"weight": "bold"})
plt.tight_layout()
In addition to quantile prediction, you can use predict_interval
to predict conformally calibrated prediction intervals. Compared to quantiles, these focus on reliable coverage over quantile accuracy. Example usage:
# Predict an interval for each example with the conformal predictor
ŷ_test_interval = conformal_predictor.predict_interval(X_test, coverage=0.95)
# Measure the coverage of the prediction intervals on the test set
coverage = ((ŷ_test_interval.iloc[:, 0] <= y_test) & (y_test <= ŷ_test_interval.iloc[:, 1])).mean()
print(coverage) # 96.6%
When the input data is a pandas DataFrame, the output is also a pandas DataFrame. For example, printing the head of ŷ_test_interval
yields:
house_id | 0.025 | 0.975 |
---|---|---|
1357 | 107202.8 | 206290.4 |
2367 | 66665.1 | 146004.8 |
2822 | 115591.8 | 220314.8 |
2126 | 85288.1 | 183037.8 |
1544 | 67889.9 | 150646.2 |
Conformal Tights also exports a Darts forecaster called DartsForecaster
that uses a ConformalCoherentQuantileRegressor
to make conformally calibrated probabilistic time series forecasts. To demonstrate its usage, let's begin by loading a time series dataset:
from darts.datasets import ElectricityConsumptionZurichDataset
# Load a forecasting dataset
ts = ElectricityConsumptionZurichDataset().load()
ts = ts.resample("h")
# Split the dataset into covariates X and target y
X = ts.drop_columns(["Value_NE5", "Value_NE7"])
y = ts["Value_NE5"] # NE5 = Household energy consumption
# Add categorical covariates to X
X = X.add_holidays(country_code="CH")
X = X.add_datetime_attribute("month")
X = X.add_datetime_attribute("dayofweek")
X = X.add_datetime_attribute("hour")
X_categoricals = ["holidays", "month", "dayofweek", "hour"]
Printing the tail of the covariates time series X.pd_dataframe()
yields:
Timestamp | Hr [%Hr] | RainDur [min] | StrGlo [W/m2] | T [°C] | WD [°] | WVs [m/s] | WVv [m/s] | p [hPa] | holidays | month | dayofweek | hour |
---|---|---|---|---|---|---|---|---|---|---|---|---|
2022‑08‑30 20h | 70.2 | 0.0 | 0.0 | 19.9 | 290.2 | 1.7 | 1.5 | 968.5 | 0.0 | 7.0 | 1.0 | 20.0 |
2022‑08‑30 21h | 70.1 | 0.0 | 0.0 | 19.5 | 239.2 | 1.0 | 0.7 | 968.1 | 0.0 | 7.0 | 1.0 | 21.0 |
2022‑08‑30 22h | 71.3 | 0.0 | 0.0 | 19.5 | 28.9 | 1.5 | 1.3 | 967.9 | 0.0 | 7.0 | 1.0 | 22.0 |
2022‑08‑30 23h | 80.4 | 0.0 | 0.0 | 18.9 | 24.3 | 1.6 | 1.1 | 967.9 | 0.0 | 7.0 | 1.0 | 23.0 |
2022‑08‑31 00h | 81.6 | 1.0 | 0.0 | 18.7 | 293.5 | 0.9 | 0.3 | 967.8 | 0.0 | 7.0 | 2.0 | 0.0 |
We can now equip a scikit-learn regressor with conformal prediction using ConformalCoherentQuantileRegressor
as before, and then equip that conformal predictor with probabilistic time series forecasting using DartsForecaster
:
from conformal_tights import DartsForecaster, ConformalCoherentQuantileRegressor
from pandas import Timestamp
from xgboost import XGBRegressor
# Split the dataset into train and test
test_cutoff = Timestamp("2022-06-01")
y_train, y_test = y.split_after(test_cutoff)
X_train, X_test = X.split_after(test_cutoff)
# Now let's:
# 1. Create an sklearn regressor of our choosing, in this case `XGBRegressor`
# 2. Add conformal quantile prediction to the regressor with `ConformalCoherentQuantileRegressor`
# 3. Add probabilistic forecasting to the conformal predictor with `DartsForecaster`
my_regressor = XGBRegressor()
conformal_predictor = ConformalCoherentQuantileRegressor(estimator=my_regressor)
forecaster = DartsForecaster(
model=conformal_predictor,
lags=5 * 24, # Add the last 5 days of the target to the prediction features
lags_future_covariates=[0], # Add the current timestamp's covariates to the prediction features
categorical_future_covariates=X_categoricals, # Convert these covariates to pd.Categorical
)
# Fit the forecaster
forecaster.fit(y_train, future_covariates=X_train)
# Make a probabilistic forecast 5 days into the future by predicting a set of conformally calibrated
# quantiles at each time step and drawing 500 samples from them
quantiles = (0.025, 0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95, 0.975)
forecast = forecaster.predict(
n=5 * 24, future_covariates=X_test, num_samples=500, quantiles=quantiles
)
Printing the head of the forecast quantiles time series forecast.quantiles_df(quantiles=quantiles)
yields:
Timestamp | Value_NE5_0.025 | Value_NE5_0.05 | Value_NE5_0.1 | Value_NE5_0.25 | Value_NE5_0.5 | Value_NE5_0.75 | Value_NE5_0.9 | Value_NE5_0.95 | Value_NE5_0.975 |
---|---|---|---|---|---|---|---|---|---|
2022‑06‑01 01h | 19165.2 | 19268.3 | 19435.7 | 19663.0 | 19861.7 | 20062.2 | 20237.9 | 20337.7 | 20453.2 |
2022‑06‑01 02h | 19004.0 | 19099.0 | 19226.3 | 19453.7 | 19710.7 | 19966.1 | 20170.1 | 20272.8 | 20366.9 |
2022‑06‑01 03h | 19372.6 | 19493.0 | 19679.4 | 20027.6 | 20324.6 | 20546.3 | 20773.2 | 20910.3 | 21014.1 |
2022‑06‑01 04h | 21936.2 | 22105.6 | 22436.0 | 22917.5 | 23308.6 | 23604.8 | 23871.0 | 24121.7 | 24351.5 |
2022‑06‑01 05h | 25040.5 | 25330.5 | 25531.1 | 25910.4 | 26439.4 | 26903.2 | 27287.4 | 27493.9 | 27633.9 |
Let's visualize the forecast and its prediction interval on the test set:
Expand to see the code that generated the graph above
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
%config InlineBackend.figure_format = "retina"
plt.rc("font", family="DejaVu Sans", size=10)
plt.figure(figsize=(8, 4.5))
y_train[-2 * 24 :].plot(label="Actual (train)")
y_test[: len(forecast)].plot(label="Actual (test)")
forecast.plot(label="Forecast with\n90% Prediction interval", low_quantile=0.05, high_quantile=0.95)
plt.gca().set_xlabel("")
plt.gca().yaxis.set_major_formatter(ticker.FuncFormatter(lambda x, _: f"{x/1000:,.0f} MWh"))
plt.gca().tick_params(axis="both", labelsize=10)
plt.legend(loc="upper right", title="Energy consumption", title_fontproperties={"weight": "bold"})
plt.tight_layout()
Prerequisites
1. Set up Git to use SSH
-
Generate an SSH key and add the SSH key to your GitHub account.
-
Configure SSH to automatically load your SSH keys:
cat << EOF >> ~/.ssh/config Host * AddKeysToAgent yes IgnoreUnknown UseKeychain UseKeychain yes ForwardAgent yes EOF
2. Install Docker
- Install Docker Desktop.
- Linux only:
-
Export your user's user id and group id so that files created in the Dev Container are owned by your user:
cat << EOF >> ~/.bashrc export UID=$(id --user) export GID=$(id --group) EOF
-
- Linux only:
3. Install VS Code or PyCharm
- Install VS Code and VS Code's Dev Containers extension. Alternatively, install PyCharm.
- Optional: install a Nerd Font such as FiraCode Nerd Font and configure VS Code or configure PyCharm to use it.
Development environments
The following development environments are supported:
- ⭐️ GitHub Codespaces: click on Code and select Create codespace to start a Dev Container with GitHub Codespaces.
- ⭐️ Dev Container (with container volume): click on Open in Dev Containers to clone this repository in a container volume and create a Dev Container with VS Code.
- Dev Container: clone this repository, open it with VS Code, and run Ctrl/⌘ + ⇧ + P → Dev Containers: Reopen in Container.
- PyCharm: clone this repository, open it with PyCharm, and configure Docker Compose as a remote interpreter with the
dev
service. - Terminal: clone this repository, open it with your terminal, and run
docker compose up --detach dev
to start a Dev Container in the background, and then rundocker compose exec dev zsh
to open a shell prompt in the Dev Container.
Developing
- This project follows the Conventional Commits standard to automate Semantic Versioning and Keep A Changelog with Commitizen.
- Run
poe
from within the development environment to print a list of Poe the Poet tasks available to run on this project. - Run
poetry add {package}
from within the development environment to install a run time dependency and add it topyproject.toml
andpoetry.lock
. Add--group test
or--group dev
to install a CI or development dependency, respectively. - Run
poetry update
from within the development environment to upgrade all dependencies to the latest versions allowed bypyproject.toml
. - Run
cz bump
to bump the package's version, update theCHANGELOG.md
, and create a git tag.