This repository provides methods for using large-scale machine learning predictions in downstream statistical inference when only a small number of labelled calibration samples are available. The package implements the Predict-Then-Debias bootstrap algorithm from the paper referenced below.
See "ptd.py" for the functions that run the Predict-Then-Debias bootstrap, and "Predict-Then-Debias examples.ipynb" for four examples demonstrating how to use them.
This software corresponds to Algorithms 1 and 2 in:
Dan M. Kluger, Kerri Lu, Tijana Zrnic, Sherrie Wang, and Stephen Bates (2025). Prediction-Powered Inference with Imputed Covariates and Nonuniform Sampling. arXiv:2501.18577 [stat.ME]. https://arxiv.org/abs/2501.18577
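
For readers who want a feel for the idea before opening the notebook, here is a minimal sketch of a predict-then-debias style bootstrap for the simplest case, estimating a mean. It is not the API of "ptd.py" and not a faithful copy of Algorithms 1 and 2. The function name `ptd_mean_bootstrap_sketch` and all of its arguments are hypothetical, and it makes several simplifying assumptions: uniform sampling, a tuning weight fixed at 1, separate resampling of the labelled and unlabelled groups, and a percentile interval.

```python
import numpy as np


def ptd_mean_bootstrap_sketch(y_calib, yhat_calib, yhat_unlabeled,
                              n_boot=2000, alpha=0.05, seed=0):
    """Illustrative only: debiased mean estimate with a percentile bootstrap CI.

    y_calib        -- true labels on the small calibration sample
    yhat_calib     -- model predictions on the same calibration sample
    yhat_unlabeled -- model predictions on the large unlabelled sample
    """
    rng = np.random.default_rng(seed)
    y_calib = np.asarray(y_calib, dtype=float)
    yhat_calib = np.asarray(yhat_calib, dtype=float)
    yhat_unlabeled = np.asarray(yhat_unlabeled, dtype=float)
    n, m = len(y_calib), len(yhat_unlabeled)

    def estimate(y_c, yhat_c, yhat_u):
        # Predict: mean of the predictions over all samples.
        predicted = np.concatenate([yhat_c, yhat_u]).mean()
        # Debias: add the average prediction error seen on the calibration sample.
        return predicted + (y_c.mean() - yhat_c.mean())

    point = estimate(y_calib, yhat_calib, yhat_unlabeled)

    boot = np.empty(n_boot)
    for b in range(n_boot):
        # Assumption: resample each group separately, with replacement.
        i = rng.integers(0, n, size=n)
        j = rng.integers(0, m, size=m)
        boot[b] = estimate(y_calib[i], yhat_calib[i], yhat_unlabeled[j])

    lower, upper = np.quantile(boot, [alpha / 2, 1 - alpha / 2])
    return point, (lower, upper)
```

The calibration sample is used only to measure how far the predictions are off on average, so a systematic bias in the predictions is removed while the large unlabelled sample still reduces variance. The functions in "ptd.py" should be preferred for any real analysis.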