Releases: jlumbroso/affirmative-sampling

First release, v1.0.0

01 Jun 06:23

This repository contains a reference implementation, in Python, of the Affirmative Sampling algorithm by Jérémie Lumbroso and Conrado Martínez (2022), as well as the original paper, accepted at the 2022 edition of the Analysis of Algorithms conference (AofA 2022) in Philadelphia.

Affirmative Sampling is a novel, practical, and efficient algorithm for obtaining random samples of distinct elements from a data stream.

Its most salient feature is that the size of the sample will, in expectation, grow with the (unknown) number of distinct elements in the data stream.

Because every distinct element has the same probability of being sampled, and the sample size is greater when the "diversity" (the number of distinct elements) is greater, the samples that Affirmative Sampling delivers are more representative than those produced by any scheme where the sample size is fixed a priori, hence its name. The Python implementation is meant to illustrate how the algorithm works and to showcase some basic experimentation.