
Asymmetric Numeral Systems (ANS) - Technical Overview

Introduction and Basic Concept

The core idea of ANS is to encode a sequence of symbols into a single integer state, with each symbol affecting the state in a way that reflects its probability of occurrence. This encoding is reversible, allowing us to recover the original sequence from the coded integer.

Suppose that $\{0, 1, \ldots, S-1\}$ are the symbols we want to encode, and assume that they occur with relative frequencies $f_0, \ldots, f_{S-1}$ (positive integers). The goal is to encode a sequence of symbols $(s_1, s_2, \ldots, s_k)$ into a single integer state $x$.

We make the following assumptions:

  • The symbols are sampled randomly and independently of one another
  • For efficiency in the calculations, the sum of the frequencies is a power of two: $$L = 2^l = \sum_{i=0}^{S-1} f_i$$

Basic ANS Encoding/Decoding

The Symbol Table

To encode our symbols efficiently, we need a way to map between states and symbols that reflects their frequencies. We accomplish this using an infinite table $T$ that maps natural numbers to symbols: $T: \mathbb{N} \to \{0, \ldots, S-1\}$.

We construct this table with a periodic pattern where:

  • Each period of length $L$ consists of consecutive blocks:
    • $f_0$ occurrences of symbol 0
    • $f_1$ occurrences of symbol 1
    • ...
    • $f_{S-1}$ occurrences of symbol $S-1$
  • This pattern repeats every $L$ positions
  • For any position $n$, $T(n) = T(n \text{ mod } L)$
  • Define $C_s = \sum_{i=0}^{s-1} f_i$ as the cumulative frequency (start position of symbol $s$ in each period)
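
To make this concrete, here is a minimal Python sketch of the table (the frequencies $f = (3, 3, 2)$, giving $L = 8$, and all variable names are illustrative choices, not part of the definition above):

```python
# Minimal sketch of the periodic symbol table (illustrative frequencies, not from the text).
from itertools import accumulate

f = [3, 3, 2]                        # f_s: frequency of each symbol s
L = sum(f)                           # L = 2^l, here 8
C = [0] + list(accumulate(f))[:-1]   # C_s: cumulative frequencies (block start offsets)

def T(n):
    """Symbol at position n of the infinite periodic table."""
    r = n % L                        # T(n) = T(n mod L)
    # within each period, symbol s occupies positions [C_s, C_s + f_s)
    return next(s for s in range(len(f)) if C[s] <= r < C[s] + f[s])
```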

The Encoding/Decoding Process

The encoding process works by maintaining a state value $x$ that evolves as we process each symbol. Starting from state $x_0$:

  1. To encode symbol $s_1$:
    • Find the $x_0$-th occurrence of symbol $s_1$ in the table $T$ (counting from 0)
    • This position index becomes our new state $x_1$
  2. Repeat this process for each new symbol, using the previous state to find the next one

This process effectively "pushes" each symbol onto our state value in a way that can be reversed.

With the periodically structured table, the update that encodes symbol $s_{i+1}$ from state $x_i$ is

$$x_{i+1} = (x_i // f_{s_{i+1}}) \cdot L + C_{s_{i+1}} + (x_i \text{ mod } f_{s_{i+1}})$$

where $//$ denotes integer division and $\text{ mod }$ is the modulo operation.
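
A direct translation of this update into Python, reusing the illustrative `f`, `L`, and `C` from the table sketch above (the function name `encode_step` is just a label for this sketch):

```python
def encode_step(x, s):
    """Push symbol s onto state x: the new state is the index of the
    x-th occurrence (counting from 0) of s in the table T."""
    return (x // f[s]) * L + C[s] + (x % f[s])

# e.g. the 0th occurrence of symbol 1 sits at position C_1 = 3, and indeed T(3) == 1
assert encode_step(0, 1) == 3 and T(3) == 1
```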

Notice that for large $x_i$,

$$x_{i+1} \approx x_i \cdot p_s^{-1}$$

where $p_s = f_s / L$ is the probability of symbol $s$. This means that encoding symbol $s$ adds roughly $-\log(p_s)$ to $\log(x_i)$, so the expected value of $\log(x_i)$ grows by $-\sum_s p_s \log(p_s)$ per step, leading to:

$$\lim_{k \to \infty} E(\frac{1}{k}\log(x_k)) = -\sum_{s=0}^{S-1} p_s \log(p_s).$$

In other words, if $H$ is the average number of bits required to encode a single symbol in the sequence (taking $\log$ to base 2), then

$$H = -\sum_{s=0}^{S-1} p_s \log(p_s)$$

which is exactly the Shannon entropy of the symbol distribution: asymptotically, the integer $x_k$ encodes the sequence using the theoretically optimal number of bits.
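
As a rough empirical sanity check of this claim (an illustrative sketch; the sequence length and starting state are arbitrary choices), we can encode a long i.i.d. sequence with `encode_step` and compare the bit length of $x_k$ against the entropy:

```python
import math
import random

random.seed(0)
k = 10_000
seq = random.choices(range(len(f)), weights=f, k=k)   # i.i.d. symbols with p_s = f_s / L

x = L                                 # arbitrary starting state x_0
for s in seq:
    x = encode_step(x, s)

bits_per_symbol = x.bit_length() / k
entropy = -sum((fs / L) * math.log2(fs / L) for fs in f)
print(f"{bits_per_symbol:.4f} bits/symbol vs. entropy {entropy:.4f}")
```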

Now for decoding. Given a final state $x_k$, we can recover the original sequence as follows:

  1. The last symbol $s_k$ is simply $T(x_k)$ (the symbol at position $x_k$)
  2. To get the previous state $x_{k-1}$:
    • Count how many times $s_k$ appeared before position $x_k$
    • This count is our previous state $x_{k-1}$
  3. Continue this process to recover all symbols in reverse order

The formula for this decoding step, recovering $x_i$ from $x_{i+1}$ once the symbol $s_{i+1}$ is known, is

$$x_i = (x_{i+1} // L) \cdot f_{s_{i+1}} + (x_{i+1} \text{ mod } L) - C_{s_{i+1}}$$
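
Continuing the sketch, the decode step and a round trip over the sequence from the check above (again illustrative, not a reference implementation):

```python
def decode_step(x):
    """Pop the most recently encoded symbol; return (symbol, previous state)."""
    s = T(x)                          # the last symbol is the table entry at position x
    return s, (x // L) * f[s] + (x % L) - C[s]

# Recover the sequence from the empirical check above, in reverse order.
decoded = []
for _ in range(len(seq)):
    s, x = decode_step(x)
    decoded.append(s)
decoded.reverse()
assert decoded == seq and x == L      # back to the starting state x_0
```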

Streaming ANS

The Need for Streaming

In practice, we can't work with arbitrarily large integers. Our state $x$ would grow indefinitely as we encode more symbols, and operating on integers with arbitrarily large precision is very inefficient. We therefore need a way to keep the state within a manageable range while preserving the reversibility of the encoding.

The Solution: State Normalization

We'll maintain our state $x$ within the interval $[L, 2L)$ by streaming out some bits to a separate array at each iteration. We need this process to be reversible.

Let B be an array of bits (bit stream), initialized to the empty array. We'll encode symbol $s$ with state $x$ as follows:

Encoding $s, (x,B) \to (x',B')$

When encoding symbol $s$ with state $(x, B)$:

  1. Find normalization factor $d$: the unique integer where $x // 2^d \in [f_s, 2f_s)$
  2. Stream out the lower $d$ bits, $V = x \text{ mod } 2^d$, appending them to the bitstream $B$ to form $B'$
  3. Use the normalized value to compute next state: $$x' = L + C_s + ((x // 2^d) \text{ mod } f_s)$$
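
In code, the normalization loop finds $d$ by shifting bits out one at a time; because the reduced state lies in $[f_s, 2f_s)$, the quotient $x // f_s$ in the basic encoding rule equals 1, which is exactly the simplified formula in step 3. A sketch, reusing the illustrative `f`, `L`, and `C` from above:

```python
def stream_encode_step(x, s, bits):
    """Encode symbol s with state x in [L, 2L); append streamed-out bits to `bits`.
    Returns the new state, also in [L, 2L)."""
    while x >= 2 * f[s]:              # after d shifts, x // 2^d lands in [f_s, 2*f_s)
        bits.append(x & 1)            # stream out the lowest bit
        x >>= 1
    return L + C[s] + (x % f[s])      # basic encode rule with quotient x // f_s == 1
```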

Decoding $(x',B') \to s, (x,B)$

To reverse the process:

  1. Read the current symbol: $s = T(x')$
  2. Compute the normalized state: $x_2 = f_s + (x' \text{ mod } L) - C_s$
  3. Find $d$: the unique integer where $x_2 \cdot 2^d \in [L, 2L)$
  4. Read $d$ bits ($V$) from the end of bitstream $B'$ (and remove them to form $B$)
  5. Reconstruct the original state: $x = (x_2 \cdot 2^d) + V$
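
The inverse step, reading bits back from the end of the stream (same illustrative setup):

```python
def stream_decode_step(x, bits):
    """Decode one symbol from state x in [L, 2L); consume bits from the end of `bits`.
    Returns (symbol, previous state)."""
    s = T(x)                          # step 1: the current symbol
    x = f[s] + (x % L) - C[s]         # step 2: normalized state, now in [f_s, 2*f_s)
    while x < L:                      # steps 3-5: shift d bits back in until x is in [L, 2L)
        x = (x << 1) | bits.pop()
    return s, x
```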

With this process, the state stays bounded, and we accumulate a bit stream that represents the compressed version of the original sequence.
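
A quick round trip through these two streaming steps (continuing the sketches above; the message length and initial state $x = L$ are arbitrary choices):

```python
msg = random.choices(range(len(f)), weights=f, k=1_000)

x, bits = L, []                       # initial state in [L, 2L), empty bit stream
for s in msg:
    x = stream_encode_step(x, s, bits)
print(f"{len(bits)} streamed bits + {x.bit_length()} state bits for {len(msg)} symbols")

out = []
for _ in range(len(msg)):             # decode in reverse (last-in, first-out) order
    s, x = stream_decode_step(x, bits)
    out.append(s)
out.reverse()

assert out == msg and x == L and not bits   # exact round trip, state and stream restored
```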

Optimality of the Compression

In the above, there are two sources of loss in the compression efficiency:

  1. The assumption that the frequencies sum to a power of two
  2. The need to normalize the state

The first of these can be addressed by choosing $L=2^l$ large enough that the proportions $p_i = f_i / L$ approximate the true symbol probabilities closely, making this loss negligible.

The second source of loss is more difficult to predict but can be examined empirically. (TODO: do this analysis)