The core idea of ANS is to encode a sequence of symbols into a single integer state, with each symbol affecting the state in a way that reflects its probability of occurrence. This encoding is reversible, allowing us to recover the original sequence from the coded integer.
Suppose that we want to encode a sequence of symbols $s_1, s_2, \ldots, s_k$, each drawn from an alphabet of $S$ symbols $\{0, 1, \ldots, S-1\}$, where symbol $i$ occurs with frequency $f_i$.
We make the following assumptions:
- The symbols are sampled randomly and independently of one another
- For efficiency in the calculations, the sum of the frequencies is a power of two:
$$L = 2^l = \sum_{i=0}^{S-1} f_i$$
To encode our symbols efficiently, we need a way to map between states and symbols that reflects their frequencies. We accomplish this using an infinite table $T$, where $T(n)$ is the symbol assigned to position $n \geq 0$.
We construct this table with a periodic pattern where:
- Each period of length $L$ consists of consecutive blocks:
  - $f_0$ occurrences of symbol $0$
  - $f_1$ occurrences of symbol $1$
  - ...
  - $f_{S-1}$ occurrences of symbol $S-1$
- This pattern repeats every $L$ positions: for any position $n$, $T(n) = T(n \text{ mod } L)$
- Define $C_s = \sum_{i=0}^{s-1} f_i$ as the cumulative frequency (the start position of symbol $s$ within each period)
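As a concrete sketch in Python (the particular frequencies in `freqs` are made up for illustration; any list summing to a power of two works), the table and the cumulative frequencies might look like:

```python
from itertools import accumulate

freqs = [4, 2, 2]   # hypothetical f_0, f_1, f_2; they sum to L = 8 = 2**3
L = sum(freqs)

# C[s] = f_0 + ... + f_{s-1}: where symbol s's block starts in each period.
C = [0] + list(accumulate(freqs))[:-1]

def T(n):
    """Symbol at position n of the infinite table (periodic with period L)."""
    r = n % L
    for s in range(len(freqs)):
        if C[s] <= r < C[s] + freqs[s]:
            return s
```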
The encoding process works by maintaining a single integer state. Starting from an initial state $x_0$:
- To encode symbol $s_1$:
  - Find the $x_0$-th occurrence of symbol $s_1$ in the table (counting from 0)
  - This position index becomes our new state $x_1$
- Repeat this process for each new symbol, using the previous state to find the next one
This process effectively "pushes" each symbol onto our state value in a way that can be reversed.
With the periodically structured table, the formula for this step is
$$x_k = \left\lfloor \frac{x_{k-1}}{f_{s_k}} \right\rfloor \cdot L + C_{s_k} + (x_{k-1} \text{ mod } f_{s_k})$$
where $s_k$ is the symbol being encoded and $\lfloor \cdot \rfloor$ denotes integer division (also written $//$ below).
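In code, this step is a one-liner (reusing the hypothetical `freqs`, `C`, and `L` from the earlier sketch):

```python
def encode_step(x, s):
    """Return the position of the x-th occurrence (counting from 0) of s in T."""
    return (x // freqs[s]) * L + C[s] + (x % freqs[s])
```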
Notice that for large $x$, encoding symbol $s$ multiplies the state by approximately $L / f_s = 1 / p_s$, where $p_s = f_s / L$ is the probability of symbol $s$. Or in other words, if we encode symbols $s_1, \ldots, s_k$, then
$$\log_2 x_k \approx \log_2 x_0 + \sum_{i=1}^{k} \log_2 \frac{1}{p_{s_i}},$$
so the expected number of bits needed per symbol approaches $\sum_s p_s \log_2 (1/p_s)$, which matches the Shannon entropy of the symbol distribution. In other words, the integer $x_k$ represents the entire sequence in close to the information-theoretically optimal number of bits.
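This is easy to sanity-check with Python's arbitrary-precision integers, again reusing the hypothetical sketch above; for `freqs = [4, 2, 2]` the measured rate should come out close to 1.5 bits per symbol:

```python
import math
import random

random.seed(0)
probs = [f / L for f in freqs]
entropy = -sum(p * math.log2(p) for p in probs)   # 1.5 bits/symbol for [4, 2, 2]

syms = random.choices(range(len(freqs)), weights=freqs, k=10_000)
x = 1                         # arbitrary small starting state
for s in syms:
    x = encode_step(x, s)

print(x.bit_length() / len(syms), "bits/symbol vs entropy", entropy)
```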
Now for decoding. Given a final state $x_k$, we can recover the symbols in reverse:
- The last symbol $s_k$ is simply $T(x_k)$ (the symbol at position $x_k$)
- To get the previous state $x_{k-1}$:
  - Count how many times $s_k$ appeared before position $x_k$
  - This count is our previous state $x_{k-1}$
- Continue this process to recover all symbols in reverse order
The formula for this decoding is
$$x_{k-1} = f_{s_k} \cdot \left\lfloor \frac{x_k}{L} \right\rfloor + (x_k \text{ mod } L) - C_{s_k}$$
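A matching sketch, under the same hypothetical setup, with a small round trip to show the reversed order:

```python
def decode_step(x_new):
    """Recover the last symbol encoded and the previous state."""
    s = T(x_new)
    # Occurrences of s before x_new: freqs[s] per full period,
    # plus the offset of x_new within its block in the current period.
    x_prev = freqs[s] * (x_new // L) + (x_new % L) - C[s]
    return s, x_prev

x = 1
for s in [0, 2, 1]:
    x = encode_step(x, s)
decoded = []
for _ in range(3):
    s, x = decode_step(x)
    decoded.append(s)
print(decoded[::-1])   # [0, 2, 1]: the original sequence
```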
In practice, we can't work with arbitrarily large integers. Our state grows without bound as we encode more symbols, so we need a way to keep it within a fixed range.
We'll maintain our state $x$ in the interval $[L, 2L)$, streaming bits out of it during encoding (and back into it during decoding) to keep it there.
Let $B$ be an array of bits (a bit stream), initialized to the empty array. When encoding symbol $s$ with current state $x \in [L, 2L)$:
- Find the normalization factor $d$: the unique integer where $x // 2^d \in [f_s, 2f_s)$
- Stream out the lower $d$ bits: append $V = x \text{ mod } 2^d$ to bitstream $B$ to form $B'$
- Use the normalized value to compute the next state:
$$x' = L + C_s + ((x // 2^d) \text{ mod } f_s)$$
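A sketch of this step (with `B` a plain Python list of bits, least-significant bit appended first; `freqs`, `C`, and `L` are the hypothetical values from before):

```python
def stream_encode(x, s, B):
    """Encode symbol s from state x in [L, 2L); spilled bits go into B."""
    f = freqs[s]
    d = 0
    while (x >> d) >= 2 * f:      # find the unique d with x // 2**d in [f, 2f)
        d += 1
    B.extend((x >> i) & 1 for i in range(d))   # lower d bits, LSB first
    return L + C[s] + ((x >> d) % f)           # new state, back in [L, 2L)
```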
To reverse the process:
- Read the current symbol: $s = T(x')$
- Compute the normalized state: $x_2 = f_s + (x' \text{ mod } L) - C_s$
- Find $d$: the unique integer where $x_2 \cdot 2^d \in [L, 2L)$
- Read $d$ bits ($V$) from the end of bitstream $B'$ (and remove them to form $B$)
- Reconstruct the original state: $x = x_2 \cdot 2^d + V$
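And the matching decode step, which pops bits back off the end of the stream (the bit order is chosen to invert the LSB-first convention in `stream_encode`); the short round trip at the end checks that the pair is inverse:

```python
def stream_decode(x_new, B):
    """Invert stream_encode: return the symbol and the previous state."""
    s = T(x_new)
    x2 = freqs[s] + (x_new % L) - C[s]   # normalized state, in [f_s, 2*f_s)
    d, V = 0, 0
    while (x2 << d) < L:                 # find the unique d with x2 * 2**d in [L, 2L)
        V = (V << 1) | B.pop()           # re-read the spilled bits
        d += 1
    return s, (x2 << d) + V

x, B = L, []                             # any starting state in [L, 2L) works
for s in [0, 1, 2, 0]:
    x = stream_encode(x, s, B)
decoded = []
for _ in range(4):
    s, x = stream_decode(x, B)
    decoded.append(s)
assert decoded[::-1] == [0, 1, 2, 0] and x == L
```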
With this process, the state stays bounded, and we accumulate a bit stream that represents the compressed version of the original sequence.
In the above, there are two sources of loss in the compression efficiency:
- The assumption that the frequencies sum to a power of two
- The need to normalize the state
The first of these can be addressed by choosing $l$ large enough that the quantized probabilities $f_i / L$ closely approximate the true symbol probabilities.
The second source of loss is more difficult to predict analytically, but it can be examined empirically.
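One way to run that experiment, sketched with the hypothetical pieces above. Sampling the symbols with weights `freqs` isolates the normalization loss, since the quantization loss is zero when the data matches the quantized frequencies exactly; note also that for power-of-two frequencies like `[4, 2, 2]` the normalization itself is lossless, so a non-dyadic split such as `freqs = [5, 2, 1]` typically shows a small gap:

```python
import math
import random

random.seed(0)
n = 100_000
syms = random.choices(range(len(freqs)), weights=freqs, k=n)

probs = [f / L for f in freqs]
entropy = -sum(p * math.log2(p) for p in probs)

x, B = L, []
for s in syms:
    x = stream_encode(x, s, B)

# Total output: the spilled bits plus the final state (l + 1 bits).
total_bits = len(B) + L.bit_length()
print(f"{total_bits / n:.4f} bits/symbol vs entropy {entropy:.4f}")
```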