
How should we understand this function ‘FSE_normalizeM2’? #108

Open
huhu0823 opened this issue Sep 1, 2022 · 2 comments

Comments


huhu0823 commented Sep 1, 2022

I am new here and still reading the source code. I can't understand these calculations:

https://github.com/Cyan4973/FiniteStateEntropy/blob/dev/lib/fse_compress.c
Lines 415-429 in fse_compress.c

@Cyan4973 (Owner)

This is code I haven't looked at in a while, so I might be forgetting some details.

If I remember correctly, FSE_normalizeM2() is a backup strategy to distribute statistics across the alphabet.
The normalization requires that the sum of all statistics equal a clean power-of-2 total.

Often this is relatively straightforward to do, but if the code ever reaches this function, it means the initial distribution attempt failed and the statistics were actually difficult to normalize.

This can happen for a number of reasons, a common one being that there are many symbols with a very low yet non-zero probability. Since each present symbol must receive a normalized frequency of at least 1, these rare symbols collectively occupy more probability space than they should. This creates an imbalance, and all the other symbols have to make up for it.
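For example (toy numbers of my own, not taken from the code): with a normalized total of 2048 and 200 symbols that each appeared only once out of 100000 inputs, each of those rare symbols "deserves" roughly 2048 / 100000 ≈ 0.02 of a slot, yet each must still receive 1. Together they consume 200 of the 2048 slots, and the frequent symbols have to be squeezed into the remaining 1848.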

Now, this can be done in multiple ways, with varying degrees of complexity and efficiency.
The code you refer to seems to be the last-resort strategy, used after all previous ones have been ruled out. I think it evenly spreads statistics across the remaining symbols, scaling them to the remaining probability space required to reach the total.

The particular method used here is not important for understanding FSE. Any other entropy codec, an arithmetic coder for example, faces the same challenge: it needs to "normalize" statistics onto a clean power-of-2 scale.
So this step could be achieved in many different ways; it doesn't change the rest of the encoding process.
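Just to make the constraint concrete, here is a minimal toy normalization in C. It is my own sketch, not the actual FSE_normalizeM2() code: it scales raw counts so they sum to exactly 1 << tableLog, forces every present symbol to at least 1, and dumps the rounding leftover onto the most frequent symbol.

```c
#include <stdio.h>

/* Toy sketch (not the actual FSE_normalizeM2 code): normalize raw counts so
 * they sum to exactly 1 << tableLog, keeping every present symbol >= 1 and
 * putting the rounding leftover on the most frequent symbol. */
static void toy_normalize(const unsigned* count, short* norm,
                          unsigned maxSymbol, unsigned total, unsigned tableLog)
{
    unsigned const scale = 1u << tableLog;
    unsigned s, largest = 0;
    int sum = 0;

    for (s = 0; s <= maxSymbol; s++) {
        if (count[s] == 0) { norm[s] = 0; continue; }
        /* proportional share, floor-rounded, but never below 1 */
        unsigned share = (unsigned)(((unsigned long long)count[s] * scale) / total);
        if (share == 0) share = 1;
        norm[s] = (short)share;
        sum += (int)share;
        if (count[s] > count[largest]) largest = s;
    }
    /* adjust the largest symbol so the total is exactly the power of 2 */
    norm[largest] += (short)((int)scale - sum);
}

int main(void)
{
    unsigned count[4] = { 8000, 1500, 490, 10 };  /* raw histogram, total = 10000 */
    short norm[4];
    unsigned s;
    toy_normalize(count, norm, 3, 10000, 8);      /* target total = 1 << 8 = 256 */
    for (s = 0; s < 4; s++) printf("norm[%u] = %d\n", s, norm[s]);
    return 0;
}
```

This only illustrates the "clean power-of-2 total" constraint; as described above, the real fallback spreads the correction across the remaining symbols rather than concentrating it on a single one.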

@huhu0823 (Author)

Thank you for your reply! Although I still don't quite understand it, I think it's really not important. Maybe in the future I will understand it. Thank you again!
