-
Notifications
You must be signed in to change notification settings - Fork 356
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add statistic correction option #5061
base: master
Are you sure you want to change the base?
Conversation
f60b9bb
to
922a9af
Compare
How does this deal with different event types? Is that already in the stat parsing machinery, and if so can you remind me where? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See suggested comment changes & the question about event types.
Different event types are handled in the config file. SO one specifies something like:
(old naming for this argument, but same idea). |
Right, that is what I suspected & seems the easiest way of doing this (modulo the case where we would want a different number for the same event type in different coinc times, but that seems not to be needed quite yet). |
Co-authored-by: Thomas Dent <[email protected]>
Co-authored-by: Thomas Dent <[email protected]>
Co-authored-by: Thomas Dent <[email protected]>
Co-authored-by: Thomas Dent <[email protected]>
So to me the only remaining concern is around |
I am happy with the redundancy - all the benchmark_lograte means is that the numbers are a bit more natural this way |
Co-authored-by: Thomas Dent <[email protected]>
If both the benchmark lograte and this term are used in exactly the same ranking statistics, we ought to be able to get rid of one of them by defining a constant, eg This leaves the output identical but takes out one redundant option which could confuse any users who tried to understand all the optional kwargs. |
I guess alternatively, in the spirit of keeping the production branch changes as simple as possible, we could cherry pick this minimal change to the 2.3 branch and then clean up the benchmark redundancy on master with a separate PR. |
This patch adds a
statistic_correction
"statistic keyword" instat.py
. This option will add a constant factor to all returned statistic values.The immediate problem this patch sets out to fix is this plot (from https://inspirehep.net/literature/2053424)
This shows the desired behaviour using the state of the art statistic as of a few years ago. In particular HL doubles lie above other things.
When adding the new KDE term we see this relative behaviour change and H/L singles lie above the doubles line, causing singles to be weighted more than HL doubles when combining. Using this option we can downrank singles relative to doubles (and add appropriate factors when considering 3-ifo behaviour).
However, the patch is broader than that, it could be used (for e.g.) to try to make
0
correspond to a 50/50 chance of being real ... Or any other thing you could dream up where we just want to scale the statistic.I've tested that this produces the desired effect.