Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add statistic correction option #5061

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

spxiwh
Copy link
Contributor

@spxiwh spxiwh commented Feb 25, 2025

This patch adds a statistic_correction "statistic keyword" in stat.py. This option will add a constant factor to all returned statistic values.

The immediate problem this patch sets out to fix is this plot (from https://inspirehep.net/literature/2053424)

image

This shows the desired behaviour using the state of the art statistic as of a few years ago. In particular HL doubles lie above other things.

When adding the new KDE term we see this relative behaviour change and H/L singles lie above the doubles line, causing singles to be weighted more than HL doubles when combining. Using this option we can downrank singles relative to doubles (and add appropriate factors when considering 3-ifo behaviour).

However, the patch is broader than that, it could be used (for e.g.) to try to make 0 correspond to a 50/50 chance of being real ... Or any other thing you could dream up where we just want to scale the statistic.

I've tested that this produces the desired effect.

@spxiwh spxiwh requested a review from tdent February 25, 2025 09:25
@spxiwh spxiwh force-pushed the pr_stat_correction branch from f60b9bb to 922a9af Compare February 25, 2025 10:37
@tdent
Copy link
Contributor

tdent commented Feb 25, 2025

How does this deal with different event types? Is that already in the stat parsing machinery, and if so can you remind me where?

Copy link
Contributor

@tdent tdent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See suggested comment changes & the question about event types.

@spxiwh
Copy link
Contributor Author

spxiwh commented Feb 25, 2025

Different event types are handled in the config file. SO one specifies something like:

[sngls-h1&sngls-l1]
statistic-keywords = correction_factor:-3.7
minimum-stat = -4

[sngls-v1]
statistic-keywords = correction_factor:-7.4
minimum-stat = -7

[coinc-h1l1]
statistic-keywords = correction_factor:-1

[coinc-l1v1]
statistic-keywords = correction_factor:-3.7

[coinc-h1v1]
statistic-keywords = correction_factor:-3.7


[coinc-h1l1v1]
statistic-keywords = correction_factor:0

(old naming for this argument, but same idea).

@tdent
Copy link
Contributor

tdent commented Feb 25, 2025

Right, that is what I suspected & seems the easiest way of doing this (modulo the case where we would want a different number for the same event type in different coinc times, but that seems not to be needed quite yet).

spxiwh and others added 4 commits February 25, 2025 12:45
@spxiwh
Copy link
Contributor Author

spxiwh commented Feb 25, 2025

So to me the only remaining concern is around benchmark_lograte and if this adds redundancy. Is there anything I should do about this, or are folks happy?

@GarethCabournDavies
Copy link
Contributor

So to me the only remaining concern is around benchmark_lograte and if this adds redundancy. Is there anything I should do about this, or are folks happy?

I am happy with the redundancy - all the benchmark_lograte means is that the numbers are a bit more natural this way

Co-authored-by: Thomas Dent <[email protected]>
@tdent
Copy link
Contributor

tdent commented Feb 25, 2025

If both the benchmark lograte and this term are used in exactly the same ranking statistics, we ought to be able to get rid of one of them by defining a constant, eg BENCHMARK_LOGRATE = -14.6 at the module level to replace the class attribute in ln_noise_rate -= self.benchmark_lograte.

This leaves the output identical but takes out one redundant option which could confuse any users who tried to understand all the optional kwargs.
How do people feel about adding that 'trivial' change here?

@tdent tdent added offline search v23_release_branch PRs applied to the v2.3.X release branch or to be cherry-picked if merging to master labels Feb 25, 2025
@tdent tdent self-requested a review February 25, 2025 21:24
@tdent
Copy link
Contributor

tdent commented Feb 25, 2025

I guess alternatively, in the spirit of keeping the production branch changes as simple as possible, we could cherry pick this minimal change to the 2.3 branch and then clean up the benchmark redundancy on master with a separate PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
offline search v23_release_branch PRs applied to the v2.3.X release branch or to be cherry-picked if merging to master
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants