-
Notifications
You must be signed in to change notification settings - Fork 431
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
choose_multiple_weighted
returns unexpect probability of result
#1476
Comments
I can reproduce this. More specifically,
|
The problem is in Note that our More to the point, this algorithm does not appear to be suitable for the small weights used as inputs here. Pre-scaling of weights into some more suitable range may be a solution, however a key point of the algorithm used is that it requires only a single iteration over a list of weights (which are in this case generated via call to the weight function for each value). Without knowing the largest input weight in advance we would have to iterate twice over the weights (storing or regenerating). At this point a different API may be better; or we could allow the user to input the maximum weight (or a weight scaling function) as a hint. For this particular example, a partial fix is to scale by the inverse of the largest weight in the weight fn:
@TheIronBorn have you got any suggestions? It seems that the algorithm is just a poor choice for small weights. |
We could keep this algorithm but document its limitations while documenting its limitations. In this case, the best (or at least easiest) alternative may be to use Choose-multiple-weighted is not a problem I've put much thought into; there are probably other algorithms. |
…ust-random#1476 Also improves choose_two_weighted_indexed time by 23% (excluding new test)
For the above problem, I would expect the following average results of 100'000 samples:
Above, I got:
Re-running over master now, I get:
With #1530, I get:
So seems pretty good. (Noting also that Codeuse rand::{prelude::IndexedRandom, rng};
#[test]
fn main() {
let values: Vec<(i32, f64)> =
vec![(1, 0.080), (2, 0.0078), (3, 0.012)]
.into_iter()
.map(|v| (v.0, f64::powf(v.1, 4.0)))
.collect();
for v in &values {
println!("value={} has weight={:.10}", v.0, v.1);
}
let weight_sum: f64 = values.iter().map(|(_, w)| w).sum();
let weights: Vec<f64> = values.iter().map(|(_, w)| w / weight_sum).collect();
println!("expected: {weights:?}");
let largest_weight = values[0].1;
let mut results = [0u32; 9];
for _ in 0..100_000 {
let mut iter = values.choose_multiple_weighted(&mut rng(), 2, |a| a.1 / largest_weight).unwrap();
let a = iter.next().unwrap().0;
let b = iter.next().unwrap().0;
assert!(iter.next().is_none());
let i = (a - 1) * 3 + b - 1;
results[i as usize] += 1;
}
println!("got following results:");
for i in 0..9 {
let i0 = i / 3 + 1;
let i1 = i % 3 + 1;
println!("({}, {}):\t{}", i0, i1, results[i]);
}
assert!(false);
} |
Summary
I am trying to choose N values from vector with weights. All weights calculated as score.powf(4.)
What i expected
I expect values will be chosen so many times, in relation to their weights. Here is the weights after
powf
function:As you can see, the lowest weight has value
2
, and i expect this value will be chosen least frequent.What i got
As you can see, value
2
had been choosen100k
times, more than value3
, that is only chosen once. Here is the score of values 2 and 3:Note
I noticed that this happens with very small floats. Also, i guess that the initial order is matters. Both of this not mentioned in documentation, so this is either docs issue, or bug. Also i do not exclude that i may missed something, so maybe i am wrong.
Code sample
The text was updated successfully, but these errors were encountered: