-
-
Notifications
You must be signed in to change notification settings - Fork 3.7k
feat(web): Uniform random distribution during shuffle #19902
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Label error. Requires exactly 1 of: changelog:.*. Found: 🖥️web. A maintainer will add the required label. |
hmm, I intuitively set the title of the PR to "feat(web)".. now that I think about it a bit more, would you consider this more of a fix than a feat? I'm not sure, open to your opinion. |
@@ -444,8 +449,12 @@ export class TimelineManager { | |||
} | |||
|
|||
async getRandomMonthGroup() { | |||
const random = Math.floor(Math.random() * this.months.length); | |||
const month = this.months[random]; | |||
const weights: number[] = this.months.map((month) => month.assetsCount); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be a bit faster just to pass in the total number of assets: (this.assetCount
) and then pick one at random: Math.random() * this.assetCount
-- and then, just look up the asset by index - something like:
const randomAssetIndex = Math.random() * this.assetCount;
let accumulatedOffset = 0;
for(const month of this.months) {
if (randomAssetIndex < accumulatedOffset + month.assetsCount) {
await this.loadMonthGroup(month.yearMonth, { cancelable: false })
// repeat this search-pattern using month.dayGroups
} else {
accumulatedOffset += month.assetsCount;
}
}
Its probably super minimal, but this approach would prevent having to populate the weights array up front. i.e. if you have 300K assets, and the random asset was chosen to be 42, this PR will still generate a weights array for all the months. Whereas, the above example would only not use an array of weights at all, rather, just a single variable for the accumulated offsets, and will stop as soon as it was found. (Additionally, if implemented directly in manager.getRandomAsset() - you could drop the code in manager.getRandomDayGroup and daygroup.getRandomAsset()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good idea, thanks! I changed it to your suggested version.
About testing: I was thinking about splitting the code up into two methods on the timeline manager: The downside that I could see with that is that Do you think it would be sufficient if we just add a comment to the |
Description
In the slide show settings, the user can select if a slide show should run in forward, reverse, or shuffle order. This PR modifies the shuffle algorithm to improve the randomness, while not changing anything else.
Previously, the algorithm to determine the next photo was to: uniformly at random select a month from all the months with photos in them, then uniformly at random select a day within this month, then a random photo within that day. This has the issue that the algorithm does not result in a uniform distribution over all photos.
Assume that a user has 10 photos in their library, one in January and the other ones in February. Then the January photo will on average be shown on 50% of the slides in the slide show, even though we would only expect that to be 10%.
This PR achieves a uniform distribution by first select a month, weighted by the number of photos in that month, followed by a weighted selection of the day within that month, and then a uniform distribution over the photos within that day.
This change was discussed yesterday in the "contributing" channel on discord.
Note: I set up a quick benchmark here to compare the performance of a linear scan vs. binary search to find the matching element in the prefix sum. The runtime complexity is O(n) either way due to the assembly of the prefix sum. We could theoretically cache the prefix sum over the asset counts of the months, but since n is so small I'd argue that it's not worth it. Overall I prefer the linear scan version due to the shorter and simpler code.
How Has This Been Tested?
Testing was done manually. Without this change I always start to notice duplicates after a short while in the slide show. This has not been the case with this change added.
Note that I also added some basic tests for the sampling utility function.
Checklist:
src/services/
uses repositories implementations for database calls, filesystem operations, etc.src/repositories/
is pretty basic/simple and does not have any immich specific logic (that belongs insrc/services/
)