Skip to content

feat(web): Uniform random distribution during shuffle #19902

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

Pascal-So
Copy link
Contributor

Description

In the slide show settings, the user can select if a slide show should run in forward, reverse, or shuffle order. This PR modifies the shuffle algorithm to improve the randomness, while not changing anything else.

Previously, the algorithm to determine the next photo was to: uniformly at random select a month from all the months with photos in them, then uniformly at random select a day within this month, then a random photo within that day. This has the issue that the algorithm does not result in a uniform distribution over all photos.

Assume that a user has 10 photos in their library, one in January and the other ones in February. Then the January photo will on average be shown on 50% of the slides in the slide show, even though we would only expect that to be 10%.

This PR achieves a uniform distribution by first select a month, weighted by the number of photos in that month, followed by a weighted selection of the day within that month, and then a uniform distribution over the photos within that day.

This change was discussed yesterday in the "contributing" channel on discord.

Note: I set up a quick benchmark here to compare the performance of a linear scan vs. binary search to find the matching element in the prefix sum. The runtime complexity is O(n) either way due to the assembly of the prefix sum. We could theoretically cache the prefix sum over the asset counts of the months, but since n is so small I'd argue that it's not worth it. Overall I prefer the linear scan version due to the shorter and simpler code.

How Has This Been Tested?

Testing was done manually. Without this change I always start to notice duplicates after a short while in the slide show. This has not been the case with this change added.

Note that I also added some basic tests for the sampling utility function.

Checklist:

  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation if applicable
  • I have no unrelated changes in the PR.
  • I have confirmed that any new dependencies are strictly necessary.
  • I have written tests for new code (if applicable)
  • I have followed naming conventions/patterns in the surrounding code
  • All code in src/services/ uses repositories implementations for database calls, filesystem operations, etc.
  • All code in src/repositories/ is pretty basic/simple and does not have any immich specific logic (that belongs in src/services/)

Copy link
Contributor

github-actions bot commented Jul 12, 2025

Label error. Requires exactly 1 of: changelog:.*. Found: 🖥️web. A maintainer will add the required label.

@Pascal-So
Copy link
Contributor Author

hmm, I intuitively set the title of the PR to "feat(web)".. now that I think about it a bit more, would you consider this more of a fix than a feat? I'm not sure, open to your opinion.

@@ -444,8 +449,12 @@ export class TimelineManager {
}

async getRandomMonthGroup() {
const random = Math.floor(Math.random() * this.months.length);
const month = this.months[random];
const weights: number[] = this.months.map((month) => month.assetsCount);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be a bit faster just to pass in the total number of assets: (this.assetCount) and then pick one at random: Math.random() * this.assetCount -- and then, just look up the asset by index - something like:

const randomAssetIndex = Math.random() * this.assetCount;
let accumulatedOffset = 0;
for(const month of this.months) {
   if (randomAssetIndex < accumulatedOffset + month.assetsCount) {
       await this.loadMonthGroup(month.yearMonth, { cancelable: false })
      // repeat this search-pattern using month.dayGroups
	} else {
      accumulatedOffset += month.assetsCount;
  }
}

Its probably super minimal, but this approach would prevent having to populate the weights array up front. i.e. if you have 300K assets, and the random asset was chosen to be 42, this PR will still generate a weights array for all the months. Whereas, the above example would only not use an array of weights at all, rather, just a single variable for the accumulated offsets, and will stop as soon as it was found. (Additionally, if implemented directly in manager.getRandomAsset() - you could drop the code in manager.getRandomDayGroup and daygroup.getRandomAsset()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good idea, thanks! I changed it to your suggested version.

@Pascal-So
Copy link
Contributor Author

About testing: I was thinking about splitting the code up into two methods on the timeline manager: getNthAsset which we could test, and then make getRandomAsset just a thin untested wrapper.

The downside that I could see with that is that getNthAsset might not have a clear semantic meaning, since from looking at DayGroup.sortAssets I saw that there is the concept of AssetOrder which opens up the question if such a getNthAsset method should change behaviour depending on the order. For the shuffle we don't actually care about the order so in a test I'd just ensure that every asset gets visited if we call getNthAsset with every index so that test would not break either way.

Do you think it would be sufficient if we just add a comment to the getNthAsset method warning any possible future users that the behaviour is dependent on the current AssetOrder?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants