Scanner: cache contributor resolution during scans #1476

audiomuze · 2025-12-27T00:47:23Z

During scans we repeatedly resolve identical contributor strings; Slim::Schema::Contributor->add() can do multiple DB round-trips per call.

Add a scan-local LRU+TTL memoization cache in Schema.pm around contributor resolution in _mergeAndCreateContributors (scanner-only) to reduce redundant contributor DB work during library scans.
Cache key includes contributor + MBID + sort + extid; role is intentionally excluded.
Cache size is bounded and derived from existing dbhighmem pref; cache is cleared at scan end. Consider adding additional dropdown "Extreme" in Database Memory Config to enable cache size to be increased to 100K for large libraries.
When DEBUGLOG is enabled, emit end-of-scan stats plus a conservative heuristic estimate of avoided SQL (with an END-hook fallback for scanner runs where request events don’t fire).

Add scan-local LRU+TTL cache and DEBUGLOG stats to reduce redundant contributor DB work during library scans.

michaelherger

Please keep things simple. IMHO this is way over-engineered, with most of the code satisfying the dev's geek mode only, adding no value to the user.

michaelherger · 2025-12-27T06:55:11Z

Slim/Schema.pm

 	if ( !main::SCANNER ) {
+		require Slim::Control::Request;
 		# Wipe cached data after rescan
 		Slim::Control::Request::subscribe( sub {
 			$class->wipeCaches;
 			Slim::Schema::Album->addReleaseTypeStrings;
 		}, [['rescan'], ['done']] );
 	}
+	elsif ( main::DEBUGLOG ) {
+		require Slim::Control::Request;
+		# Initialize scan-local caches.
+		$class->_initScanContributorCache;
+		$class->_registerScanContributorCacheEndHook;
+
+		# Scanner-only stats, emitted once per scan if DEBUGLOG is enabled.
+		Slim::Control::Request::subscribe( sub {
+			$class->_logScanContributorCacheStats;
+		}, [['rescan'], ['done']] );
+	}
+	elsif ( main::SCANNER ) {
+		require Slim::Control::Request;
+		# Initialize scan-local caches even if DEBUGLOG is off.
+		$class->_initScanContributorCache;
+	}


That's not going to fly. We don't want to use Slim::Control::Request in the scanner. You don't need to subscribe to those events in the scanner, do you? The scanner would already have exited when the server signalled "scanner done".

And having a different flow for debug mode sounds wrong to me, too. I believe in the above the elsif (main::SCANNER) would hardly ever be hit, as nobody is going to disable debug logging for the scanner manually.

michaelherger · 2025-12-27T06:56:45Z

Slim/Schema.pm

+		# only store EXTID for track artist, as we don't have it for other roles
+		my $extid    = ($tag eq 'ARTIST' && $attributes->{'ARTIST_EXTID'}) ? $attributes->{'ARTIST_EXTID'} : undef;


That's an assumption which might be valid now and for some services. But this can change at any time, if it's even true now. Wouldn't Qobuz import different roles, too?

We should probably review even the existing implementation...

I have no idea what the streamer plugins do or don't provide.

I could drop the assumption and change the code to:
my $extid = $attributes->{ $tag . '_EXTID' }; ?

michaelherger · 2025-12-27T07:00:45Z

Slim/Schema.pm

+my %_scanContributorAddCache;
+my $_scanContributorAddCacheTied = 0;
+my $_scanContributorAddCacheMaxEntries;
+my $_scanContributorCacheEndHooked = 0;
+my $_scanContributorCacheApproxSize = 0;
+
+sub _scanContributorCacheMaxEntries {
+	# Use the existing cross-platform `dbhighmem` preference as a proxy
+	# Keep this deliberately bounded: on low-RAM systems, very large hash-based
+	# caches can become noticeable during big scans.
+	return $prefs->get('dbhighmem') ? 50000 : 15000;
+}
+
+sub _initScanContributorCache {
+	my $class = shift;
+	return unless main::SCANNER;
+	return if $_scanContributorAddCacheTied;
+
+	my $entries = $class->_scanContributorCacheMaxEntries;
+	$_scanContributorAddCacheMaxEntries = $entries;
+	$_scanContributorCacheApproxSize = 0;
+	tie %_scanContributorAddCache, 'Tie::Cache::LRU::Expires', EXPIRES => 3600, ENTRIES => $entries;
+	$_scanContributorAddCacheTied = 1;
+}


Why this complicated setup? Why not just initialise the cache like we do for the content type? A line or two instead of dozens of them and repeated check for whether it's been initialised or not?

I can simplify, but before I do, is sizing the cache according to the user's selected preference in Database Memory Config something you think should be lost or retained? To my mind limiting the cache size is necessary given Lyrion is expected to run on SOC with < 1GB RAM. On the other hand limiting the cache size to satisfy only RAM deficient (mostly legacy) SOCs would significantly limit the benefit of including the cache in the first place.

I had toyed with being more direct around cache size based on installed RAM but that would mean adding additional modules, which I figured you'd be against.

Totally reasonable. But this doesn't need much code. See eg. https://github.com/LMS-Community/slimserver/blob/public/9.1/Slim/Control/Queries.pm#L69

This work for you?

# Scan-local contributor cache. tie %_scanContributorAddCache, 'Tie::Cache::LRU::Expires', EXPIRES => 3600, ENTRIES => ($prefs->get('dbhighmem') ? 50000 : 15000) if main::SCANNER;

michaelherger · 2025-12-27T07:03:46Z

Slim/Schema.pm

+# Optional instrumentation (scanner + DEBUGLOG only)
+# These counters are used to quantify effectiveness and provide a conservative
+# estimate of avoided DB statements.
+my (


Too much noise for my taste. Who'd be interested in having this other than you as the developer? That's only adding overhead and code for little to no use for the user. Please remove this. You can still have those lines in your clone. But why should we have that much code to satisfy the geek in me for this one particular cache, when we don't do any of this anywhere else?

I specifically included it as a temporary aid to help you assess whether or not the cache has sufficient impact to warrant the PR.

Scanner: cache contributor resolution during scans

0d6dea2

Add scan-local LRU+TTL cache and DEBUGLOG stats to reduce redundant contributor DB work during library scans.

michaelherger requested changes Dec 27, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Scanner: cache contributor resolution during scans #1476

Scanner: cache contributor resolution during scans #1476

Uh oh!

audiomuze commented Dec 27, 2025

Uh oh!

michaelherger left a comment

Uh oh!

michaelherger Dec 27, 2025

Uh oh!

audiomuze Dec 27, 2025

Uh oh!

michaelherger Dec 27, 2025

Uh oh!

audiomuze Dec 27, 2025

Uh oh!

michaelherger Dec 27, 2025

Uh oh!

audiomuze Dec 27, 2025

Uh oh!

michaelherger Dec 27, 2025

Uh oh!

audiomuze Dec 27, 2025

Uh oh!

michaelherger Dec 27, 2025

Uh oh!

audiomuze Dec 27, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		# only store EXTID for track artist, as we don't have it for other roles
		my $extid = ($tag eq 'ARTIST' && $attributes->{'ARTIST_EXTID'}) ? $attributes->{'ARTIST_EXTID'} : undef;

Scanner: cache contributor resolution during scans #1476

Are you sure you want to change the base?

Scanner: cache contributor resolution during scans #1476

Uh oh!

Conversation

audiomuze commented Dec 27, 2025

Uh oh!

michaelherger left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants