224 changes: 217 additions & 7 deletions Slim/Schema.pm
@@ -70,6 +70,143 @@ our $lastAlbum = {};
# Optimization to cache content type for track entries rather than look them up every time.
tie our %contentTypeCache, 'Tie::Cache::LRU::Expires', EXPIRES => 300, ENTRIES => 128;

# Scanner optimization (scan-local): during scanning we repeatedly resolve the same
# contributor strings to contributor IDs. Slim::Schema::Contributor->add() does
# multiple DB round-trips per contributor (SELECT + optional UPDATE/INSERT).
#
# To reduce redundant churn during large/metadata-rich scans, retain the resolved
# contributor ID list for *identical* inputs. This is deliberately scanner-only
# to avoid any behavioural changes during normal server operation.
#
# Safety/behaviour assumptions:
# - We only short-circuit when the inputs that influence add() are identical.
# The cache key includes contributor string + MBID + sort + extid.
# - Role/tag is NOT part of the key because add()'s behaviour depends only on the
# contributor data, not the role in which it was referenced.
# - Cache is cleared at scan end (rescan done).
my %_scanContributorAddCache;
my $_scanContributorAddCacheTied = 0;
my $_scanContributorAddCacheMaxEntries;
my $_scanContributorCacheEndHooked = 0;
my $_scanContributorCacheApproxSize = 0;

sub _scanContributorCacheMaxEntries {
# Use the existing cross-platform `dbhighmem` preference as a proxy for
# available memory. Keep this deliberately bounded: on low-RAM systems,
# very large hash-based caches can become noticeable during big scans.
return $prefs->get('dbhighmem') ? 50000 : 15000;
}

sub _initScanContributorCache {
my $class = shift;
return unless main::SCANNER;
return if $_scanContributorAddCacheTied;

my $entries = $class->_scanContributorCacheMaxEntries;
$_scanContributorAddCacheMaxEntries = $entries;
$_scanContributorCacheApproxSize = 0;
tie %_scanContributorAddCache, 'Tie::Cache::LRU::Expires', EXPIRES => 3600, ENTRIES => $entries;
$_scanContributorAddCacheTied = 1;
}
Comment on lines +87 to +110

Member:
Why this complicated setup? Why not just initialise the cache like we do for the content type? A line or two instead of dozens of them and repeated checks for whether it's been initialised or not?

Author:
I can simplify, but before I do, is sizing the cache according to the user's selected preference in Database Memory Config something you think should be lost or retained? To my mind, limiting the cache size is necessary given Lyrion is expected to run on SOCs with < 1GB RAM. On the other hand, limiting the cache size to satisfy only RAM-deficient (mostly legacy) SOCs would significantly limit the benefit of including the cache in the first place.

I had toyed with being more direct around cache size based on installed RAM, but that would mean adding additional modules, which I figured you'd be against.

Member:
Totally reasonable. But this doesn't need much code. See e.g. https://github.com/LMS-Community/slimserver/blob/public/9.1/Slim/Control/Queries.pm#L69

Author:
This work for you?

# Scan-local contributor cache.
tie %_scanContributorAddCache, 'Tie::Cache::LRU::Expires',
	EXPIRES => 3600,
	ENTRIES => ($prefs->get('dbhighmem') ? 50000 : 15000)
if main::SCANNER;


sub _registerScanContributorCacheEndHook {
my $class = shift;
return unless main::SCANNER;
return unless main::DEBUGLOG;
return if $_scanContributorCacheEndHooked;

# Some scanner runs do not have a functional request/subscribe lifecycle.
# Register an END hook as a fallback so benchmarking runs always print a
# single final stats line.
eval 'END { Slim::Schema->_logScanContributorCacheStats(); }';
$_scanContributorCacheEndHooked = 1;
}

# Optional instrumentation (scanner + DEBUGLOG only)
# These counters are used to quantify effectiveness and provide a conservative
# estimate of avoided DB statements.
my (
Comment on lines +125 to +128

Member:
Too much noise for my taste. Who'd be interested in having this other than you as the developer? That's only adding overhead and code for little to no use for the user. Please remove this. You can still have those lines in your clone. But why should we have that much code to satisfy the geek in me for this one particular cache, when we don't do any of this anywhere else?

Author:
I specifically included it as a temporary aid to help you assess whether or not the cache has sufficient impact to warrant the PR.

$_scanContributorCacheHitKeys,
$_scanContributorCacheMissKeys,
$_scanContributorCacheHitIds,
$_scanContributorCacheMissIds,
$_scanContributorCacheHitIdsWithMbid,
$_scanContributorCacheStoreKeys,
$_scanContributorCacheStoreIds,
) = (0, 0, 0, 0, 0, 0, 0);

sub _logScanContributorCacheStats {
return unless main::SCANNER;
return unless main::DEBUGLOG;

my $scanlog = logger('scan.import');
return unless (($scanlog && $scanlog->is_info) || $log->is_debug);

my $lookups = $_scanContributorCacheHitKeys + $_scanContributorCacheMissKeys;
# Avoid duplicate log lines if multiple end-of-scan hooks fire in one process.
# The first emission resets all counters, so subsequent calls will see 0.
return unless $lookups;
my $hitRate = $lookups ? ($_scanContributorCacheHitKeys / $lookups) : 0;
my $cacheApproxSize = $_scanContributorCacheApproxSize;
my $cacheMaxEntries = $_scanContributorAddCacheMaxEntries || 0;

my $statsLine = sprintf(
"Scan contributor cache: lookups=%d hits=%d misses=%d hitRate=%.1f%% avoidedContributorAdds=%d newContributorAdds=%d uniqueCachedKeys=%d cachedIds=%d cacheKeysNow~%d (approx) cap=%d",
$lookups,
$_scanContributorCacheHitKeys,
$_scanContributorCacheMissKeys,
$hitRate * 100,
$_scanContributorCacheHitIds,
$_scanContributorCacheMissIds,
$_scanContributorCacheStoreKeys,
$_scanContributorCacheStoreIds,
$cacheApproxSize,
$cacheMaxEntries,
);
$scanlog && $scanlog->is_info ? $scanlog->info($statsLine) : $log->debug($statsLine);

# Estimate avoided DB statements on cache hits.
# This is intentionally a heuristic range (MIN..MAX), because add() may
# execute different SQL depending on whether the row exists, and whether
# namesort/MBID/extid updates are needed.
#
# For each contributor entry inside Contributor->add():
# - minimum: 1 SELECT
# - maximum (no MBID): 1 SELECT + up to 3 UPDATE = 4 statements
# - maximum (with MBID): up to 2 SELECT + up to 3 UPDATE = 5 statements
my $hitIdsWithMbid = $_scanContributorCacheHitIdsWithMbid;
my $hitIdsNoMbid = $_scanContributorCacheHitIds - $hitIdsWithMbid;
$hitIdsNoMbid = 0 if $hitIdsNoMbid < 0;

my $minSelectsAvoided = $_scanContributorCacheHitIds;
my $maxSelectsAvoided = $hitIdsNoMbid + (2 * $hitIdsWithMbid);

my $minStatementsAvoided = $_scanContributorCacheHitIds;
my $maxStatementsAvoided = (4 * $hitIdsNoMbid) + (5 * $hitIdsWithMbid);

my $estimateLine = sprintf(
"Scan contributor cache estimate: avoidedStatements~%d..%d avoidedSELECTs~%d..%d (heuristic)",
$minStatementsAvoided,
$maxStatementsAvoided,
$minSelectsAvoided,
$maxSelectsAvoided,
);
$scanlog && $scanlog->is_info ? $scanlog->info($estimateLine) : $log->debug($estimateLine);

# Reset so multiple queued scans in the same process don't accumulate.
(
$_scanContributorCacheHitKeys,
$_scanContributorCacheMissKeys,
$_scanContributorCacheHitIds,
$_scanContributorCacheMissIds,
$_scanContributorCacheHitIdsWithMbid,
$_scanContributorCacheStoreKeys,
$_scanContributorCacheStoreIds,
$_scanContributorCacheApproxSize,
) = (0, 0, 0, 0, 0, 0, 0, 0);
%_scanContributorAddCache = ();
}
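To make the MIN..MAX heuristic above concrete, here is a worked example with purely hypothetical counts (illustrative only, not taken from any real scan):

```perl
# Hypothetical counts: 1000 cache-hit IDs, 200 of which carried an MBID.
my $hitIds         = 1000;
my $hitIdsWithMbid = 200;
my $hitIdsNoMbid   = $hitIds - $hitIdsWithMbid;               # 800

# Minimum: every hit avoided at least the initial SELECT in add().
my $minStatementsAvoided = $hitIds;                           # 1000

# Maximum: up to 4 statements per non-MBID hit (1 SELECT + 3 UPDATE),
# up to 5 per MBID hit (2 SELECT + 3 UPDATE).
my $maxStatementsAvoided = (4 * $hitIdsNoMbid)
                         + (5 * $hitIdsWithMbid);             # 3200 + 1000 = 4200
```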

# For the VA album merging & scheduler globals.
my ($variousAlbumIds, $vaObj, $vaObjId);

@@ -241,12 +378,29 @@ sub init {
}

if ( !main::SCANNER ) {
require Slim::Control::Request;
# Wipe cached data after rescan
Slim::Control::Request::subscribe( sub {
$class->wipeCaches;
Slim::Schema::Album->addReleaseTypeStrings;
}, [['rescan'], ['done']] );
}
elsif ( main::DEBUGLOG ) {
require Slim::Control::Request;
# Initialize scan-local caches.
$class->_initScanContributorCache;
$class->_registerScanContributorCacheEndHook;

# Scanner-only stats, emitted once per scan if DEBUGLOG is enabled.
Slim::Control::Request::subscribe( sub {
$class->_logScanContributorCacheStats;
}, [['rescan'], ['done']] );
}
elsif ( main::SCANNER ) {
require Slim::Control::Request;
# Initialize scan-local caches even if DEBUGLOG is off.
$class->_initScanContributorCache;
}
Comment on lines 380 to +403

Member:
That's not going to fly. We don't want to use Slim::Control::Request in the scanner. You don't need to subscribe to those events in the scanner, do you? The scanner would already have exited when the server signalled "scanner done".

And having a different flow for debug mode sounds wrong to me, too. I believe in the above the elsif (main::SCANNER) would hardly ever be hit, as nobody is going to disable debug logging for the scanner manually.

Author:
Removed.


$initialized = 1;

@@ -3134,13 +3288,69 @@ sub _mergeAndCreateContributors {
# Is ARTISTSORT/TSOP always right for non-artist
# contributors? I think so. ID3 doesn't have
# "BANDSORT" or similar at any rate.
push @{ $contributors{$tag} }, Slim::Schema::Contributor->add({
'artist' => $contributor,
'brainzID' => $attributes->{"MUSICBRAINZ_${tag}_ID"},
'sortBy' => $attributes->{$tag.'SORT'},
# only store EXTID for track artist, as we don't have it for other roles
'extid' => $tag eq 'ARTIST' && $attributes->{'ARTIST_EXTID'},
});
my $brainzID = $attributes->{"MUSICBRAINZ_${tag}_ID"};
my $sortBy = $attributes->{$tag.'SORT'};
# only store EXTID for track artist, as we don't have it for other roles
my $extid = ($tag eq 'ARTIST' && $attributes->{'ARTIST_EXTID'}) ? $attributes->{'ARTIST_EXTID'} : undef;
Comment on lines +3293 to +3294

Member:
That's an assumption which might be valid now and for some services. But this can change at any time, if it's even true now. Wouldn't Qobuz import different roles, too?

We should probably review even the existing implementation...

Author:
I have no idea what the streamer plugins do or don't provide.

I could drop the assumption and change the code to:
my $extid = $attributes->{ $tag . '_EXTID' }; ?
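A self-contained sketch of that suggested generalisation, for clarity. The per-role `<TAG>_EXTID` attribute keys are an assumption; nothing in this thread confirms the streaming plugins actually set them for roles other than ARTIST:

```perl
# Hypothetical generalisation: derive the attribute key from the role tag
# instead of special-casing ARTIST. Only ARTIST_EXTID is known to exist;
# the '<TAG>_EXTID' naming for other roles is an assumption.
sub extid_for_tag {
    my ($attributes, $tag) = @_;
    return $attributes->{ $tag . '_EXTID' };   # undef when the role has no external ID
}

my $attributes = { 'ARTIST_EXTID' => 'qobuz:artist:42' };   # made-up example value
my $artistExtid   = extid_for_tag($attributes, 'ARTIST');     # 'qobuz:artist:42'
my $composerExtid = extid_for_tag($attributes, 'COMPOSER');   # undef
```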


# When scanning, short-circuit repeated contributor resolution for identical inputs.
if (main::SCANNER) {
# Defensive: ensure the scan-local cache is initialized.
# init() should have done this, but this keeps the optimization safe
# if call order ever changes.
Slim::Schema->_initScanContributorCache unless $_scanContributorAddCacheTied;

my $brainzKey = ref($brainzID) eq 'ARRAY' ? join("\x1e", @{$brainzID}) : ($brainzID // '');
my $sortKey = ref($sortBy) eq 'ARRAY' ? join("\x1e", @{$sortBy}) : ($sortBy // '');
my $extKey = ref($extid) eq 'ARRAY' ? join("\x1e", @{$extid}) : ($extid // '');

my $cacheKey = join("\x1f",
$contributor,
$brainzKey,
$sortKey,
$extKey,
);

my $cachedIds = $_scanContributorAddCache{$cacheKey};
if ($cachedIds) {
if (main::DEBUGLOG) {
$_scanContributorCacheHitKeys++;
$_scanContributorCacheHitIds += scalar @{$cachedIds};
$_scanContributorCacheHitIdsWithMbid += scalar @{$cachedIds} if ($brainzKey ne '');
}
push @{ $contributors{$tag} }, @{$cachedIds};
}
else {
if (main::DEBUGLOG) {
$_scanContributorCacheMissKeys++;
}
my @ids = Slim::Schema::Contributor->add({
'artist' => $contributor,
'brainzID' => $brainzID,
'sortBy' => $sortBy,
'extid' => $extid,
});
if (main::DEBUGLOG) {
$_scanContributorCacheMissIds += scalar @ids;
$_scanContributorCacheStoreKeys++;
$_scanContributorCacheStoreIds += scalar @ids;
if (defined $_scanContributorAddCacheMaxEntries && $_scanContributorCacheApproxSize < $_scanContributorAddCacheMaxEntries) {
$_scanContributorCacheApproxSize++;
}
}

$_scanContributorAddCache{$cacheKey} = [ @ids ];
push @{ $contributors{$tag} }, @ids;
}
}
else {
push @{ $contributors{$tag} }, Slim::Schema::Contributor->add({
'artist' => $contributor,
'brainzID' => $brainzID,
'sortBy' => $sortBy,
'extid' => $extid,
});
}

main::DEBUGLOG && $isDebug && $log->is_debug && $log->debug(sprintf("-- Track has contributor '$contributor' of role '$tag'"));
}