-
Notifications
You must be signed in to change notification settings - Fork 344
Scanner: cache contributor resolution during scans #1476
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: public/9.1
Are you sure you want to change the base?
Scanner: cache contributor resolution during scans #1476
Conversation
Add scan-local LRU+TTL cache and DEBUGLOG stats to reduce redundant contributor DB work during library scans.
michaelherger
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please keep things simple. IMHO this is way over-engineered, with most of the code satisfying the dev's geek mode only, adding no value to the user.
| if ( !main::SCANNER ) { | ||
| require Slim::Control::Request; | ||
| # Wipe cached data after rescan | ||
| Slim::Control::Request::subscribe( sub { | ||
| $class->wipeCaches; | ||
| Slim::Schema::Album->addReleaseTypeStrings; | ||
| }, [['rescan'], ['done']] ); | ||
| } | ||
| elsif ( main::DEBUGLOG ) { | ||
| require Slim::Control::Request; | ||
| # Initialize scan-local caches. | ||
| $class->_initScanContributorCache; | ||
| $class->_registerScanContributorCacheEndHook; | ||
|
|
||
| # Scanner-only stats, emitted once per scan if DEBUGLOG is enabled. | ||
| Slim::Control::Request::subscribe( sub { | ||
| $class->_logScanContributorCacheStats; | ||
| }, [['rescan'], ['done']] ); | ||
| } | ||
| elsif ( main::SCANNER ) { | ||
| require Slim::Control::Request; | ||
| # Initialize scan-local caches even if DEBUGLOG is off. | ||
| $class->_initScanContributorCache; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not going to fly. We don't want to use Slim::Control::Request in the scanner. You don't need to subscribe to those events in the scanner, do you? The scanner would already have exited when the server signalled "scanner done".
And having a different flow for debug mode sounds wrong to me, too. I believe in the above the elsif (main::SCANNER) would hardly ever be hit, as nobody is going to disable debug logging for the scanner manually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
| # only store EXTID for track artist, as we don't have it for other roles | ||
| my $extid = ($tag eq 'ARTIST' && $attributes->{'ARTIST_EXTID'}) ? $attributes->{'ARTIST_EXTID'} : undef; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's an assumption which might be valid now and for some services. But this can change at any time, if it's even true now. Wouldn't Qobuz import different roles, too?
We should probably review even the existing implementation...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no idea what the streamer plugins do or don't provide.
I could drop the assumption and change the code to:
my $extid = $attributes->{ $tag . '_EXTID' }; ?
| my %_scanContributorAddCache; | ||
| my $_scanContributorAddCacheTied = 0; | ||
| my $_scanContributorAddCacheMaxEntries; | ||
| my $_scanContributorCacheEndHooked = 0; | ||
| my $_scanContributorCacheApproxSize = 0; | ||
|
|
||
| sub _scanContributorCacheMaxEntries { | ||
| # Use the existing cross-platform `dbhighmem` preference as a proxy | ||
| # Keep this deliberately bounded: on low-RAM systems, very large hash-based | ||
| # caches can become noticeable during big scans. | ||
| return $prefs->get('dbhighmem') ? 50000 : 15000; | ||
| } | ||
|
|
||
| sub _initScanContributorCache { | ||
| my $class = shift; | ||
| return unless main::SCANNER; | ||
| return if $_scanContributorAddCacheTied; | ||
|
|
||
| my $entries = $class->_scanContributorCacheMaxEntries; | ||
| $_scanContributorAddCacheMaxEntries = $entries; | ||
| $_scanContributorCacheApproxSize = 0; | ||
| tie %_scanContributorAddCache, 'Tie::Cache::LRU::Expires', EXPIRES => 3600, ENTRIES => $entries; | ||
| $_scanContributorAddCacheTied = 1; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this complicated setup? Why not just initialise the cache like we do for the content type? A line or two instead of dozens of them and repeated check for whether it's been initialised or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can simplify, but before I do, is sizing the cache according to the user's selected preference in Database Memory Config something you think should be lost or retained? To my mind limiting the cache size is necessary given Lyrion is expected to run on SOC with < 1GB RAM. On the other hand limiting the cache size to satisfy only RAM deficient (mostly legacy) SOCs would significantly limit the benefit of including the cache in the first place.
I had toyed with being more direct around cache size based on installed RAM but that would mean adding additional modules, which I figured you'd be against.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally reasonable. But this doesn't need much code. See eg. https://github.com/LMS-Community/slimserver/blob/public/9.1/Slim/Control/Queries.pm#L69
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This work for you?
# Scan-local contributor cache.
tie %_scanContributorAddCache, 'Tie::Cache::LRU::Expires',
EXPIRES => 3600,
ENTRIES => ($prefs->get('dbhighmem') ? 50000 : 15000)
if main::SCANNER;
| # Optional instrumentation (scanner + DEBUGLOG only) | ||
| # These counters are used to quantify effectiveness and provide a conservative | ||
| # estimate of avoided DB statements. | ||
| my ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Too much noise for my taste. Who'd be interested in having this other than you as the developer? That's only adding overhead and code for little to no use for the user. Please remove this. You can still have those lines in your clone. But why should we have that much code to satisfy the geek in me for this one particular cache, when we don't do any of this anywhere else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I specifically included it as a temporary aid to help you assess whether or not the cache has sufficient impact to warrant the PR.
During scans we repeatedly resolve identical contributor strings; Slim::Schema::Contributor->add() can do multiple DB round-trips per call.