Optimise second-chained properties results, refs 3722 #5036
This PR is made in reference to: #3722
This PR addresses or contains:
The “prefetch cache” operation is now global: all prefetching is done at the first execution of PrefetchCache::prefetch() for all lines of the result, instead of one prefetch per line.
The real cache (PrefetchCache->cache) has keys "PropertyName" or "-PropertyName", and its values are arrays [ "semantic object ID" => [ list of WikiPages ] ], so that it is easy to pick values in PrefetchCache::getPropertyValues(). (I say WikiPages here and below, but they could be other data types.)
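To make the cache shape concrete, here is a minimal sketch in Python (not the PHP of this PR). The triple list, the property name HasAuthor and the helper names are all hypothetical; only the layout, property key to subject ID to list of values, follows the description above.

```python
# Hypothetical in-memory store of (subject_id, property, value) triples.
# This stands in for the real database and is an assumption of the sketch.
TRIPLES = [
    (1, "HasAuthor", "Alice"),
    (1, "HasAuthor", "Bob"),
    (2, "HasAuthor", "Carol"),
]


def prefetch(subject_ids, prop, triples=TRIPLES):
    """Collect the values of `prop` for every subject in a single pass."""
    wanted = set(subject_ids)
    by_subject = {sid: [] for sid in subject_ids}
    for sid, p, value in triples:
        if p == prop and sid in wanted:
            by_subject[sid].append(value)
    return by_subject


def get_property_values(cache, subject_id, prop_key):
    """Read one result line from the cache without touching the store."""
    return cache.get(prop_key, {}).get(subject_id, [])


# One global prefetch for all result lines, then cheap per-line lookups.
cache = {"HasAuthor": prefetch([1, 2, 3], "HasAuthor")}
```

After the single prefetch call, each result line only costs a dictionary lookup, which is the point of making the operation global.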
The second cache (PrefetchCache->lookupCache) is aligned with the first one; its keys are the whole property chain, such as "Prop1.-Prop2.Prop3…PropN-1" for an N-level chain (so it is empty for a first-level chain), and its values are the WikiPages of the result for this level N; this is mainly used to compute the next level starting from this result.
Earlier versions of this PR failed on tests p-0434.json and p-0467.json because the same property can be repeated within a chain, which is why the whole property chain must be cached somewhere (here in lookupCache) and not only the property.
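The following Python sketch illustrates one way a repeated property can break a cache keyed by property name alone; it is not the PHP code of this PR and may not match the exact failure in those tests. The data, the Parent property and the function names are hypothetical, and for simplicity the sketch also stores the final level under its own key.

```python
# Hypothetical data: property -> {page: [pages reached through the property]}.
EDGES = {
    "Parent": {"Ann": ["Bea"], "Bea": ["Cid"], "Cid": ["Dee"]},
}


def follow(pages, prop, edges=EDGES):
    """Return the distinct pages reached from `pages` via `prop`, in order."""
    seen, out = set(), []
    for page in pages:
        for target in edges.get(prop, {}).get(page, []):
            if target not in seen:
                seen.add(target)
                out.append(target)
    return out


def resolve_chain(start_pages, chain, key_by_whole_chain=True):
    """Walk `chain` level by level, caching each intermediate result."""
    lookup_cache = {}
    current = list(start_pages)
    for level, prop in enumerate(chain, start=1):
        # Key by the chain walked so far, or (the flawed variant) by property.
        key = ".".join(chain[:level]) if key_by_whole_chain else prop
        if key not in lookup_cache:
            lookup_cache[key] = follow(current, prop)
        current = lookup_cache[key]
    return current, lookup_cache


good, good_cache = resolve_chain(["Ann"], ["Parent", "Parent"])
bad, _ = resolve_chain(["Ann"], ["Parent", "Parent"], key_by_whole_chain=False)
```

With the chain Parent.Parent, the chain-keyed variant reaches the grandparent, while the property-keyed variant reuses the first-level entry at the second level and returns the parent again.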
On an #ask query with 999 results and two 2-level property chains (with no common property), it took 18.4 seconds before this patch and 3.7 seconds with it, roughly a fivefold speedup (mean over a sample of 5 page reloads; other factors such as the MySQL internal cache may influence performance, but the figures differ substantially anyway). This was on SMW 3.2.2 + MW 1.33 + ElasticStore + PHP 7.3.
This PR includes: