Naive TimeMap counting algorithm causes UI to display possibly inaccurate count #283

ibnesayeed · 2017-12-15T15:45:39Z

Line 397 in 76a6d4d

mCount = out.count("memento")

This rudimentary approach of counting occurrences of the term memento in the response can produce a wrong memento count when the term also appears in URI-Ms. This can happen for two reasons:

- the URI-R contains the term, such as mementoweb.org
- some archives have memento in their path, such as /memento/<datetime>/<URI-R>

A more reliable approach would be to count it when the TM is processed for counting archives. Currently the TM is being processed twice, which can be slow for big TMs.
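To make the problem concrete, here is a minimal sketch, not the code from the repository. It assumes a Link-format TimeMap with one link per line; the sample URIs, `SAMPLE_TM`, and `count_mementos_and_archives` are hypothetical names used only for illustration. It counts a link as a memento only when `memento` appears as a token in its `rel` attribute, and collects archive hosts in the same pass so the TimeMap is read once.

```python
import re
from urllib.parse import urlsplit

# Hypothetical Link-format TimeMap; the URIs are illustrative only.
SAMPLE_TM = (
    '<http://mementoweb.org/>; rel="original",\n'
    '<http://localhost:1208/timemap/link/http://mementoweb.org/>; '
    'rel="self timemap"; type="application/link-format",\n'
    '<http://archive.example/memento/20170101000000/http://mementoweb.org/>; '
    'rel="first memento"; datetime="Sun, 01 Jan 2017 00:00:00 GMT",\n'
    '<http://archive.example/memento/20170601000000/http://mementoweb.org/>; '
    'rel="last memento"; datetime="Thu, 01 Jun 2017 00:00:00 GMT"\n'
)

# Matches the rel attribute in the part of a link that follows the URI.
REL_RE = re.compile(r'\brel="([^"]*)"')


def count_mementos_and_archives(timemap):
    """Return (memento count, set of archive hosts) in a single pass.

    Assumes one link per line, which is an assumption about the input,
    not something guaranteed by the Link format itself.
    """
    memento_count = 0
    archive_hosts = set()
    for line in timemap.splitlines():
        # Split at the last ">" so the URI is never searched for rel.
        target, sep, attrs = line.strip().rpartition(">")
        if not sep:
            continue
        match = REL_RE.search(attrs)
        # rel holds space-separated tokens, e.g. "first memento".
        if match and "memento" in match.group(1).split():
            memento_count += 1
            archive_hosts.add(urlsplit(target.lstrip("<")).netloc)
    return memento_count, archive_hosts


naive_count = SAMPLE_TM.count("memento")  # also counts matches inside URIs
memento_count, archive_hosts = count_mementos_and_archives(SAMPLE_TM)
print(naive_count, memento_count, sorted(archive_hosts))
```

On this sample the substring count is inflated by both the URI-R and the `/memento/` path segment, while the `rel`-based count only sees the two memento links.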


machawk1 · 2017-12-15T16:30:09Z

Nice catch. This is a good candidate for a test case.

Also a good idea to reuse the archive counting process (new as of < 12 hours ago, so pardon the delay ;)) for counting URI-Ms.

machawk1 · 2017-12-15T17:21:42Z

@ibnesayeed What are your thoughts on using CDXJ then performing one of the following to obtain the count?

- Subtract the count of metadata lines from the line count of the CDXJ TM.
- Increment a counter for each line that starts with a digit ([0-9]), indicative of it being a memento.

This would allow us to exploit the features of CDXJ, but we would still incur the time cost of converting from Link to CDXJ in MemGator. A Link-based solution might be more efficient (at the cost of parsing the rel).
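Both CDXJ options could look roughly like the sketch below. It is an illustration under assumptions: the sample TimeMap is hypothetical, metadata lines are assumed to start with `!`, and memento lines are assumed to start with a 14-digit datetime key. The function names are made up for this example.

```python
import re

# Hypothetical CDXJ TimeMap; the URIs and metadata are illustrative only.
SAMPLE_CDXJ = """\
!context ["http://tools.ietf.org/html/rfc7089"]
!id {"uri": "http://localhost:1208/timemap/cdxj/http://mementoweb.org/"}
!keys ["memento_datetime_YYYYMMDDhhmmss"]
!meta {"original_uri": "http://mementoweb.org/"}
20170101000000 {"uri": "http://archive.example/memento/20170101000000/http://mementoweb.org/", "rel": "first memento", "datetime": "Sun, 01 Jan 2017 00:00:00 GMT"}
20170601000000 {"uri": "http://archive.example/memento/20170601000000/http://mementoweb.org/", "rel": "last memento", "datetime": "Thu, 01 Jun 2017 00:00:00 GMT"}
"""


def count_by_subtraction(cdxj):
    """Option 1: non-empty lines minus metadata lines (assumed to start with '!')."""
    lines = [line for line in cdxj.splitlines() if line.strip()]
    metadata_lines = sum(1 for line in lines if line.startswith("!"))
    return len(lines) - metadata_lines


def count_by_leading_digit(cdxj):
    """Option 2: count the lines whose first character is in [0-9]."""
    return sum(1 for line in cdxj.splitlines() if re.match(r"[0-9]", line))


print(count_by_subtraction(SAMPLE_CDXJ), count_by_leading_digit(SAMPLE_CDXJ))
```

Neither option inspects the URI-M, so a URI-R such as mementoweb.org or a `/memento/` path segment does not affect the count.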

ibnesayeed · 2017-12-15T17:22:44Z

A PR is on its way. :)

machawk1 · 2017-12-15T17:27:01Z

Ah, ok. In the future, if you are working on a ticket, let me know and I can assign it to you so we don't waste work cycles (I was working on a solution as well). I'll defer to your upcoming PR.

ibnesayeed · 2017-12-15T17:39:23Z

> Ah, ok. In the future, if you are working on a ticket, let me know and I can assign it to you so we don't waste work cycles (I was working on a solution as well). I'll defer to your upcoming PR.

I thought I did, but apparently I forgot to mention it here. Anyway, PR #284 is now in for review.

machawk1 added the bug label Dec 15, 2017

machawk1 added this to the Non-critical Fixes & Features (Future Release) milestone Dec 15, 2017

machawk1 changed the title ~~Wrong memento count~~ Naive TimeMap counting algorithm causes UI to display possibly inaccurate count Dec 15, 2017

machawk1 assigned ibnesayeed Dec 15, 2017

ibnesayeed mentioned this issue Dec 15, 2017

A more efficient and accurate memento and archive counts #284

Merged

machawk1 closed this as completed in #284 Dec 15, 2017
