citr taking long time to access Zotero database with large database #1391

Closed
jrennstich opened this issue Jan 12, 2020 · 34 comments

@jrennstich

jrennstich commented Jan 12, 2020

citr takes a long time (90-100 sec) to connect to Zotero with my large database.

Report ID:
WF6BPCRX-euc

@label-gun

label-gun bot commented Jan 12, 2020

It looks like you did not upload a debug report. The debug report is important; it gives @retorquere your current BBT settings and a copy of the problematic reference as a test case so he can best replicate your problem. Without it, @retorquere is effectively blind. Debug reports are useful for both bug analysis and enhancement requests; in the case of export enhancements, he needs a copy of the references you have in mind.

  • If your issue relates to how BBT behaves around one or more specific references, such as citekey generation or export, select at least one of the problematic references, right-click it, and submit a BBT debug report from that popup menu. If the problem is with export, please do include a sample of what you see exported, and what you expected to see exported for these references.

  • If the issue does not relate to references and is of a more general nature, generate a debug report by restarting Zotero with debugging enabled (Help -> Debug Output Logging -> Restart with logging enabled), reproducing your problem, and selecting "Send Better BibTeX debug report..." from the help menu.

Once done, you will see a debug ID in red. Please post that debug ID in the issue here.

Thank you!

@retorquere
Owner

retorquere commented Jan 12, 2020

I see 3 requests for the full library in that log, and some pings to see if BBT is available. Even assuming #1389 is fixed, that would still mean 5-10 seconds to complete for those full-library requests. Do you have any idea what you did that may have triggered citr to re-fetch the library?

@jrennstich
Author

Nope. Simply clicked on "reconnect" once.

@retorquere
Owner

Then what citr is doing seems a little excessive for large libraries. I'm going to think of a way that citr can check whether what it wants to download has changed since it last fetched it.
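
One way to picture the check described here, as a minimal sketch only: keep a fingerprint of the last download and re-fetch the full library only when the fingerprint differs. The `LibraryCache` class and the `fetch_fingerprint` and `fetch_library` callables are hypothetical stand-ins invented for illustration; nothing in this thread says that BBT or citr exposes anything like them.

```python
from typing import Any, Callable, Optional


class LibraryCache:
    """Keep the last downloaded library and re-fetch it only when it has changed."""

    def __init__(
        self,
        fetch_fingerprint: Callable[[], str],
        fetch_library: Callable[[], Any],
    ) -> None:
        self._fetch_fingerprint = fetch_fingerprint  # cheap call (hypothetical)
        self._fetch_library = fetch_library          # expensive full export (hypothetical)
        self._fingerprint: Optional[str] = None
        self._library: Any = None
        self.full_fetches = 0

    def get(self) -> Any:
        """Return the library, downloading it only if the fingerprint differs."""
        current = self._fetch_fingerprint()
        if self.full_fetches == 0 or current != self._fingerprint:
            self._library = self._fetch_library()
            self._fingerprint = current
            self.full_fetches += 1
        return self._library


# In-memory stand-in for a Zotero library, for demonstration only.
state = {"version": "v1", "items": ["paper-a", "paper-b"]}
cache = LibraryCache(lambda: state["version"], lambda: list(state["items"]))

cache.get()  # first call: full download
cache.get()  # unchanged fingerprint: served from the local copy
state["version"], state["items"] = "v2", state["items"] + ["paper-c"]
latest = cache.get()  # changed fingerprint: full download again
print(cache.full_fetches, latest)
```

With a scheme like this, a "reconnect" against an unchanged library would cost one cheap fingerprint request instead of a full export.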

@retorquere
Owner

I've opened an issue over at citr to coordinate: crsh/citr#58. I'll need you to be present there, since I'm not a citr user.

@jrennstich
Author

I have subscribed.

@jrennstich
Author

Report ID: LGZGM3UI-euc

@ukuvainik

I have a similar issue. Loading the Zotero library from citr takes ~2 minutes, including when re-updating.

A typical use case: I am writing in R Markdown and realise I need to cite a paper that is not in my library. I add the paper to Zotero via the browser. To see that paper in citr, I need to reconnect to the Zotero library, which means 2 minutes of waiting before I can use R again.

My library is 4100 items.

@blip-bloop
Collaborator

🤖 this is your friendly neighborhood build bot announcing test build 5.2.14.5836 ("test file existence only when needed")

Install in Zotero by downloading test build 5.2.14.5836, opening the Zotero "Tools" menu, selecting "Add-ons", opening the gear menu in the top right, and selecting "Install Add-on From File...".

@retorquere
Owner

@ukuvainik can you do the following:

  • install 5836 and enable debug logging from the help menu
  • connect citr to Zotero
  • stop rstudio and start it again
  • connect citr to Zotero again
  • send a BBT debug log from the help menu

@retorquere
Owner

retorquere commented Feb 5, 2020

(same goes for @jkr - 5836 has a tweak that makes exports like citr requests more efficient, assuming a filled cache)

@ukuvainik

thanks for the effort, Report ID: ZSGXIMTV-euc

@retorquere
Owner

I see 2 exports in that log:

  • 2 items, total duration 0.18s
  • 4134 items, total duration 5.451s

so if citr takes considerably more time than 6s, it's either something citr is doing that doesn't involve BBT, or citr requests something more of BBT, but then I'm not seeing that in the log.
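
To put those figures in proportion, here is a quick back-of-the-envelope calculation. The 120-second wait is an assumption taken from the "~2 minutes" reported earlier in this thread, not a measured value.

```python
items = 4134
export_seconds = 5.451          # total duration of the large export in the log
reported_wait_seconds = 120.0   # assumption: the "~2 minutes" reported by the user

ms_per_item = export_seconds / items * 1000
export_share = export_seconds / reported_wait_seconds * 100

print(f"{ms_per_item:.2f} ms per item")                  # 1.32 ms per item
print(f"{export_share:.1f}% of the reported wait")       # 4.5% of the reported wait
```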

@ukuvainik

> I see 2 exports in that log:
>
>   • 2 items, total duration 0.18s
>   • 4134 items, total duration 5.451s
>
> so if citr takes considerably more time than 6s, it's either something citr is doing that doesn't involve BBT, or citr requests something more of BBT, but then I'm not seeing that in the log.

Ok, I will ask citr about this. Thank you for looking into this. crsh/citr#64

@retorquere
Owner

crsh/citr#64 might be a duplicate of crsh/citr#58

@blip-bloop
Collaborator

🤖 this is your friendly neighborhood build bot announcing test build 5.2.14.5837 ("speed up CSL exports")

Install in Zotero by downloading test build 5.2.14.5837, opening the Zotero "Tools" menu, selecting "Add-ons", opening the gear menu in the top right, and selecting "Install Add-on From File...".

@jrennstich
Author

> (same goes for @jkr - 5836 has a tweak that makes exports like citr requests more efficient, assuming a filled cache)

I am on it. This takes much longer in my case, so I won't interrupt the current attempt; I'll install 5837 after it finishes. Will send two different reports.

retorquere added a commit that referenced this issue Feb 5, 2020
@retorquere
Owner

retorquere commented Feb 5, 2020

phew 😥

BE86BA4X-euc: 256.57s (biblatex, cold), 467.05s (bibtex, cold), 27.594s (bibtex, hot)
AQPF4PGY-euc: 226.216s (biblatex, cold), 575.543s (bibtex, cold), 12.318s (bibtex, hot)

I'm going to rig something together so you can view these yourself.
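
As a rough illustration of how much the cache matters in these two reports, the sketch below compares the cold and hot bibtex timings listed above. The dictionary layout and key names are made up for this example.

```python
# Durations in seconds, copied from the two reports above.
reports = {
    "BE86BA4X-euc": {"bibtex_cold": 467.05, "bibtex_hot": 27.594},
    "AQPF4PGY-euc": {"bibtex_cold": 575.543, "bibtex_hot": 12.318},
}

# Ratio of cold-cache to hot-cache export time per report.
speedups = {
    report_id: timing["bibtex_cold"] / timing["bibtex_hot"]
    for report_id, timing in reports.items()
}

for report_id, speedup in speedups.items():
    print(f"{report_id}: hot export is {speedup:.1f}x faster than cold")
# BE86BA4X-euc: hot export is 16.9x faster than cold
# AQPF4PGY-euc: hot export is 46.7x faster than cold
```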

@jrennstich
Author

> Oh wait - also on the 2nd run? Unless you have "retain cache" on, installing this build would have dropped the export cache.

No, I meant connecting to citr, then Zotero. That usually takes me more than 180 sec when (re)connecting for the first time.

@retorquere
Owner

So assuming you're connecting with a hot cache, what is it doing the other 150 seconds?
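
The "other 150 seconds" can be reproduced by subtracting the hot bibtex export times from the reports above from the reconnect time. This is only a sketch: 180 seconds is the lower bound the user gave ("more than 180 sec"), so the real remainder may be larger.

```python
reconnect_seconds = 180.0  # lower bound reported by the user, not a measurement

# Hot-cache bibtex export durations in seconds, from the two reports above.
hot_export_seconds = {"BE86BA4X-euc": 27.594, "AQPF4PGY-euc": 12.318}

# Time spent on something other than the BBT export, per report.
unaccounted = {
    report_id: reconnect_seconds - seconds
    for report_id, seconds in hot_export_seconds.items()
}

for report_id, seconds in unaccounted.items():
    print(f"{report_id}: at least {seconds:.1f} s not spent in the export")
# BE86BA4X-euc: at least 152.4 s not spent in the export
# AQPF4PGY-euc: at least 167.7 s not spent in the export
```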

@jrennstich
Author

beats me

@blip-bloop
Collaborator

🤖 this is your friendly neighborhood build bot announcing test build 5.2.14.5851 ("the beta shuffles the extra field...")

Install in Zotero by downloading test build 5.2.14.5851, opening the Zotero "Tools" menu, selecting "Add-ons", opening the gear menu in the top right, and selecting "Install Add-on From File...".

@jrennstich
Author

Installed it last night, no major changes. I will post a report later.

@retorquere
Owner

There are only minor changes in this version. It should provide a very modest speedup, but mostly just discloses the stats. I'm still working on an R script to fetch and visualize them. Such a horrid language, R, but easy to make pretty graphs with once you have the data.

@jrennstich
Author

Report ID: XLQKMMKQ-euc
With citr
Report ID: 4L5IRCPW-euc

@retorquere
Owner

retorquere commented Feb 7, 2020

Neither of those reports shows an export being run. You can look at these yourself now (doesn't require debug logging on), but man I hope someone around here is better with R than me:

library(jsonlite)
library(RCurl)

# Fetch the export statistics from the local BBT endpoint (Zotero must be running).
stats <- fromJSON(getURL('http://127.0.0.1:23119/better-bibtex/translations/stats'))

# Plot the per-step durations of the n-th recorded export: prep in red, export in blue.
show <- function(n) {
  prep <- stats$prep$duration[[n]]
  export <- stats$export$duration[[n]]
  # (serializer hits + export hits) as a percentage of items * 2
  cached_pct <- ((stats$cached$serializer[n] + stats$cached$export[n]) * 100) / (stats$items[n] * 2)
  plot(
    c(prep, export),
    ylab = "duration",
    main = paste(stats$translator[n], stats$items[n], cached_pct, "%", stats$prep$total[n] + stats$export$total[n], "ms"),
    col = c(rep('red', length(prep)), rep('blue', length(export)))
  )
}

show(1)
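
For readers who want the numbers without a plot, the summary that the R script puts in the plot title can be sketched in Python as follows. The JSON shape (a list of records with `translator`, `items`, `cached`, `prep` and `export` fields) is an assumption inferred from the fields the R script accesses; it has not been checked against BBT's actual output. The `example` record is invented for illustration and is not a real measurement.

```python
import json
from urllib.request import urlopen

# Endpoint quoted in the thread; only reachable while Zotero with BBT is running.
STATS_URL = "http://127.0.0.1:23119/better-bibtex/translations/stats"


def summarize(record: dict) -> dict:
    """Summarize one export record the same way the R plot title does."""
    items = record["items"]
    cached = record["cached"]["serializer"] + record["cached"]["export"]
    # Mirrors the R expression: (serializer + export) * 100 / (items * 2).
    cache_hit_pct = cached * 100 / (items * 2) if items else 0.0
    return {
        "translator": record["translator"],
        "items": items,
        "cache_hit_pct": cache_hit_pct,
        "total_ms": record["prep"]["total"] + record["export"]["total"],
    }


def fetch_stats(url: str = STATS_URL) -> list:
    """Download the stats; assumes the endpoint returns a JSON list of records."""
    with urlopen(url) as response:
        return json.load(response)


# Hypothetical record, shaped after the fields used in the R script.
example = {
    "translator": "Better BibTeX",
    "items": 4,
    "cached": {"serializer": 4, "export": 2},
    "prep": {"total": 10, "duration": [2, 3, 5]},
    "export": {"total": 30, "duration": [10, 20]},
}

summary = summarize(example)
print(summary)
```

Against a live Zotero, `[summarize(r) for r in fetch_stats()]` would give one summary per recorded export, provided the assumed shape holds.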

@jrennstich
Author

Thanks for sending the R script. What would you like me to run? A cold export followed by a hot one? Or rather simply using citr?

@retorquere
Owner

You can use this script (or any improvement you make on it) to peer inside the export performance, which will be helpful once the citr side gets going (for one thing, you wouldn't need to send debug logs and install custom builds). But I think I've wrung pretty much all the speedup I could out of this, and if our earlier measurements are correct, BBT accounts for ~9% of the time of the data exchange. At this stage, effort seems better spent on improving the other 91%.

I'm ready to merge these changes into a new release, but I genuinely don't see what else I could do at this point.

@jrennstich
Author

Again: thanks so much. This has been very helpful!

@retorquere
Owner

My pleasure. I'm running a few more tests tonight to see if I can get juris-m beta running again, and then I'll cut a new release. For citr, we'll have to wait for activity on crsh/citr#58.

@github-actions

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 17, 2021