Replace json? #81
Comments
In addition:
> For some other things, I discovered that ujson has better performance. A quick test with rss2email showed that I could get a 30 to 50 % improvement in the execution time of r2e. It requires an extra dependency, but it is probably worth it. However, I still think a binary format would be appropriate. Any other thoughts?
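A comparison like the one described above can be reproduced with a small micro-benchmark. This is a hedged sketch, not r2e's actual save path: the payload shape and sizes are invented to roughly resemble a feed-state database, and ujson is treated as optional.

```python
# Hypothetical micro-benchmark: stdlib json vs. the optional ujson package
# on a feed-state-like payload. The data shape is an assumption, not
# rss2email's real schema.
import json
import timeit

try:
    import ujson  # optional third-party dependency
except ImportError:
    ujson = None

# Fake state: 150 feeds, 50 seen-entry GUIDs each (illustrative numbers).
state = {
    "feeds": [
        {"url": "http://example.com/feed%d.xml" % i,
         "seen": ["guid-%d-%d" % (i, j) for j in range(50)]}
        for i in range(150)
    ]
}

def bench(module, repeat=20):
    """Seconds spent round-tripping (dumps + loads) the state `repeat` times."""
    return timeit.timeit(lambda: module.loads(module.dumps(state)),
                         number=repeat)

stdlib_time = bench(json)
print("stdlib json: %.3fs" % stdlib_time)
if ujson is not None:
    print("ujson:       %.3fs" % bench(ujson))
```

The reported speedup will depend heavily on the Python version and payload shape, which may explain why different people in this thread measured different improvements.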
You're right that r2e could be faster, but switching to a binary format would be a step backwards. Being able to take a quick look at (or make a quick change to) the database is a big advantage of r2e! A problem may be that the ujson package is not available as a repository package and has to be installed manually. For example: https://packages.debian.org/stretch/python3-ujson ...this is still in the testing suite (and not kept up to date).

I've been running ujson without any problem for two months.

I made some measurements using simplejson - which is normally faster than the stdlib json encoder - and didn't notice a considerable runtime improvement (diff: 70 ms). The guilty part doesn't seem to be the JSON handler itself... Have you observed something different?
I have 150 feeds monitored by r2e, and ~/.local/share/rss2email.json has about 8000 lines after a fresh run. It takes many minutes and a few hundred megabytes of memory to complete a single run.

Why JSON? It is a completely inappropriate format here. What we need is quick indexed storage, for example Berkeley DB or SQLite. Both can be viewed with existing tools, so debugging is not hard, and the libraries are production-ready with many tutorials around.

Using such a database would drop memory usage to a few MB per run (the database is mmapped instead of being read and parsed in full), and since it would be properly indexed, it would also increase speed significantly. SQLite can do tens of thousands of inserts per second with no trouble; BDB is even a few times faster. Retrieving data would be faster too, because the index is created once and then stored on disk alongside the data. An index lookup then does not have to load the whole index into memory; it can just read a few blocks here and there (thanks to mmap) to get the requested records - a single logarithmic-complexity search instead of multiple linear passes followed by a hash-table lookup.

Forget about binary JSON; it won't help.
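The indexed-storage idea above can be sketched with the stdlib sqlite3 module. The table and column names here are illustrative assumptions, not rss2email's actual schema:

```python
# Sketch of indexed seen-entry storage with SQLite (stdlib sqlite3).
# Schema and function names are hypothetical, not r2e's real API.
import sqlite3

conn = sqlite3.connect(":memory:")  # a real setup would use a file path
conn.execute("""
    CREATE TABLE IF NOT EXISTS seen (
        feed_url TEXT NOT NULL,
        guid     TEXT NOT NULL,
        PRIMARY KEY (feed_url, guid)
    )
""")

def mark_seen(feed_url, guid):
    # INSERT OR IGNORE keeps the call idempotent for already-seen entries.
    conn.execute("INSERT OR IGNORE INTO seen VALUES (?, ?)",
                 (feed_url, guid))

def is_seen(feed_url, guid):
    # The primary-key index makes this a B-tree lookup touching only a few
    # pages, instead of parsing the whole database into memory.
    row = conn.execute(
        "SELECT 1 FROM seen WHERE feed_url = ? AND guid = ?",
        (feed_url, guid)).fetchone()
    return row is not None

mark_seen("http://example.com/feed.xml", "guid-1")
print(is_seen("http://example.com/feed.xml", "guid-1"))  # True
print(is_seen("http://example.com/feed.xml", "guid-2"))  # False
```

A composite primary key on (feed_url, guid) doubles as the lookup index, so no separate CREATE INDEX is needed for this access pattern, and the file remains inspectable with the standard sqlite3 command-line tool.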
Hello, This repository has been deprecated for a few years now and has been replaced by https://github.com/rss2email/rss2email . If this issue is still relevant to you and not fixed in v3.12.2, could you please reopen it there? Cheers,
Hi @wking and others.
As explained in kurtmckee/feedparser#44, I did some investigation into performance and CPU consumption. I saw somebody on the web complaining about high CPU consumption, and I have similar problems too.
In addition to what I reported above and the small enhancement I tried to make, it appears that a lot of CPU time is spent in the save functions.
Is there any particular reason JSON was chosen? I suspect, although I have not tested it, that a binary format would be more efficient. I don't think anybody is going to read the JSON file by hand anyway.
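One way to confirm where the CPU time actually goes is to profile a save-like call with the stdlib cProfile module. The `save` function below is a stand-in for r2e's real save path, not its actual code:

```python
# Hedged profiling sketch: measure where time goes in a JSON-based save.
# `save` and the data shape are illustrative stand-ins, not r2e internals.
import cProfile
import io
import json
import pstats

data = {"feeds": [{"url": "u%d" % i, "seen": list(range(100))}
                  for i in range(150)]}

def save():
    json.dumps(data)  # stand-in for the real save path

profiler = cProfile.Profile()
profiler.enable()
for _ in range(100):
    save()
profiler.disable()

# Print the five most expensive calls by cumulative time.
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())
```

If the top entries are in the json encoder, swapping the serializer helps; if they are elsewhere (e.g. building the state dict), a faster format alone will not.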
I think there is room for optimization here.
Any thoughts?