This repository has been archived by the owner on Dec 18, 2019. It is now read-only.

Paperwork limitation: Big documents (> 100 pages) #782

Open
kafran opened this issue Jun 12, 2018 · 8 comments

Comments

@kafran

kafran commented Jun 12, 2018

Guys, what are the limitations of Paperwork? Right now I have 4 documents with approximately 200 pages each, and Paperwork is unusably slow. I just can't use the software. Each time I add a doc, my computer is brought to its knees. I have an Intel i5 with 8GB of RAM; if I add a 200-page scanned doc to Paperwork, my RAM usage goes to 7.33GB and swap usage to more than 8GB. After all the docs have been processed, it's impossible to search and use Paperwork: searching and loading pages take too long. Maybe it's because I'm running it through flatpak?

I really like the idea behind Paperwork. I usually scan docs to .tiff, and I like the idea of having the docs organized as images with OCR and converting them to PDF with configurable quality as necessary. Does anybody know of another app that could help me with this task until Paperwork solves its optimization problems? I don't care about all the fancy animations, etc. I just need software to tag and manage scanned documents and export to PDF when needed.

@tiramiseb
Collaborator

Hello,

I currently have 1486 documents, the largest one with 111 pages, but most have 1 or 2 pages... While not hyper-quick, Paperwork is fast enough to be usable...

@tYYGH

tYYGH commented Jun 12, 2018

I can confirm that Paperwork is not optimized for big documents. Here are my stats:
— 3 documents with >100 pages (max 158 pages)
— 63 documents with 16–99 pages (evenly distributed along the range)
— 102 documents with 7–15 pages (evenly distributed along the range)
— 118 documents with 5–6 pages
— 2437 documents with 1–4 pages (decreasing, starting with half of these having only 1 page)

With these stats, Paperwork remains usable. But whenever I open one of the 3 big documents, I get a temporary freeze…

@tiramiseb
Collaborator

I think it is clear that paperwork is not meant for big documents...

@jflesch
Member

jflesch commented Jun 12, 2018

Full disclosure: the biggest document I've used to test Paperwork is about 100 pages. And it is a test document, not one that I really use day-to-day :/

jflesch added the optim label Jun 12, 2018
jflesch added this to the 2.0 milestone Jun 12, 2018
jflesch changed the title from "Paperwork limitations" to "Paperwork limitation: Big documents (> 100 pages)" Jun 12, 2018
@jflesch
Member

jflesch commented Jun 12, 2018

@kafran: By the way, did you import those big documents as PDF, or did you scan them?

Regarding your question about other applications: unfortunately, I don't know of any that is open source and does exactly what Paperwork does (I wouldn't be working on it otherwise ;).

However, you may want to have a look at some web applications doing similar things. For instance:

@kafran
Author

kafran commented Jun 12, 2018

@jflesch thank you for Paperwork, it's a great piece of software. I posted this in the hope that someone could help me get things running more smoothly, or suggest another solution with faster search and viewing. I often need to retrieve information from the documents I scan.

I scanned all the documents I'm putting into Paperwork myself with a Kodak ScanMate i1150. First I scan using this script https://gist.github.com/kafran/46b1d798cef7b3aa48e9a138f99902cf because the scanner I'm using is capable of detecting and excluding blank pages, and then I import the result into Paperwork with the "Import image folder" option.

I don't know how Paperwork could become more resource efficient. If I didn't have 8GB of RAM and a 16GB swap partition, it would be impossible for me to use Paperwork.
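For a rough sense of scale, here is a back-of-the-envelope sketch of how much memory 200 decoded page images would take if they were all held uncompressed at once. Everything in it is an assumption for illustration: the page size (A4), the resolution (300 DPI), the color depth (24-bit RGB), and the helper names are hypothetical. It says nothing about how Paperwork actually loads or caches pages.

```python
def uncompressed_page_bytes(width_in, height_in, dpi, bytes_per_pixel):
    """Bytes needed to hold one fully decoded page bitmap in memory."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel


def document_bytes(pages, **page_kwargs):
    """Bytes needed if every page of a document is decoded at the same time."""
    return pages * uncompressed_page_bytes(**page_kwargs)


# Assumed scan settings (not taken from the actual scans):
# A4 paper (8.27 x 11.69 inches), 300 DPI, 3 bytes per pixel (RGB).
page = uncompressed_page_bytes(8.27, 11.69, 300, 3)
total = document_bytes(200, width_in=8.27, height_in=11.69, dpi=300,
                       bytes_per_pixel=3)

print(f"{page / 2**20:.1f} MiB per page, "
      f"{total / 2**30:.2f} GiB for 200 pages")
```

Under those assumptions a single page is tens of MiB once decoded, so a 200-page document would only fit comfortably in 8GB of RAM if pages were decoded a few at a time rather than all together.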

@jflesch
Member

jflesch commented Jun 12, 2018

Once I'm done with libinsane, I'll work on rewriting / rearranging Paperwork. The main goal will be to make the code more modular, but my hope is that it will also help with testing and help isolate and fix issues (bugs, but optimization issues as well).

While I can't do much for you right now regarding Paperwork, I would appreciate it if you could submit a test scan report for the scanner database on openpaper.work: https://openpaper.work/en/scanner_db/#contribute
I have no Kodak scanner currently in the database. I would be curious to see what other options it can provide.

@kafran
Author

kafran commented Jun 12, 2018

Sure. I would be glad to do that.
