Input/Export from/to file and stdin/stdout #801

Open
matthiasbock opened this issue Feb 8, 2024 · 1 comment
@matthiasbock

Hi,

Currently, when exporting PDF content it is only possible to specify the name of the directory
to which exported text files shall be written (outDir):

$ pdfcpu extract
usage: pdfcpu extract -m(ode) i(mage)|f(ont)|c(ontent)|p(age)|m(eta) [-p(ages) selectedPages] inFile outDir

It would be very useful if it were possible to specify filenames instead:

Export all PDF pages to one file:

$ pdfcpu extract -m content -o all_pages.txt some.pdf

Export one page to file:

$ pdfcpu extract -m content -p 1 -o page1.txt some.pdf

Export selected pages to distinct files:

$ pdfcpu extract -m content -p 1 -o page1.txt -p 2 -o page2.txt some.pdf

Export selected pages to the same file:

$ pdfcpu extract -m content -p 1 -o pages1+3.txt -p 2 -o page2.txt -p 3 -o pages1+3.txt some.pdf

or

$ pdfcpu extract -m content -p 1,3 -o pages1+3.txt -p 2 -o page2.txt some.pdf

In particular, it would be useful if stdin could be used to read the PDF input
and stdout to write the exported content.
This would enable PDF processing in the shell using pipes:

Read PDF input from stdin:

$ curl https://internet/some.pdf | pdfcpu extract -m content -o some_pages.txt -

Export text to stdout:

$ pdfcpu extract -m content -o - some.pdf | fgrep "Chapter 3:"

Best, Matthias

@hhrutter
Collaborator

Hello!
Support for shell piping is a useful addition.
As for your suggested addition to the extract command-line processing, I'd rather leave that up to the calling script.
I am also not in favour of using -o repeatedly within one command.
And if we start using -o, then that would have to change for all pdfcpu commands.
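
For illustration, a minimal sketch of what such a calling script could look like with the current inFile/outDir interface. It buffers stdin into a temporary file, runs the extract command shown in the usage line above, and concatenates whatever lands in outDir to stdout. The function name pdf_content_to_stdout and the PDFCPU override variable are hypothetical, and the script assumes nothing about the names of the files pdfcpu writes beyond their being plain files in outDir.

```shell
#!/bin/sh
# Sketch only: emulate "PDF on stdin, extracted content on stdout"
# around `pdfcpu extract -m content [-p pages] inFile outDir`.
# Assumption: every regular file found in outDir is extracted content.
# Files are concatenated in the shell's glob (lexical) order, which may
# differ from page order if the file names are not zero-padded.

pdf_content_to_stdout() {
    pcs_bin=${PDFCPU:-pdfcpu}   # hypothetical override, e.g. for testing
    pcs_pages=$1                # optional page selection, e.g. "1,3"
    pcs_tmp=$(mktemp -d) || return 1
    mkdir "$pcs_tmp/out"

    # pdfcpu expects a file path, so buffer stdin first.
    cat > "$pcs_tmp/in.pdf"

    pcs_status=0
    if [ -n "$pcs_pages" ]; then
        "$pcs_bin" extract -m content -p "$pcs_pages" \
            "$pcs_tmp/in.pdf" "$pcs_tmp/out" >&2 || pcs_status=$?
    else
        "$pcs_bin" extract -m content \
            "$pcs_tmp/in.pdf" "$pcs_tmp/out" >&2 || pcs_status=$?
    fi

    # Only emit content if the extraction succeeded.
    if [ "$pcs_status" -eq 0 ]; then
        for pcs_f in "$pcs_tmp/out"/*; do
            if [ -f "$pcs_f" ]; then
                cat "$pcs_f"
            fi
        done
    fi

    rm -rf "$pcs_tmp"
    return "$pcs_status"
}
```

With this function sourced, the piping examples from the request could be approximated as:

$ curl https://internet/some.pdf | pdf_content_to_stdout 1,3 | fgrep "Chapter 3:"

Grouping pages into specific output files then becomes a matter of calling the function once per page selection and redirecting stdout.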
