Input/Export from/to file and stdin/stdout #801

matthiasbock · 2024-02-08T17:37:27Z

Hi,

Currently, when extracting PDF content, it is only possible to specify the directory
to which the exported text files are written (outDir):

$ pdfcpu extract
usage: pdfcpu extract -m(ode) i(mage)|f(ont)|c(ontent)|p(age)|m(eta) [-p(ages) selectedPages] inFile outDir

It would be very useful if it were possible to specify filenames instead:

Export all PDF pages to one file:

$ pdfcpu extract -m content -o all_pages.txt some.pdf

Export one page to file:

$ pdfcpu extract -m content -p 1 -o page1.txt some.pdf

Export selected pages to separate files:

$ pdfcpu extract -m content -p 1 -o page1.txt -p 2 -o page2.txt some.pdf

Export selected pages to the same file:

$ pdfcpu extract -m content -p 1 -o pages1+3.txt -p 2 -o page2.txt -p 3 -o pages1+3.txt some.pdf

or

$ pdfcpu extract -m content -p 1,3 -o pages1+3.txt -p 2 -o page2.txt some.pdf
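Until something like this exists, the single-file cases can be approximated by the calling script. The function below is a hypothetical sketch, not part of pdfcpu: the name extract_pages_to_file is made up, and it assumes pdfcpu writes one .txt file per extracted page into outDir. The files are joined in the shell's alphabetical glob order, so if the per-page file names are not zero-padded, page 10 would come before page 2.

```shell
# Hypothetical helper (not part of pdfcpu): extract the content of the
# selected pages of an input PDF into ONE output file.
# Assumes pdfcpu writes one *.txt file per extracted page into outDir.
# Usage: extract_pages_to_file inFile selectedPages outFile
extract_pages_to_file() {
    (
        in_file=$1 pages=$2 out_file=$3
        tmp_dir=$(mktemp -d) || exit 1
        # Remove the temporary outDir however this subshell exits.
        trap 'rm -rf "$tmp_dir"' EXIT
        pdfcpu extract -m content -p "$pages" "$in_file" "$tmp_dir" || exit 1
        # Join the per-page files in glob (alphabetical) order.
        cat "$tmp_dir"/*.txt > "$out_file"
    )
}
```

For example, the last example above would become:

$ extract_pages_to_file some.pdf 1,3 pages1+3.txt
$ extract_pages_to_file some.pdf 2 page2.txt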

In particular, it would be useful if stdin could be used to read the input PDF
and stdout to write the exported content.
This would enable PDF processing on the shell using pipes:

Read PDF input from stdin:

$ curl https://internet/some.pdf | pdfcpu extract -m content -o some_pages.txt -

Export text to stdout:

$ pdfcpu extract -m content -o - some.pdf | fgrep "Chapter 3:"
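Piping can also be approximated by a wrapper script today. The function below is a hypothetical sketch (pdf_content is a made-up name, and it assumes pdfcpu writes one .txt file per page into outDir). Because a PDF reader generally needs random access to the file (the cross-reference table sits near the end), stdin is first buffered in a temporary file; the extracted text files are then written to stdout. It relies on mktemp -d, which is common on Linux and macOS but not specified by POSIX.

```shell
# Hypothetical wrapper (not part of pdfcpu): read a PDF from a file, or from
# stdin when the argument is "-" or missing, and write the extracted page
# content to stdout so it can be used in a pipe.
# Assumes pdfcpu writes one *.txt file per page into outDir.
pdf_content() {
    (
        tmp_dir=$(mktemp -d) || exit 1
        # Remove all temporary files however this subshell exits.
        trap 'rm -rf "$tmp_dir"' EXIT
        in_file=${1:--}
        if [ "$in_file" = "-" ]; then
            # pdfcpu takes a file name, so buffer stdin in a temporary file.
            cat > "$tmp_dir/stdin.pdf"
            in_file=$tmp_dir/stdin.pdf
        fi
        mkdir "$tmp_dir/out" || exit 1
        pdfcpu extract -m content "$in_file" "$tmp_dir/out" || exit 1
        # Write the per-page files to stdout in glob (alphabetical) order.
        cat "$tmp_dir"/out/*.txt
    )
}
```

With this, the two examples above would read:

$ curl -s https://internet/some.pdf | pdf_content - > some_pages.txt
$ pdf_content some.pdf | grep -F "Chapter 3:"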

Best, Matthias

hhrutter · 2024-02-25T10:41:13Z

Hello!
Support for shell piping is a useful addition.
As for your suggested additions to the extract command-line processing, I'd rather leave that up to the calling script.
I am also not in favour of using -o repeatedly within one command.
And if we start using -o, then that would have to change for all pdfcpu commands.

matthiasbock added the feature request label Feb 8, 2024

matthiasbock assigned hhrutter Feb 8, 2024
