Linux: Extract text from PDF

Tags: Linux, PDF
In Linux, does anyone know how I could extract text from a PDF file? Is there anything on the command line I can use? I really don't want to use any conversion software. Thanks!

Answer Wiki

pdftotext, which ships with poppler, will try to extract any text found in the PDF.

It also lets you define a page region to extract text from.

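Something like this, which takes pages 5 through 7 and extracts a region 144 wide by 80 high whose top-left corner is at (200, 700):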

pdftotext    \
  -f 5       \
  -l 7       \
  -x 200     \
  -y 700     \
  -W 144     \
  -H 80      \
   input.pdf \
   output.txt
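
For the simpler case of dumping a whole file, a minimal sketch could look like the one below. The helper name pdf_to_text is hypothetical and not part of poppler. The -layout option and "-" as the output name (write to stdout) are standard pdftotext options.

```shell
#!/bin/sh
# pdf_to_text is a hypothetical helper name, not something poppler provides.
# It prints the text of one PDF to stdout, keeping the page layout.
pdf_to_text() {
    # Fail early if the input file does not exist.
    if [ ! -f "$1" ]; then
        echo "no such file: $1" >&2
        return 1
    fi
    # Fail with 127 ("command not found") if pdftotext is not installed.
    if ! command -v pdftotext >/dev/null 2>&1; then
        echo "pdftotext not found; install poppler-utils" >&2
        return 127
    fi
    # -layout keeps columns and spacing roughly as they appear on the page;
    # "-" as the output file name sends the text to stdout.
    pdftotext -layout "$1" -
}
```

With that defined, something like `pdf_to_text input.pdf | grep -i total` searches the extracted text without writing an intermediate file.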

Discuss This Question: 1  Reply

  • Subhendu Sen
    There is a good command-line tool called 'pdftotext', which is part of the poppler-utils package. On Ubuntu/Debian it is usually installed by default. If it is not found, you can install the package and then use the tool for this purpose. The help page is here: http://manpages.ubuntu.com/manpages/bionic/man1/pdf2txt.1.html
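
A quick way to check whether the tool is already present, as a sketch: the function name check_pdftotext is hypothetical, and the install hint assumes a Debian/Ubuntu system where the package is named poppler-utils, as mentioned in the reply above.

```shell
#!/bin/sh
# check_pdftotext is a hypothetical helper name.
# It reports where pdftotext lives, or how to install it if it is missing.
check_pdftotext() {
    if command -v pdftotext >/dev/null 2>&1; then
        echo "pdftotext found at: $(command -v pdftotext)"
    else
        # Assumes Debian/Ubuntu; other distributions use their own package manager.
        echo "pdftotext not found; try: sudo apt install poppler-utils"
    fi
}
```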
