Olivier Nicole on at cs.ait.ac.th
Wed Jun 10 02:08:32 UTC 2009


> I'm trying to convert all PDF files in a directory to text using
> "pdftotext".  I tried the following command:

Aside from the syntax of the find(1) command, and some articles that may
be corrupted PDFs, you may consider patching pdftotext to ignore the
"do not print" flag that is set in some of the PDF articles.
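For the batch conversion itself, here is a minimal Python sketch. It is my own illustration, not the command from the original question. It walks the directory recursively the way find(1) would, runs pdftotext on each PDF, and keeps going when a file fails, for example because it is corrupted or pdftotext refuses it because of the flag. It assumes pdftotext is installed and on the PATH, and that foo.pdf should become foo.txt in the same directory. The function names are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def build_command(pdf: Path) -> list[str]:
    """Hypothetical helper: pdftotext command writing foo.pdf -> foo.txt beside it."""
    return ["pdftotext", str(pdf), str(pdf.with_suffix(".txt"))]


def find_pdfs(root: Path) -> list[Path]:
    """Recursively collect files ending in .pdf (any case), similar to `find -iname '*.pdf'`."""
    return sorted(p for p in root.rglob("*") if p.is_file() and p.suffix.lower() == ".pdf")


def convert_all(root: Path) -> list[Path]:
    """Run pdftotext on every PDF under root and return the ones it failed on.

    Assumes pdftotext is on the PATH; otherwise subprocess.run raises FileNotFoundError.
    """
    failed = []
    for pdf in find_pdfs(root):
        result = subprocess.run(build_command(pdf), capture_output=True, text=True)
        if result.returncode != 0:
            # Corrupted or restricted PDFs end up here; report and continue.
            print(f"skipped {pdf}: {result.stderr.strip()}", file=sys.stderr)
            failed.append(pdf)
    return failed


if __name__ == "__main__":
    directory = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    bad = convert_all(directory)
    print(f"{len(bad)} file(s) could not be converted")
```

The list of failed files is then a starting point for checking which ones are genuinely corrupted and which ones are blocked by the flag.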

I don't think many scientific articles set the flag that prevents
printing them. But some PDF files do have that flag set, and pdftotext
will not work on them unless you patch it (which is easy; it could even
be a compile-time option, I don't remember).

