any way to turn a pdf file into something OCR-able?
rsmith at xs4all.nl
Mon Dec 1 17:07:33 PST 2008
On Mon, Dec 01, 2008 at 03:14:43PM -0800, Gary Kline wrote:
> pdftotext fail on the large [32MB] file I've got. Is there any
> other way I can translate this huge textfile to ascii or html or
Could you define "fail" in this context? I've used pdftotext on documents
exceeding 40MB. However, there are of course things that don't work:
1) Some PDFs are just wrappers around JPEG images. In this case there is
no text for pdftotext to convert => epic fail.
2) If the text contains ligatures etc., you should use an output
encoding that contains such characters (e.g. '-enc UTF-8'), or those
characters will not be converted correctly.
3) Things like equations will not render well, if at all. This also
depends on the encoding.
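Points 1 and 2 can be checked from a script. Below is a minimal Python sketch, not a definitive tool. It assumes pdftotext is on the PATH and relies on its convention that an output file name of '-' sends the text to stdout. The helper names (pdftotext_cmd, extract_text, looks_image_only, fits_encoding, fold_ligatures) are hypothetical and exist only for illustration. An empty result is only a heuristic for "this PDF is just page images"; such a file would need OCR instead.

```python
import subprocess
import unicodedata


def pdftotext_cmd(pdf_path, encoding="UTF-8"):
    """Build a pdftotext command line (hypothetical helper) that writes to stdout."""
    # '-enc' selects the output encoding; the trailing '-' means "write to stdout".
    return ["pdftotext", "-enc", encoding, pdf_path, "-"]


def extract_text(pdf_path):
    """Run pdftotext with UTF-8 output and return the text as a str.

    Assumes pdftotext is installed; raises CalledProcessError if it exits non-zero.
    """
    result = subprocess.run(pdftotext_cmd(pdf_path, "UTF-8"),
                            capture_output=True, check=True)
    return result.stdout.decode("utf-8")


def looks_image_only(text):
    """Heuristic for point 1: no visible characters suggests a PDF of page images."""
    # pdftotext puts form feeds (\f) between pages; str.strip() removes those too.
    return not text.strip()


def fits_encoding(text, codec):
    """Point 2: True if every character of text can be represented in codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


def fold_ligatures(text):
    """Replace compatibility ligatures such as U+FB01 ('fi') with plain letters.

    NFKC also folds other compatibility characters (e.g. superscript digits),
    which may further mangle equations (point 3).
    """
    return unicodedata.normalize("NFKC", text)
```

For example, Latin-1 has no slot for the "fi" ligature U+FB01 while UTF-8 does, so fits_encoding("\ufb01", "latin-1") is False; fold_ligatures("\ufb01nd") yields "find". A typical call would be text = extract_text("big.pdf"), followed by looks_image_only(text) to decide whether the file needs OCR.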
[plain text _non-HTML_ PGP/GnuPG encrypted/signed email much appreciated]
pgp: 1A2B 477F 9970 BA3C 2914 B7CE 1277 EFB0 C321 A725 (KeyID: C321A725)
More information about the freebsd-questions