pdf edit again.
cpghost
cpghost at cordula.ws
Sat Nov 3 22:03:21 PDT 2007
On Sat, 3 Nov 2007 17:54:53 -0800
Gary Kline <kline at tao.thought.org> wrote:
> On Sun, Nov 04, 2007 at 02:39:14AM +0100, cpghost wrote:
> > On Sat, 3 Nov 2007 16:38:55 -0800
> > Gary Kline <kline at tao.thought.org> wrote:
> >
> > > A couple weeks ago I skimmed thru the postings on editing
> > > PDF files. Wasn't entirely clear what the answer was because I
> > > never thought I would need to edit a GUI file. I just found a
> > > book from 1883 in pdf format. I would like a
> > > text/ASCII/ISO_8859-1 version. Tried pdftotext, but it doesn't
> > > work. Nutshell: is there something I can use to edit/look-at
> > > this book and get rid of whatever it is that's causing pdftotext
> > > to fail? (sorry for the grammar.... )
> >
> > Old books in PDF are normally scanned bitmaps. There are no
> > characters or whatever therein; just pixels (embedded images). If
> > you want to convert that to ASCII, you'd need to extract the images
> > (use something like pdfimages from the xpdf port), convert them to
> > a bitmap format your OCR software accepts, and run the OCR on that.
> > It's a slow, unreliable, error-prone and painful process though.
> >
> > Good luck!
>
>
> "Arrrgh" (Charlie Brown). If it's that tortured, I'll forget
> it; thanks for the clue. Pretty sure this *was* just photo'd
> and scanned in.
>
> (Must be how amazon.com has their zillions of books online.
> OCR'ing is serious work; I know that first hand.)
If you need help on imperfectly OCR'ed texts, esp. on texts that
are no longer copyrighted, there's always Distributed Proofreaders
from the venerable Project Gutenberg: http://www.pgdp.net/
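
If you do want to try the extract-then-OCR route quoted above, here is
a rough Python sketch of how the two steps might be chained. It is only
an illustration under assumptions: it uses pdfimages as mentioned
earlier, and it assumes tesseract as the OCR program, which nobody in
this thread named (any OCR tool with a command-line interface could be
swapped in). The function names and the "page" file prefix are made up
for the example, and the result will still need proofreading.

```python
import shutil
import subprocess
from pathlib import Path

# Image formats pdfimages writes by default (no -j flag given).
PNM_SUFFIXES = {".pbm", ".pgm", ".ppm"}


def extraction_command(pdf_path, image_root):
    """Build the pdfimages call that dumps every embedded image."""
    # Usage is: pdfimages PDF-file image-root
    # Output files are named image-root-NNN.ppm / .pbm and so on.
    return ["pdfimages", str(pdf_path), str(image_root)]


def ocr_command(image_path, lang="eng"):
    """Build the OCR call (tesseract is an assumption, see above)."""
    # tesseract appends ".txt" to the output base it is given.
    output_base = Path(image_path).with_suffix("")
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def image_number(image_path):
    """Return the numeric part of a name such as page-012.ppm."""
    return int(Path(image_path).stem.rsplit("-", 1)[1])


def pdf_to_text(pdf_path, work_dir, lang="eng"):
    """Extract the page images, OCR each one, return the joined text."""
    for tool in ("pdfimages", "tesseract"):
        if shutil.which(tool) is None:
            raise RuntimeError(tool + " not found on PATH")

    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(extraction_command(pdf_path, work_dir / "page"), check=True)

    images = [p for p in work_dir.glob("page-*") if p.suffix in PNM_SUFFIXES]
    pages = []
    for image in sorted(images, key=image_number):
        subprocess.run(ocr_command(image, lang), check=True)
        text_file = image.with_suffix(".txt")
        pages.append(text_file.read_text(encoding="utf-8", errors="replace"))
    return "\n".join(pages)
```

Calling pdf_to_text("book.pdf", "ocr-work") would leave the extracted
images and one .txt file per image in ocr-work/ and return all the text
as a single string. Expect plenty of recognition errors on an 1883
typeface.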
Good luck!
-cpghost.
--
Cordula's Web. http://www.cordula.ws/