pdf edit again.

Sat Nov 3 22:03:21 PDT 2007

On Sat, 3 Nov 2007 17:54:53 -0800
Gary Kline <kline at tao.thought.org> wrote:

> On Sun, Nov 04, 2007 at 02:39:14AM +0100, cpghost wrote:
> > On Sat, 3 Nov 2007 16:38:55 -0800
> > Gary Kline <kline at tao.thought.org> wrote:
> > 
> > > 	A couple weeks ago I skimmed thru the postings on editing
> > > PDF files.  It wasn't entirely clear what the answer is, because I
> > > never thought I would need to edit a GUI file.  I just found a
> > > book from 1883 in PDF format.  I would like a
> > > text/ASCII/ISO_8859-1 version.  I tried pdftotext, but it doesn't
> > > work.  Nutshell: is there something I can use to edit/look at
> > > this book and get rid of whatever it is that's causing pdftotext
> > > to fail?  (Sorry for the grammar.... )
> > 
> > Old books in PDF are normally scanned bitmaps. There are no
> > characters or whatever therein; just pixels (embedded raster
> > images). If you want to convert that to ASCII, you'd need to
> > extract the images (use something like pdfimages from the xpdf
> > port, which writes them out as PBM/PPM files), convert them to
> > whatever bitmap format your OCR software accepts, and run the OCR
> > on that. It's a slow, unreliable, error-prone and painful process
> > though.
> > 
> > Good luck!
> 
> 
> 	"Arrrgh" (Charlie Brown).  If it's that tortured, I'll forget
> 	it; thanks for the clue.  Pretty sure this *was* just photographed
> and scanned in.
> 
> 	(Must be how amazon.com has its zillions of books online.
> 	OCR'ing is serious work; I know that first hand.)

If you need help with imperfectly OCR'ed texts, esp. texts that
are no longer copyrighted, there's always Distributed Proofreaders
from the venerable Project Gutenberg: http://www.pgdp.net/
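For anyone who wants to try the pdfimages + OCR route described earlier in the thread anyway, here is a minimal sketch in Python. It assumes pdftotext and pdfimages (xpdf or poppler) plus the tesseract OCR engine are on the PATH; tesseract, the 20-character threshold and all helper names are illustrative assumptions, not something anyone in the thread used. It first checks whether pdftotext gets any real text out of the file and only falls back to OCR when it doesn't.

```python
"""OCR a scanned PDF: a rough sketch, not a polished tool.

Assumes pdftotext/pdfimages (xpdf or poppler) and tesseract are installed.
"""
from __future__ import annotations

import glob
import os
import shutil
import subprocess
import sys
import tempfile


def has_text_layer(pdftotext_output: str, min_chars: int = 20) -> bool:
    # Heuristic with an arbitrary threshold: a scanned book makes pdftotext
    # emit little or nothing besides whitespace and form feeds.
    return len("".join(pdftotext_output.split())) >= min_chars


def extract_cmd(pdf_path: str, prefix: str) -> list[str]:
    # pdfimages writes each embedded image as <prefix>-NNN.pbm (monochrome)
    # or <prefix>-NNN.ppm (colour/greyscale).
    return ["pdfimages", pdf_path, prefix]


def ocr_cmd(image_path: str, out_base: str) -> list[str]:
    # tesseract (an assumed OCR engine) writes its text to <out_base>.txt.
    return ["tesseract", image_path, out_base]


def image_index(path: str) -> int:
    # "page-012.pbm" -> 12; sorting numerically keeps page order past 999.
    stem = os.path.splitext(os.path.basename(path))[0]
    return int(stem.rsplit("-", 1)[1])


def ocr_pdf(pdf_path: str, workdir: str) -> str:
    for tool in ("pdfimages", "tesseract"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH")
    prefix = os.path.join(workdir, "page")
    subprocess.run(extract_cmd(pdf_path, prefix), check=True)
    pages = []
    images = glob.glob(prefix + "-*.p[bgp]m")
    for image in sorted(images, key=image_index):
        base = os.path.splitext(image)[0]
        # If this tesseract build can't read PNM files, convert to TIFF first.
        subprocess.run(ocr_cmd(image, base), check=True)
        with open(base + ".txt", encoding="utf-8", errors="replace") as fh:
            pages.append(fh.read())
    # Separate pages with form feeds, as pdftotext itself does.
    return "\f".join(pages)


def main(pdf_path: str) -> None:
    result = subprocess.run(
        ["pdftotext", "-enc", "UTF-8", pdf_path, "-"],
        capture_output=True, encoding="utf-8", errors="replace", check=True,
    )
    if has_text_layer(result.stdout):
        sys.stdout.write(result.stdout)  # real text layer: no OCR needed
        return
    with tempfile.TemporaryDirectory() as workdir:
        sys.stdout.write(ocr_pdf(pdf_path, workdir))


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

Run it as something like `python ocr_book.py book.pdf > book.txt` (the script name is hypothetical). Expect to proofread the result, since OCR of an 1883 typeface will be imperfect. The output is UTF-8; for an ISO-8859-1 copy, re-encode it afterwards (e.g. with iconv), as long as every character fits in Latin-1.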

Good luck!
-cpghost.

-- 
Cordula's Web. http://www.cordula.ws/