editing pdf files
Gary Kline
kline at thought.org
Sat Oct 13 20:44:58 UTC 2012
On Sat, Oct 13, 2012 at 01:19:07PM +0200, Polytropon wrote:
> On Fri, 12 Oct 2012 16:46:28 -0700, Gary Kline wrote:
> > I've got a question that fits in here, hopefully.
> >
> > Last week I found a book from 1901 that Google had scanned and listed
> > as a PDF file. It was text plus photos of the rich and famous of the
> > 1800s. Somehow, Google found the exact string that matched my great
> > grandfather [from the Civil War]. I downloaded the file (maybe 2 MB)
> > and searched it using acroread. Nothing. I used the pdftotext utility.
> > Same: nothing but some 600 page numbers.
> >
> > My guess is that Google just took photos of the book and used other
> > tools to create a PDF file. I am not =that= serious about genealogy,
> > but I would like to know if there are any tools to edit this kind of
> > PDF file.
>
> In case the PDF is nothing more than a compilation of images,
> there's a way to deal with it for editing:
The images in this book aren't what I am interested in,
just the text.
>
> step 1: disassemble
> step 2: edit images
> step 3: reassemble
>
> The disassembling can be done with
>
> % pdfimages source.pdf .
>
> Then the files can be edited with whatever tool you like, e.g. GIMP.
> They often come out in PBM format.
>
> Finally the images can be re-converted to PDF and combined to one
> PDF file:
>
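> # Convert each extracted image to a single-page PDF
> for IMG in .*.pbm; do
>     convert "${IMG}" "${IMG}.pdf"
> done
> # Concatenate the single-page PDFs in glob (name) order
> pdftk .*.pdf cat output target.pdf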
>
> Note the ".*" prefix for the file specification: The images extracted
> by pdfimages match that pattern (at least in the case I tested it for).
> If they get other names than .0000001.pbm, change the approach
> accordingly.
>
Turns out that roughly the first 580 pages are of no interest.
I'll see if tesseract-ocr can get rid of most of the data.
What format works best with the OCR suites? Or are they about the
same? The section on my g-grandfather in that 1901 book was only
about 1.5 pages. There was no photo, just his name and some bio.
Still, things I had no knowledge of. I'm sure that my father
didn't know either!
gary
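
A rough sketch of how skipping the uninteresting pages and running OCR on the rest might look. It assumes pdfimages and tesseract are both installed; the function name ocr_pages, the "page" image root, and the all.txt output file are made up for illustration and are not from the thread.

```shell
#!/bin/sh
# Sketch only: OCR a page range of an image-only PDF.
# Assumes pdfimages and tesseract are on PATH.

# ocr_pages PDF FIRST LAST OUTDIR   (hypothetical helper)
ocr_pages() {
    pdf=$1
    first=$2
    last=$3
    outdir=$4
    mkdir -p "$outdir" || return 1
    # -f/-l restrict extraction to a page range; "page" is the image name
    # root, so files come out as page-000.pbm, page-001.ppm, and so on.
    pdfimages -f "$first" -l "$last" "$pdf" "$outdir/page" || return 1
    for img in "$outdir"/page-*; do
        [ -f "$img" ] || continue
        # Skip text output left over from an earlier run
        case $img in *.txt) continue ;; esac
        # tesseract appends .txt to the output base name itself
        tesseract "$img" "${img%.*}" || return 1
    done
    # Join the per-image text in name order
    cat "$outdir"/page-*.txt > "$outdir/all.txt"
}
```

Called as, say, `ocr_pages book.pdf 581 600 ./ocr` (file name and page numbers are placeholders), this would leave one .txt file per page image plus a combined all.txt. If the installed tesseract cannot read PBM/PPM input directly, converting each image to TIFF first with ImageMagick's convert is a common workaround.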
>
>
> --
> Polytropon
> Magdeburg, Germany
> Happy FreeBSD user since 4.0
> Andra moi ennepe, Mousa, ...
--
Gary Kline kline at thought.org http://www.thought.org Public Service Unix
Twenty-six years of service to the Unix community.