editing pdf files

Sat Oct 13 02:40:24 UTC 2012

On Sat, Oct 13, 2012 at 1:46 AM, Gary Kline <kline at thought.org> wrote:
> On Fri, Oct 12, 2012 at 10:40:29PM +0400, Boris Samorodov wrote:
>> 10.10.2012 02:35, Gary Aitken wrote:
>>
>> > Can someone give me advice on editing pdf files?
>>
>> Take a look at graphics/inkscape.
>>
>> --
>> WBR, Boris Samorodov (bsam)
>> FreeBSD Committer, http://www.FreeBSD.org The Power To Serve
>
>
>         I've got a question that fits in here, hopefully.
>
>         last week I found a book from 1901 that google had scanned and listed
>         as a pdf file.  it was text plus photos of the rich/famous of the
>         1800s.  somehow, google found the exact string that matched my great
>         grandfather [from the civil war].  I downloaded the file (maybe 2 MB)
>         and searched using acroread.  nada.  I used the pdftotext utility.
>         same: nothing but some 600 page numbers.
>
>         my guess is that google just took photos of the book and used other
>         tools to create a pdf file.  I am not =that= serious about genealogy,
>         but I would like to know if there are any tools to edit this kind of
>         pdf file.

I suspect the following: they scanned the book and put all the images
into the PDF. The PDF itself is merely a container for scanned pages;
it thus contains no text (save for the page numbers).
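
One way to test that guess is to look at what pdftotext actually
extracts. The sketch below is only a rough heuristic: the helper names
and the 20-word threshold are my own assumptions, not part of any tool.
It counts tokens that contain at least one letter, so a file that yields
nothing but page numbers is flagged as image-only.

```python
import subprocess


def extract_text(pdf):
    """Run pdftotext on the file and return its output as a string."""
    # "-" as the output name sends the extracted text to stdout
    result = subprocess.run(
        ["pdftotext", str(pdf), "-"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def looks_image_only(text, min_words=20):
    """Heuristic: True if the text holds fewer than min_words real words."""
    # Keep only tokens with at least one letter; bare page numbers are ignored
    words = [tok for tok in text.split() if any(ch.isalpha() for ch in tok)]
    return len(words) < min_words
```

Calling looks_image_only(extract_text("book.pdf")) on a file like the
one described (here "book.pdf" is a placeholder name) should return True
if the PDF really carries no text layer.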

That Google was able to search in this file is probably because they ran
an OCR program on the page images and then indexed the (approximate)
text that the OCR program generated. They probably used something like
tesseract-ocr, available in ports as graphics/tesseract:
  http://code.google.com/p/tesseract-ocr/
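
If you want to try the same thing yourself, something along these lines
might work. It is a minimal sketch, not a tested recipe: it assumes the
pdftoppm (from poppler) and tesseract commands are installed, and the
function names, the 300 dpi setting and the "page" file prefix are my
own choices. It renders each page to a PNG, runs tesseract on it, and
joins the resulting text.

```python
import subprocess
from pathlib import Path


def render_command(pdf, prefix, dpi=300):
    """Build the pdftoppm call that renders every page to PREFIX-NN.png."""
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf), str(prefix)]


def ocr_command(image, lang="eng"):
    """Build the tesseract call; it writes its text to OUTPUTBASE.txt."""
    image = Path(image)
    return ["tesseract", str(image), str(image.with_suffix("")), "-l", lang]


def ocr_pdf(pdf, workdir, dpi=300, lang="eng"):
    """Render the PDF page by page, OCR each image, return the joined text."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    subprocess.run(render_command(pdf, workdir / "page", dpi), check=True)

    texts = []
    # pdftoppm zero-pads the page numbers, so a plain sort keeps page order
    for image in sorted(workdir.glob("page-*.png")):
        subprocess.run(ocr_command(image, lang), check=True)
        txt = image.with_suffix(".txt")
        texts.append(txt.read_text(encoding="utf-8", errors="replace"))
    return "\n".join(texts)
```

The result of ocr_pdf("book.pdf", "ocr-out") can then be searched with
grep or any editor. Expect recognition errors, especially with old
typefaces, so a name may only match approximately.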

>         tia guys,
>
>         gary
>
>
> --
>  Gary Kline  kline at thought.org  http://www.thought.org  Public Service Unix
>               Twenty-six years of service to the Unix community.

-cpghost.

-- 
Cordula's Web. http://www.cordula.ws/