can i split a pdf file?

Polytropon freebsd at edvax.de
Mon Jan 26 15:51:33 PST 2009


On Mon, 26 Jan 2009 14:51:14 -0800, Gary Kline <kline at thought.org> wrote:
> Still,
> 	before I get back to the last few pages of my thesis, maybe I'll
> 	try feeding parts of my most vanilla image-PDF file to an
> 	opensource OCR program.  I'm pretty sure there are a couple in
> 	ports.  IIRC, though, the images have to be JPEGs or TIFFs or the
> 	like.  If anybody knows, please give me a shout out!

The best idea is to use a format that does not introduce
artifacts through lossy image compression (DCT or similar
algorithms), read: "real black-and-white pictures" (1 bit color).
JPEG is not such a format. You can see this by magnifying the
area around the text: it is gray and looks "dusty".
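
To illustrate where the "dust" comes from, here is a minimal
sketch in Python (standard library only). It is not a real JPEG
codec: it only applies an 8x8 DCT, rounds the coefficients to a
coarse step, and transforms back. The function names and the
step size of 40 are my own assumptions for the illustration.

```python
import math

N = 8  # JPEG processes the image in 8x8 pixel blocks


def dct_matrix(n=N):
    """Orthonormal DCT-II basis matrix (row = frequency, column = position)."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * x + 1) * k / (2 * n))
                     for x in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


C = dct_matrix()
CT = transpose(C)


def lossy_roundtrip(block, step=40):
    """DCT, coarse quantization, inverse DCT (assumed uniform step size)."""
    shifted = [[v - 128 for v in row] for row in block]
    coeffs = matmul(matmul(C, shifted), CT)            # forward 2D DCT
    quantized = [[round(c / step) * step for c in row] for row in coeffs]
    restored = matmul(matmul(CT, quantized), C)        # inverse 2D DCT
    return [[min(255, max(0, round(v + 128))) for v in row]
            for row in restored]


# A 1-bit style block: white paper (255) with one black stroke (0).
block = [[0 if x == 3 else 255 for x in range(N)] for _ in range(N)]
result = lossy_roundtrip(block)

# Values that are neither pure black nor pure white are the "dust".
gray = sorted({v for row in result for v in row if v not in (0, 255)})
print("input levels: ", sorted({v for row in block for v in row}))
print("gray levels after round trip:", gray)
```

The input contains only pure black and pure white, but after the
round trip some pixels end up at in-between gray values, which is
what an OCR program then has to deal with.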

TIFF, GIF and PNG are better formats for feeding images into
an OCR processor, because they can store the image losslessly.
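
As a small sketch of what "lossless" buys you: PNG compresses
its data with DEFLATE, which Python's zlib module also
implements. The example below does not write a PNG file; it
only packs a made-up 64x64 1-bit bitmap (the size and the bar
pattern are assumptions for the illustration) and checks that
every bit comes back unchanged.

```python
import zlib

WIDTH, HEIGHT = 64, 64  # assumed image size, 1 bit per pixel


def make_bitmap():
    """Build a 1-bit image: white page (0 bits) with a black bar (1 bits)."""
    rows = []
    for y in range(HEIGHT):
        row = bytearray(WIDTH // 8)   # 8 pixels packed into each byte
        if 16 <= y < 48:
            row[3] = 0xFF             # an 8-pixel-wide black bar
        rows.append(bytes(row))
    return b"".join(rows)


original = make_bitmap()
packed = zlib.compress(original, 9)   # DEFLATE at the highest level
restored = zlib.decompress(packed)

print("raw bytes:   ", len(original))
print("packed bytes:", len(packed))
print("identical after round trip:", restored == original)
```

The data gets much smaller, and no pixel changes its value, so
there is nothing "dusty" around the text afterwards.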

(Background: A long time ago, I knew a man who did electronics
and printed circuit boards. In order to save hard disk space,
he converted his 1-bit BMP images of the schematics and the
PCB layouts to JPEG format instead of just zipping, raring
or arjing them. He was very unhappy to see them come out
of the printer "so dirty, partially unreadable", although
it was a high-quality office-class laser printer. And when
he took the PCBs out of the acid bath, their previously
photochemically treated surface looked strange and had holes
in the copper; the boards were ready to be thrown away. This
man was very upset when he was told about DCT and artifacts.
Later on, he switched to GIF images and was happy again.)




-- 
Polytropon
From Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...

