Convert PDF to Excel
Polytropon
freebsd at edvax.de
Sat Jan 23 10:14:46 UTC 2021
On Sat, 23 Jan 2021 09:04:21 +0000, Steve O'Hara-Smith wrote:
> On Sat, 23 Jan 2021 09:40:41 +0100
> Polytropon <freebsd at edvax.de> wrote:
>
> > They contain text, so the OCR problem is out of the way.
> > Sadly, the text is re-arranged so the optimal solution (one
> > line in a table equals one line of text, with the columns
> > being separated by whitespace) does not appear, instead it
> > is the other way round: one line equals one column.
>
> I spy a fun interview question buried in this problem - flipping a
> text file like that efficiently is far from easy - dead easy if you
> don't mind eating memory of course.
The lesson to learn from this potential interview question
is simply RTFM; from "man pdftotext": -layout will try its
best to preserve the original display in the raw output.
So data that is in lines, but arranged in columns, will
then be output with its columns intact; each "dataset" is one line.
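
For illustration, here is a rough sketch of what could follow once the
text has been extracted with -layout. It assumes that table cells end up
separated by at least two spaces, which depends on the PDF in question
and is not something pdftotext promises. The helper names and the sample
data are made up. The transpose() function covers the "flipping" case
from the quoted text in the memory-eating way, by holding the whole
table in memory.

```python
import csv
import io
import re

# Assumption: in "pdftotext -layout" output, table cells are separated by
# at least two spaces, while single spaces occur only inside a cell.
CELL_GAP = re.compile(r" {2,}")


def layout_text_to_rows(text):
    """Split layout-preserved text into rows of cells (hypothetical helper)."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if line:  # skip blank lines between table rows
            rows.append(CELL_GAP.split(line))
    return rows


def rows_to_csv(rows):
    """Serialize rows as CSV, which spreadsheet programs can import."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


def transpose(rows):
    """Flip rows and columns, for text extracted without -layout.

    The whole table is kept in memory; short rows are padded with
    empty strings so that no cell is silently dropped.
    """
    width = max((len(r) for r in rows), default=0)
    padded = [r + [""] * (width - len(r)) for r in rows]
    return [list(col) for col in zip(*padded)]


# Made-up sample standing in for real "pdftotext -layout" output.
sample = (
    "Item        Qty   Price\n"
    "Disk drive    2   79.90\n"
    "\n"
    "Cable        10    1.50\n"
)
table = layout_text_to_rows(sample)
print(rows_to_csv(table), end="")
```

The input text would come from something like "pdftotext -layout
input.pdf output.txt", where both file names are placeholders. If the
gaps between columns are sometimes only one space wide, splitting on
whitespace runs will not work and fixed column positions would have to
be used instead.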
--
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...