Convert PDF to Excel

Polytropon freebsd at edvax.de
Sat Jan 23 04:42:14 UTC 2021


On Fri, 22 Jan 2021 19:45:11 +0300, Odhiambo Washington wrote:
> I have a situation where I'd like to convert PDF to XLSX.
> The documents are 35MB and 105MB but contain several thousand pages.
> 
> Does anyone know a good tool that can handle this?

Depends on what is in the PDFs.

If it is real (rendered) text, you can probably extract it with
the tool pdftotext, convert the result to CSV, and then import
the CSV into "Excel".
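
A rough sketch of the text-to-CSV step, assuming the PDF holds tables
whose columns come out of "pdftotext -layout" separated by two or more
spaces. The function name layout_to_csv and the file names are made up
for illustration, and real documents will likely need extra cleanup:

```shell
#!/bin/sh
# Sketch: turn column-aligned text (e.g. from "pdftotext -layout") into CSV.
# Assumption: columns are separated by runs of two or more spaces.
layout_to_csv() {
    awk '
    { gsub(/\f/, "") }                    # drop page-break form feeds
    NF == 0 { next }                      # skip blank lines
    {
        sub(/^ +/, ""); sub(/ +$/, "")    # trim leading/trailing spaces
        n = split($0, f, /  +/)           # split on runs of 2+ spaces
        line = ""
        for (i = 1; i <= n; i++) {
            gsub(/"/, "\"\"", f[i])       # double any embedded quotes
            line = line (i > 1 ? "," : "") "\"" f[i] "\""
        }
        print line
    }'
}

# Usage (input.pdf and output.csv are placeholder names):
#   pdftotext -layout input.pdf - | layout_to_csv > output.csv
```

Every field is quoted, so cells that contain commas survive the import.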

But if it's images of text, use the tool pdfimages to extract the
images, and then an OCR tool (maybe tesseract) to obtain the data.
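
For the image case, something along these lines might do, assuming
pdfimages (from poppler) and tesseract are installed. The function name
ocr_pdf and the "img" file prefix are only illustrative choices:

```shell
#!/bin/sh
# Sketch: OCR a scanned PDF by extracting its images, then running
# tesseract on each one. Assumes pdfimages and tesseract are installed.
ocr_pdf() {
    pdf=$1
    outdir=$2
    mkdir -p "$outdir" || return 1
    # pdfimages writes img-000.png, img-001.png, ... into $outdir
    pdfimages -png "$pdf" "$outdir/img" || return 1
    for img in "$outdir"/img-*.png; do
        [ -e "$img" ] || continue         # PDF contained no images
        # tesseract appends ".txt" to the output base name by itself
        tesseract "$img" "${img%.png}" || return 1
    done
}

# Usage (scan.pdf and ocr-out are placeholder names):
#   ocr_pdf scan.pdf ocr-out
```

This leaves one .txt file per extracted image in the output directory.
With several thousand pages, expect the OCR pass to take a while.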

It might be worth checking if LibreOffice can open a PDF file and
export to (or save as) an "Excel"-compatible file directly, either
CSV or one of the binary formats (XLS, XLSX).

Restructuring with some sed / awk / perl might be needed, though.
Keep in mind those steps can be automated, so if you have lots of
PDF files, write a simple shell wrapper that processes all of them,
so you get a bunch of result files without further handholding. :-)
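
Such a wrapper could look roughly like this. It is only a sketch: the
function name convert_all is made up, and it assumes pdftotext is the
right extraction step for your files:

```shell
#!/bin/sh
# Sketch: run pdftotext over every PDF in a directory, writing a .txt
# file next to each one and reporting failures on stderr.
convert_all() {
    dir=${1:-.}
    for pdf in "$dir"/*.pdf; do
        [ -e "$pdf" ] || continue         # no PDFs in this directory
        out="${pdf%.pdf}.txt"
        if pdftotext -layout "$pdf" "$out"; then
            echo "ok: $pdf -> $out"
        else
            echo "FAILED: $pdf" >&2
        fi
    done
}

# Usage (the directory name is a placeholder):
#   convert_all /path/to/pdfs
```

A failed file is reported but does not stop the loop, so one broken PDF
will not hold up the rest of the batch.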



-- 
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...


More information about the freebsd-questions mailing list