42

I want to convert a .pdf file to an .odt file so that I can further convert it to a .doc file. Is there any software/script that can do this? I have tried to copy the content of the .pdf file and paste it into LibreOffice Writer but the formatting isn't preserved.

The document is confidential so I'd prefer not to use any online service for the conversion.

Ankit
  • 6,989

5 Answers

20

You could take a look at PDF Utilities (poppler-utils via Synaptic or apt-get) which includes pdftotext:

Poppler is a PDF rendering library based on Xpdf PDF viewer.

This package contains command line utilities (based on Poppler) for getting information of PDF documents, convert them to other formats, or manipulate them:
* pdfdetach -- lists or extracts embedded files (attachments)
* pdffonts -- font analyzer
* pdfimages -- image extractor
* pdfinfo -- document information
* pdfseparate -- page extraction tool
* pdftocairo -- PDF to PNG/JPEG/PDF/PS/EPS/SVG converter using Cairo
* pdftohtml -- PDF to HTML converter
* pdftoppm -- PDF to PPM/PNG/JPEG image converter
* pdftops -- PDF to PostScript (PS) converter
* pdftotext -- text extraction
* pdfunite -- document merging tool

Of course, success will depend on how the PDF file was generated. If pdftotext gives you the text you want, you can open the resulting text file in LibreOffice Writer and save it as an .odt file.
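As a rough sketch of the pdftotext step, something like the following could be used. The function name pdf_to_text and the file name report.pdf are made up for illustration; -layout is a pdftotext option that tries to keep the physical layout of the page, which may or may not help depending on the document.

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative): extract the text of a PDF
# with pdftotext from poppler-utils.
pdf_to_text() {
    input="$1"
    # Default output name: same as the input, with .pdf replaced by .txt
    output="${2:-${input%.pdf}.txt}"

    # Fail early if the tool or the input file is missing.
    command -v pdftotext >/dev/null 2>&1 || {
        echo "pdftotext not found (install poppler-utils)" >&2
        return 1
    }
    [ -f "$input" ] || {
        echo "no such file: $input" >&2
        return 1
    }

    # -layout keeps columns and indentation roughly as they appear on the page.
    pdftotext -layout "$input" "$output"
}

# Usage (example file name):
#   pdf_to_text report.pdf        # writes report.txt
```

The resulting .txt file can then be opened in LibreOffice Writer and saved as .odt.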

Edit: I forgot to provide the source for the quote. It's from the description tab in Synaptic for PDF Utilities (based on Poppler).

16

I was annoyed by the lack of a free PDF to ODT converter too. I didn't even need anything complicated. Just a tool that generates ODT files that I can then annotate in LibreOffice (e.g. to fill out forms).

I know how to do this manually, by converting the PDF document into graphics files and then importing them into LibreOffice, but that gets tedious quite fast.
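For reference, the first half of that manual route (turning each PDF page into an image) might look like the sketch below. It uses pdftoppm from poppler-utils as one possible tool; it is not necessarily what the script mentioned in this answer does. The function name pdf_to_pages, the 150 DPI resolution, and the file names are assumptions.

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative): render every page of a PDF
# to a PNG file that can be inserted into LibreOffice by hand.
pdf_to_pages() {
    input="$1"
    # Prefix for the generated image files, "page" by default.
    prefix="${2:-page}"

    # Fail early if the tool or the input file is missing.
    command -v pdftoppm >/dev/null 2>&1 || {
        echo "pdftoppm not found (install poppler-utils)" >&2
        return 1
    }
    [ -f "$input" ] || {
        echo "no such file: $input" >&2
        return 1
    }

    # -png selects PNG output; -r sets the resolution in DPI (150 is an
    # arbitrary choice, raise it for sharper pages).
    pdftoppm -png -r 150 "$input" "$prefix"
}

# Usage (example file name):
#   pdf_to_pages form.pdf page    # writes one PNG per page, named page-<number>.png
```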

So, I finally wrote a quick little shell script that does all the required steps automatically. You can find it at https://github.com/gutschke/pdf2odt

It can take any number of PDF and image files as input and generates an ODT file that can be opened and edited in LibreOffice. Images show up as page backgrounds, so you can write over them freely. Each image is associated with its own page style. Keep that in mind when inserting page breaks, and adjust the page style as necessary.

I tested the script on both Linux and Mac. Given that it only needs a handful of reasonably standard tools, it should be quite portable.

gutschke
  • 169
12

LibreOffice is capable of importing .pdf files. Simply open the file in a current version of LibreOffice for best results. It will, however, open the document as a drawing, and you will only be able to export it to one of the supported image formats, not save it as a Writer document.

Naturally, not all of the formatting is preserved, but at least some of it is.

bender
  • 1,854
8

Try Calibre. It converts to HTML and then into other formats. It did a pretty good job on a large (183 pages) file that I would otherwise have had to print.

In my case I converted it to an EPUB, but for fun I also converted it to a .docx, which turned out very well.
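If you prefer the command line over the Calibre GUI, Calibre ships a tool called ebook-convert that picks the input and output formats from the file extensions. The sketch below assumes Calibre is installed and on the PATH; the function name pdf_to_docx and the file names are only examples.

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative): convert a PDF to .docx with
# Calibre's ebook-convert command line tool.
pdf_to_docx() {
    input="$1"
    # Default output name: same as the input, with .pdf replaced by .docx
    output="${2:-${input%.pdf}.docx}"

    # Fail early if the tool or the input file is missing.
    command -v ebook-convert >/dev/null 2>&1 || {
        echo "ebook-convert not found (install calibre)" >&2
        return 1
    }
    [ -f "$input" ] || {
        echo "no such file: $input" >&2
        return 1
    }

    # ebook-convert infers both formats from the file extensions.
    ebook-convert "$input" "$output"
}

# Usage (example file name):
#   pdf_to_docx manual.pdf        # writes manual.docx
```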

3

If the poppler-utils package is installed, a file manager script containing a command like the one below will convert a PDF file to HTML. The "-i" option tells pdftohtml to ignore images, so delete it if you want the images included as well. The resulting HTML file can then be opened with LibreOffice Writer and saved as ODT. How well the formatting survives depends very much on how the PDF was created.

pdftohtml -s -c -i -p -nodrm -noframes <filename>
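A file manager script built around that command could look roughly like the sketch below. It assumes the file manager passes the selected files to the script as command line arguments, which is worth checking for your file manager; the function name convert_pdfs_to_html is made up for illustration.

```shell
#!/bin/sh
# Hypothetical wrapper (name is illustrative): run the pdftohtml command
# above on every PDF passed as an argument.
convert_pdfs_to_html() {
    command -v pdftohtml >/dev/null 2>&1 || {
        echo "pdftohtml not found (install poppler-utils)" >&2
        return 1
    }

    status=0
    for file in "$@"; do
        # Only handle files with a .pdf extension.
        case "$file" in
            *.pdf|*.PDF) ;;
            *) echo "skipping (not a PDF): $file" >&2; continue ;;
        esac

        if [ ! -f "$file" ]; then
            echo "no such file: $file" >&2
            status=1
            continue
        fi

        # Same options as the command above; drop -i to keep the images.
        pdftohtml -s -c -i -p -nodrm -noframes "$file" || status=1
    done

    # Non-zero if any file was missing or failed to convert.
    return "$status"
}

# When saved as a standalone script, end it with:
#   convert_pdfs_to_html "$@"
```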
Sadi
  • 11,074