165

I need a command line tool for editing metadata of pdf-files.

I'm using an Aiptek MyNote Premium tablet to write my notes and minutes. I import them later and convert them to PDF automatically with a simple script that uses Inkscape and Ghostscript.

Is there any command line tool to add some categories to the PDF's metadata, so I can find the PDF later (e.g. with gnome-do) by category?

Update: I tried the solution with pdftk and it works, but it seems that gnome-do doesn't take care of pdf-metadata. Is there a way to get gnome-do to do that?

bdr529

8 Answers

203

Give exiftool a try; it is available from the package libimage-exiftool-perl in the repositories.

As an example, if you have a PDF file called drawing.pdf and want to update its metadata, use exiftool like this:

exiftool -Title="This is the Title" -Author="Happy Man" -Subject="PDF Metadata" drawing.pdf

For some reason, the Subject entered ends up in the Keywords field of the PDF's metadata. This is not a problem in some cases, and may even be desirable, but it can cause confusion: Evince and the Nautilus metadata previewer do not show it, whereas Adobe Acrobat viewer and PDF-XChange Viewer do.

The program will create a backup of the original file if you do not use the -overwrite_original switch. This means a duplicate will exist in the folder where the updated pdf is. From the example above, a file named drawing.pdf_original will be created.

Use the overwrite switch at your own risk. My suggestion is not to use it, and instead to script something that moves the backup file to a better location, just in case.
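The suggestion above, keeping the backup but moving it out of the way, could be scripted roughly as follows. This is only a sketch: the function name tag_pdf and the BACKUP_DIR location are hypothetical, and it relies solely on exiftool's default behaviour of leaving a file named <file>_original next to the edited file. An older backup with the same name in BACKUP_DIR would be overwritten.

```shell
# Hypothetical helper: apply exiftool tags to one PDF, then move the
# automatic "<file>_original" backup into a separate directory.
# Usage: tag_pdf FILE [exiftool tag options...]
BACKUP_DIR="${BACKUP_DIR:-$HOME/pdf-backups}"   # assumed location, adjust to taste

tag_pdf() {
    file="$1"
    shift
    mkdir -p "$BACKUP_DIR" || return 1
    # exiftool keeps a backup next to the file unless -overwrite_original is given
    exiftool "$@" "$file" || return 1
    if [ -f "${file}_original" ]; then
        mv "${file}_original" "$BACKUP_DIR/$(basename "$file").orig"
    fi
}
```

For example: tag_pdf drawing.pdf -Title="This is the Title" -Author="Happy Man"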

waldyrious
Sabacon
24

You can edit PDF metadata using pdftk. Check out the update_info parameter (or update_info_utf8 if you need accented characters). Below is an example data file:

InfoKey: Title
InfoValue: Mt-Djing: multitouch DJ table
InfoKey: Subject
InfoValue: Dissertation for Master degree
InfoKey: Keywords
InfoValue: DJing, NUI, multitouch, user-centered design
InfoKey: Author
InfoValue: Pedro Lopes

(Source)
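A data file like the one above can also be generated on the fly and passed to update_info. The sketch below is one way to do that; the helper names info_pairs and set_pdf_info are hypothetical, and the file format is simply the InfoKey/InfoValue layout shown in this answer. Depending on your pdftk version, the output of dump_data may contain additional marker lines, so it is worth comparing against a dump from your own installation.

```shell
# Hypothetical helper: print an InfoKey/InfoValue pair for each KEY=VALUE argument.
info_pairs() {
    for pair in "$@"; do
        # Split at the first "=": key on the left, value on the right
        printf 'InfoKey: %s\nInfoValue: %s\n' "${pair%%=*}" "${pair#*=}"
    done
}

# Hypothetical wrapper: set_pdf_info IN.pdf OUT.pdf KEY=VALUE...
set_pdf_info() {
    in="$1"; out="$2"; shift 2
    data=$(mktemp) || return 1
    info_pairs "$@" > "$data"
    pdftk "$in" update_info "$data" output "$out"
    status=$?
    rm -f "$data"
    return "$status"
}
```

For example: set_pdf_info notes.pdf notes-tagged.pdf "Title=Meeting minutes" "Keywords=notes, minutes"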

waldyrious
Olli
14

Using Ghostscript

Install ghostscript with:

$ sudo apt install ghostscript

Create a file named pdfmarks with content similar to this:

[ /Title (Document title)
  /Author (Author name)
  /Subject (Subject description)
  /Keywords (comma, separated, keywords)
  /ModDate (D:20061204092842)
  /CreationDate (D:20061204092842)
  /Creator (application name or creator note)
  /Producer (PDF producer name or note)
  /DOCINFO pdfmark

then combine this pdfmarks file with a PDF, PS or EPS input file:

gs -dSAFER -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=output.pdf original.pdf pdfmarks

Source: http://milan.kupcevic.net/ghostscript-ps-pdf/
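If the metadata comes from a script, the pdfmarks file can be generated instead of written by hand. The following is a sketch under a few assumptions: the helper names ps_escape and make_pdfmarks are hypothetical, only three of the keys listed above are written, and the values are plain ASCII. Backslashes and parentheses are escaped because they have special meaning inside a PostScript (...) string.

```shell
# Escape backslashes and parentheses so a value is safe inside a PostScript (...) string.
ps_escape() {
    printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/(/\\(/g' -e 's/)/\\)/g'
}

# Hypothetical helper: print a DOCINFO pdfmark for a title, an author and a keyword list.
# Usage: make_pdfmarks TITLE AUTHOR KEYWORDS > pdfmarks
make_pdfmarks() {
    printf '[ /Title (%s)\n'    "$(ps_escape "$1")"
    printf '  /Author (%s)\n'   "$(ps_escape "$2")"
    printf '  /Keywords (%s)\n' "$(ps_escape "$3")"
    printf '  /DOCINFO pdfmark\n'
}
```

For example, run make_pdfmarks "Document title" "Author name" "comma, separated, keywords" > pdfmarks and then the gs command shown above.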

muru
Serge Stroobandt
10

To elaborate on the pdftk method, which is nice because it shows you everything that is being set while letting you change anything you like, here is a script (for your .bashrc or other aliases file) that does it with one command. It opens your favourite editor on the metadata file, writes your changes to a new version of the PDF, and sets the modification time of the new file to match the original. To use it, after re-sourcing your .bashrc file, just type

editPDFmetadata myfile.pdf

Here's the function:

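editPDFmetadata() {
    local input="$1"
    local output="${input}-new.pdf"
    local metadata
    # Use a real temporary file so inputs given with a directory path also work
    metadata="$(mktemp)" || return 1
    # Dump the current metadata, let the user edit it, then write it into a new PDF
    pdftk "$input" dump_data > "$metadata"
    ${EDITOR:-vi} "$metadata"
    pdftk "$input" update_info "$metadata" output "$output"
    # Give the new file the same timestamps as the original
    touch -r "$input" "$output"
    rm -f "$metadata"
}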

Simply place the definition above into the .bashrc file in your home folder, then open a new terminal and it will be ready to use.

CPBL
8

I needed to blank out the Author field in a PDF exported from LibreOffice. None of the solutions listed above worked for me, so I used hexedit and overwrote the Author field. Blunt instrument but effective!

In detail:

  1. Run:

    $ hexedit file.pdf
    
  2. Tab to switch to ASCII.

  3. Ctrl+S to search for "Author".

  4. Skip the <FEFF at the start of the field.

  5. Write 0 over all characters (except I preserved three 0x03 characters... YMMV) up to the closing >.

  6. Ctrl+X to save and exit.
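Before opening the hex editor, it can help to check where the Author key sits in the file. The sketch below assumes GNU grep and a PDF whose document information is stored uncompressed, as appears to be the case in this answer; the function name find_author_offsets is hypothetical.

```shell
# Print the byte offset of every "/Author" key in a PDF (GNU grep assumed).
# -a treats the binary file as text, -b prints byte offsets, -o prints only the match.
find_author_offsets() {
    grep -a -b -o '/Author' "$1"
}
```

Each output line has the form OFFSET:/Author, with the offset given in decimal bytes from the start of the file. No output means the key was not found as plain text.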

6

I have tested pdftk and exiftool extensively, using exiftool both on the command line and through a graphical front end. I tried small, medium-sized and very large PDF documents, and both tools had problems with the largest and most complex ones. In my experience, pdftk and exiftool work reliably only on small, simply formatted PDF documents. In large, complex documents (e.g. more than 80 pages with multiple fonts), images and/or characters may go missing from the last pages after the metadata has been edited. The Ghostscript approach, which I have only just seen, may be a solution. No doubt these programs will improve with time.

In the meantime, I have found a solution: running a tiny, one-window freeware program under the current version of Wine on Ubuntu. It also works for these large, complex PDF documents: BeCyPDFMetaEdit (available e.g. from freeware libraries such as Softpedia).

Aristo T.
4

Another command is ebook-meta (available after installing Calibre).

To see tags:

ebook-meta file.pdf

To change title:

ebook-meta file.pdf -t "Conversations with Ambrosius"
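Since the question asks about categories, the same tool can presumably also set tags through its --tags option, which takes a comma-separated list. The loop below is a sketch that applies one tag list to every PDF in a directory; the function name tag_notes and the example tags are hypothetical.

```shell
# Hypothetical helper: apply the same comma-separated tag list to every PDF in a directory.
# Usage: tag_notes DIR "tag1,tag2"
tag_notes() {
    dir="$1"
    tags="$2"
    for f in "$dir"/*.pdf; do
        [ -e "$f" ] || continue    # skip the unexpanded pattern when no PDFs exist
        ebook-meta "$f" --tags "$tags" || return 1
    done
}
```

For example: tag_notes ~/notes "notes,minutes"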
1

The act library also supports this, so you can edit PDF metadata from the command line with it as well.

$ npm install @lancejpollard/act -g
$ act update input.pdf --title foo --author bar --subject baz -k one -k two

You can also set -p publisher, -c creator, -t0 created date, and -tn updated date.