I need to merge about 100 PDF files into one, where each file uses more or less the same unsubsetted fonts. None of the options I have tried so far (pdfunite, gs, etc.) handle font duplication intelligently: the merged PDF ends up with 100 copies of the same font and is therefore much larger than it needs to be.

Is there a way to do any one of the following:

  1. Merge the PDFs without duplicating fonts?
  2. De-duplicate the fonts in the PDF later?
  3. Remove fonts from the PDF entirely?

The ideal solution will have a commercially friendly open source license (e.g. not AGPL).

Kurt Pfeifle
  • 4,785

1 Answer

Contrary to what you say, recent versions of Ghostscript have become quite efficient at merging multiple PDFs into a single one while avoiding embedding an identical font multiple times.

Inputs

Here are the details about 3 input PDFs, which I'll merge into a single output:

for i in {1..3}; do pdffonts ${i}.pdf ; echo ; done

 name                       type              encoding         emb sub uni object ID
 -------------------------- ----------------- ---------------- --- --- --- ---------
 Helvetica                  Type 1C           WinAnsi          yes no  no       8  0

 name                       type              encoding         emb sub uni object ID
 -------------------------- ----------------- ---------------- --- --- --- ---------
 Helvetica                  Type 1C           WinAnsi          yes no  no       8  0

 name                       type              encoding         emb sub uni object ID
 -------------------------- ----------------- ---------------- --- --- --- ---------
 Helvetica                  Type 1C           WinAnsi          yes no  no       8  0

Merging

Now merge these three PDF input files with the help of pdftk.

pdftk 1.pdf 2.pdf 3.pdf cat output merged.pdf

Output

Now check the font status of the output merged.pdf:

pdffonts merged.pdf

 name                       type              encoding         emb sub uni object ID
 -------------------------- ----------------- ---------------- --- --- --- ---------
 Helvetica                  Type 1C           WinAnsi          yes no  no       5  0
 Helvetica                  Type 1C           WinAnsi          yes no  no      14  0
 Helvetica                  Type 1C           WinAnsi          yes no  no      23  0

OK, not there yet: pdftk kept all three copies of Helvetica (objects 5, 14 and 23).
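
With around 100 inputs, reading the pdffonts table by eye gets tedious. The following is a minimal sketch of a per-font tally. It assumes pdffonts prints the two header lines shown above and that font names contain no spaces; count_fonts is a hypothetical helper name, not part of any tool.

```shell
# Hypothetical helper: reads pdffonts output on stdin and prints
# "<count> <font name>" for each distinct font name.
# Assumes two header lines and font names without spaces.
count_fonts() {
    tail -n +3 | awk 'NF { n[$1]++ } END { for (f in n) print n[f], f }' | sort
}

# Usage (needs pdffonts from Xpdf or Poppler):
#   pdffonts merged.pdf | count_fonts
```

Any count above 1 for an unsubsetted font suggests that the same font is embedded more than once.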

Optimize with Ghostscript

gs -o optim.pdf -sDEVICE=pdfwrite merged.pdf 

 GPL Ghostscript GIT PRERELEASE 9.27 (2018-11-20)
 Copyright (C) 2018 Artifex Software, Inc.  All rights reserved.
 This software comes with NO WARRANTY: see the file PUBLIC for details.
 Processing pages 1 through 3.
 Page 1
 Page 2
 Page 3
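
Since pdfwrite accepts several input files in a single run, the intermediate pdftk step can probably be skipped. This is a sketch only and was not part of the test above; gs_merge is a hypothetical wrapper name, while -o and -sDEVICE=pdfwrite are the same options used in the command above.

```shell
# Hypothetical wrapper: gs_merge OUTPUT INPUT...
# Hands all inputs to one pdfwrite run, in the order given.
gs_merge() {
    out=$1
    shift
    gs -o "$out" -sDEVICE=pdfwrite "$@"
}

# Usage:
#   gs_merge optim.pdf 1.pdf 2.pdf 3.pdf
```

Checking the result with pdffonts afterwards is still worthwhile, since the behaviour may depend on the Ghostscript version.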

Check font statuses and file sizes

ls -lh {1..3}.pdf merged.pdf optim.pdf 

 -rw-r--r--  1 kurtpfeifle  staff    51K Dec 31 20:25 1.pdf
 -rw-r--r--  1 kurtpfeifle  staff    51K Dec 31 20:25 2.pdf
 -rw-r--r--  1 kurtpfeifle  staff    51K Dec 31 20:25 3.pdf
 -rw-r--r--  1 kurtpfeifle  staff   147K Dec 31 20:32 merged.pdf
 -rw-r--r--  1 kurtpfeifle  staff   7.5K Dec 31 20:34 optim.pdf
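
The saving can also be computed rather than read off the listing. A small sketch using only wc and shell arithmetic; pct_of is a hypothetical helper name.

```shell
# Hypothetical helper: pct_of A B prints the size of file A as an
# integer percentage of the size of file B (rounded down).
pct_of() {
    a=$(wc -c < "$1")
    b=$(wc -c < "$2")
    echo $(( a * 100 / b ))
}

# Usage:
#   pct_of optim.pdf merged.pdf
```

For the sizes listed above (7.5K against 147K) this works out to about 5%.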

Conclusion

I tested this with Ghostscript v9.25 (the run logged above reports a 9.27 Git prerelease).

If this doesn't work for you, you'll need to...

  1. ...tell us the version of Ghostscript you are using;
  2. ...provide a link to (some of) your input PDFs for more detailed analysis.

I'm aware that this answer does not exactly meet your license requirements. But your incorrect statement about Ghostscript prompted me to post it anyway, so that other people interested in this topic can still benefit from it.

Kurt Pfeifle
  • 4,785