I have a PDF file that was the result of the scan of a book.

In this file 2 pages of the book correspond to 1 in the PDF. So when I see a page in the PDF file I'm actually seeing 2 pages of the book.

I would like to know if there's any way to convert this file to another PDF where 1 page of the book corresponds to 1 page of the PDF, i.e. the normal situation.

fossfreedom
JGNog

9 Answers

You can use mutool, a MuPDF command-line tool (sudo apt-get install mupdf-tools):

mutool poster -x 2 input.pdf output.pdf

You can also use -y if you want to perform a vertical split.
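If you need to run this from a script, here is a minimal Python sketch that wraps the same command. It assumes mutool (from mupdf-tools) is on your PATH, and the helper names mutool_split_command and split_with_mutool are hypothetical.

```python
import shutil
import subprocess


def mutool_split_command(src, dst, vertical=False):
    """Build the argv for mutool poster that cuts every page in two.

    -x 2 cuts each page into a left and a right half; -y 2 into a top and a bottom half.
    """
    axis = "-y" if vertical else "-x"
    return ["mutool", "poster", axis, "2", src, dst]


def split_with_mutool(src, dst, vertical=False):
    """Run mutool poster; raises if mutool is missing or exits with an error."""
    if shutil.which("mutool") is None:
        raise FileNotFoundError("mutool not found; install mupdf-tools")
    subprocess.run(mutool_split_command(src, dst, vertical), check=True)
```

For example, split_with_mutool("input.pdf", "output.pdf") runs the same command as above, and vertical=True switches to -y.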

Peque

Try gscan2pdf, which you can download from the Software Centre or install from the command line with sudo apt-get install gscan2pdf.

Open gscan2pdf:

  1. File > Import your PDF file.

    Now you have a single page (see the page thumbnails in the left column).

  2. Then Tools > Clean up.

  3. Select Double as the layout and set # output pages to 2, then click OK.

  4. gscan2pdf splits your document (among other things, it also cleans it up and deskews it). Now you have two pages.

  5. Save your PDF file if you're satisfied with the result.
neydroydrec

I would use Briss. It lets you select various regions of each page and turns each selected region into a new page.

frabjous

Another option is ScanTailor. This program is particularly well suited to processing several scans at a time.

sudo apt-get install scantailor

Unfortunately, it only works with image files as input, but it's simple enough to convert a scanned PDF to images. Here's a one-liner I used to convert a whole directory of PDFs into PNGs. If a PDF has n pages, it makes n PNG files.

for f in ./*.pdf; do gs -q -dSAFER -dBATCH -dNOPAUSE -r300 -dGraphicsAlphaBits=4 -dTextAlphaBits=4 -sDEVICE=png16m "-sOutputFile=$f%02d.png" "$f" -c quit; done;

I had screenshots ready to share, but I don't have enough rep to post them.

ScanTailor outputs TIFF files, so if you want the pages back in PDF, you can use this to make one PDF per page:

for f in ./*.tif; do tiff2pdf -o "$f".pdf -p letter -F "$f"; done;

Then you can use this one-liner, or an application like PDFShuffler, to merge any or all of the files into one PDF:

gs -q -sPAPERSIZE=letter -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=output.pdf ./*.tif.pdf
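The TIFF-to-PDF and merge steps can also be scripted. The Python sketch below assumes tiff2pdf (from libtiff-tools) and gs are installed, and the function names are hypothetical. It merges only the per-page PDFs it just generated, in sorted filename order, so other PDFs that happen to be in the same directory are left out.

```python
import glob
import os
import subprocess


def tiff2pdf_commands(tif_paths, paper="letter"):
    """One tiff2pdf call per TIFF, writing page.tif -> page.tif.pdf."""
    return [["tiff2pdf", "-o", path + ".pdf", "-p", paper, "-F", path]
            for path in tif_paths]


def merge_command(pdf_paths, output="output.pdf", paper="letter"):
    """Ghostscript call that concatenates the PDFs in the order given."""
    return (["gs", "-q", "-sPAPERSIZE=" + paper, "-dNOPAUSE", "-dBATCH",
             "-sDEVICE=pdfwrite", "-sOutputFile=" + output] + list(pdf_paths))


def tifs_to_single_pdf(directory, output="output.pdf"):
    """Convert every *.tif in directory to PDF, then merge only those PDFs."""
    tifs = sorted(glob.glob(os.path.join(directory, "*.tif")))
    for cmd in tiff2pdf_commands(tifs):
        subprocess.run(cmd, check=True)
    subprocess.run(merge_command([t + ".pdf" for t in tifs], output), check=True)
```

Sorting by filename assumes ScanTailor's output names already sort in page order.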

Curtis

A command line solution using ImageMagick:

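  1. Split the PDF into individual images, here at 300 dpi resolution (%03d zero-pads the page numbers so that they sort in the right order later):

     convert -density 300 orig.pdf page-%03d.png
    
  2. Split each page image into a left and right image (+repage discards the crop offset so that each half is a standalone image):

     for file in page-*.png; do
       convert "$file" -crop 50%x100% +repage "${file%.png}-split.png"
     done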
    
  3. Rename the page-###-split-#.png files to just 001.png, 002.png etc.:

     ls page-*-split-*.png | cat -n | 
       while read n f; do mv "$f" $(printf "%03d.png" $n); done
    
  4. Combine the resulting page images into a PDF again:

     convert [0-9][0-9][0-9].png result.pdf
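Step 3 relies on ls sorting the file names, and that sort is lexicographic: if the page numbers are not zero-padded (by default ImageMagick writes page-0.png, page-1.png, ..., page-10.png), page-10 sorts before page-2 and the pages end up in the wrong order. Here is a minimal Python sketch of the same renaming step that compares the numbers numerically instead. The helper names renumber_split_pages and natural_key are hypothetical, and the pattern mirrors the page-*-split-*.png glob.

```python
import os
import re

# Mirrors the shell glob page-*-split-*.png used in step 3.
SPLIT_PATTERN = re.compile(r"^page-.*-split-\d+\.png$")


def natural_key(name):
    """Sort key that compares runs of digits as numbers, so page-2 < page-10."""
    return [int(tok) if tok.isdigit() else tok for tok in re.split(r"(\d+)", name)]


def renumber_split_pages(directory):
    """Rename the split page images to 001.png, 002.png, ... in numeric order.

    Returns the (old_name, new_name) pairs in the order they were applied.
    """
    names = sorted(
        (n for n in os.listdir(directory) if SPLIT_PATTERN.match(n)),
        key=natural_key,
    )
    renamed = []
    for index, old in enumerate(names, start=1):
        new = "%03d.png" % index
        os.rename(os.path.join(directory, old), os.path.join(directory, new))
        renamed.append((old, new))
    return renamed
```

After this, step 4's convert [0-9][0-9][0-9].png result.pdf picks the pages up in reading order.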
    

Sources, variations and further tips:

tanius

Here is a Python script for this (Python 2, using the pyPdf library):

# Source http://stackoverflow.com/a/15741856/1301753

import copy
import sys
import math
import pyPdf

def split_pages(src, dst):
    src_f = file(src, 'r+b')
    dst_f = file(dst, 'w+b')

    input = pyPdf.PdfFileReader(src_f)
    output = pyPdf.PdfFileWriter()

    for i in range(input.getNumPages()):
        p = input.getPage(i)
        q = copy.copy(p)
        q.mediaBox = copy.copy(p.mediaBox)

        x1, x2 = p.mediaBox.lowerLeft
        x3, x4 = p.mediaBox.upperRight

        x1, x2 = math.floor(x1), math.floor(x2)
        x3, x4 = math.floor(x3), math.floor(x4)
        x5, x6 = math.floor(x3/2), math.floor(x4/2)

        if x3 > x4:
            # horizontal
            p.mediaBox.upperRight = (x5, x4)
            p.mediaBox.lowerLeft = (x1, x2)

            q.mediaBox.upperRight = (x3, x4)
            q.mediaBox.lowerLeft = (x5, x2)
        else:
            # vertical
            p.mediaBox.upperRight = (x3, x4)
            p.mediaBox.lowerLeft = (x1, x6)

            q.mediaBox.upperRight = (x3, x6)
            q.mediaBox.lowerLeft = (x1, x2)

        output.addPage(p)
        output.addPage(q)

    output.write(dst_f)
    src_f.close()
    dst_f.close()

input_file = raw_input("Enter the original PDF file name :")
output_file = raw_input("Enter the splitted PDF file name :")

split_pages(input_file, output_file)

I keep a copy of this on my personal GitHub site...
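The script places the cut at half of the upper-right coordinate (x3/2 or x4/2), which is only the middle of the page when the media box starts at (0, 0). Here is a minimal sketch of the same geometry in plain Python that takes the midpoint between the two corners instead. split_box is a hypothetical helper, not part of pyPdf, and the (llx, lly, urx, ury) tuples it returns would still have to be assigned to each page's mediaBox.

```python
def split_box(llx, lly, urx, ury):
    """Split a page box into two halves across its longer side.

    Boxes are (lower-left x, lower-left y, upper-right x, upper-right y).
    A landscape box yields (left half, right half); a portrait box yields
    (top half, bottom half), the same order in which the script adds pages.
    """
    width, height = urx - llx, ury - lly
    if width > height:
        mid = llx + width / 2.0  # vertical cut through the horizontal midpoint
        return (llx, lly, mid, ury), (mid, lly, urx, ury)
    mid = lly + height / 2.0  # horizontal cut through the vertical midpoint
    return (llx, mid, urx, ury), (llx, lly, urx, mid)
```

For example, split_box(0, 0, 842, 595) (an A4 landscape spread) gives (0, 0, 421.0, 595) and (421.0, 0, 842, 595).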

andrew.46

Sejda can do this either through its web interface or through its open-source command-line interface. The task is called splitdownthemiddle.

You could use Okular or any other PDF reader: choose Print to file, then under Options > Copies > Pages select the pages you are interested in and print. It will cut out the selected pages. Simple and easy!

There is a wonderful program called ScanKromsator. It is free and works quite well under Wine. More information here.

oromay