6

I recently scanned a book into a 600-page PDF file. However, the pages are randomly skewed/rotated clockwise or counterclockwise. Is there any software that can correct this automatically? I know Acrobat Pro can, but is there any free Ubuntu software or script?

Hee Jin
  • 886

3 Answers

5

Deskew

Deskew is a command-line tool for deskewing scanned text documents. It uses the Hough transform to detect "text lines" in the image. As output, you get an image rotated so that the lines are horizontal.

Installation: Download the latest release. It's written in Pascal, but seems well maintained.
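Deskew works on image files, so the PDF pages need to be extracted to images first (for example with Ghostscript). A minimal sketch of a batch run over such a folder follows, assuming a `deskew` binary is on the PATH and accepts `-o OUTPUT INPUT`. The function name `deskew_pages` and the folder names are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Sketch: run Deskew on every PNG page in a folder.
# Assumption: a `deskew` binary is on PATH and accepts `-o OUTPUT INPUT`.
# The function name and folder names are placeholders, not part of Deskew.

deskew_pages() {
    local in_dir="$1" out_dir="$2" page
    mkdir -p "$out_dir" || return 1
    for page in "$in_dir"/*.png; do
        # If nothing matches, the glob stays literal; skip it.
        [ -e "$page" ] || continue
        # Write the deskewed page under the same file name in out_dir.
        deskew -o "$out_dir/$(basename "$page")" "$page" || return 1
    done
}

# Example call (placeholder folders):
#   deskew_pages pages deskewed
```

The originals are left untouched, so a page that comes out worse can simply be taken from the input folder again.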

pagetools: Page Layout Detection Tools

Automatic deskew and bounding box determination for scanned page images

sudo apt install pagetools

Last Update: 2013-03-22

Pablo Bianchi
  • 17,371
2

This is almost automatic, starting with a multipage .pdf:

  • Install scantailor-advanced

    • Open GNOME Software (install it if absent). [This does not work with the App Center/Snap Store.]
    • Search for scantailor and select the one with Ubuntu as its source (snap); avoid the Flathub one
  • Split the pdf into .png files

gs -dBATCH -dNOPAUSE -sDEVICE=pnggray -r300 -dUseCropBox -sOutputFile=filename-%03d.png multipage.pdf
  • Launch scantailor-advanced

    • For "New Project", select the folder with the .png files
    • In the left menu, go carefully through each option, one by one: press its title, define the settings, and then press the play icon
    • Use "apply to/Change" with "All pages", especially in the last option, "Output"
  • Go to the output folder with the .tif files

  • Combine them with

    convert *.tif  Desired_Name.pdf
    
  • If that command fails because there are more than 50 pages, use something like this: https://pastebin.com/pTsggARx
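One possible workaround for the large-page-count case, not necessarily what the linked script does, is to convert the .tif files in batches and then merge the partial PDFs. The sketch below assumes ImageMagick's `convert` and `pdfunite` (from poppler-utils) are installed. The function name `combine_in_batches` and the default batch size of 50 are hypothetical choices.

```shell
#!/usr/bin/env bash
# Sketch: combine many TIFF pages into one PDF, a batch at a time.
# Assumptions: ImageMagick `convert` and `pdfunite` (poppler-utils) are installed.
# The function name and the default batch size are placeholders.

combine_in_batches() {
    local out_pdf="$1" batch_size="${2:-50}"
    local tmp_dir part=0 part_name i
    local files=( *.tif )            # glob expands in sorted order
    [ -e "${files[0]}" ] || { echo "no .tif files found" >&2; return 1; }
    tmp_dir=$(mktemp -d) || return 1

    # Convert each slice of batch_size pages into its own partial PDF.
    for (( i = 0; i < ${#files[@]}; i += batch_size )); do
        printf -v part_name '%s/part-%04d.pdf' "$tmp_dir" "$part"
        convert "${files[@]:i:batch_size}" "$part_name" || return 1
        part=$(( part + 1 ))
    done

    if (( part == 1 )); then
        # Only one batch was needed, so there is nothing to merge.
        mv "$tmp_dir/part-0000.pdf" "$out_pdf" || return 1
    else
        pdfunite "$tmp_dir"/part-*.pdf "$out_pdf" || return 1
    fi
    rm -r "$tmp_dir"
}

# Example call, run from the folder with the .tif files:
#   combine_in_batches Desired_Name.pdf 50
```

Page order follows the sorted file names, so this relies on zero-padded names such as the ones produced by the `%03d` pattern used when splitting the PDF.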

Ferroao
  • 959
1

Do you mean skewed—as in, stretched in some way, like this:

parallelogram

—or rotated?

I'm assuming you mean rotated, since I honestly don't think it's possible for your scanner to mess the image up that badly!

If you just need to rotate, I would recommend PDF-Shuffler, a GUI-based program that can make the process of going through each page and rotating them as necessary a lot less painful. Have a look. And I'm sure there are other programs that could do the same thing.

Unfortunately, I don't know of any software that can look over all the pages in your PDF and decide for you which ones need to be transformed in some complex way, let alone rotated.

EDIT: If your file were a native PDF that could be converted into PostScript (.ps) format, I think there might be a way to auto-rotate pages using Ghostscript. However, to my knowledge, you can't do this with scanned pages, because the auto-rotate feature relies on interpretation of text direction, which can only come from a native PDF or PS document. I'm not completely sure... I will look into this a little more.

Hee Jin
  • 886