Recipe: Converting PDF Documents Into PNG

In a little side project involving DeepZooming documents, I needed to convert PDF documents into images, and to do so with controllable quality.

Of course, the program convert (part of the trusty ImageMagick library) can do that:

convert paper.pdf paper.png

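It will not produce a single PNG, though, but one for each page, numbered consecutively (paper-0.png, paper-1.png, and so on). Other formats like TIFF can hold several images in one file, and convert joins all pages into that one file by default (that is what -adjoin means). So there you have to ask explicitly for multiple files with +adjoin. For PNG the switch makes no difference, since a PNG holds only one image anyway.

convert +adjoin paper.pdf paper.tiff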
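
If you want control over the page file names, or only need a single page, something along these lines may help. It is only a sketch: it assumes that a printf-style %03d placeholder in the output name and the paper.pdf[0] page selector behave as in current ImageMagick releases. The helper page_pattern and the file name cover.png are made up for this example.

```shell
#!/bin/sh
# page_pattern FILE.pdf: print an output name with a zero-padded page
# placeholder, e.g. "paper-%03d.png" (hypothetical helper, not part of ImageMagick).
page_pattern() {
    base=$(basename "$1" .pdf)
    printf '%s-%%03d.png\n' "$base"
}

# Only run the conversions when ImageMagick and the (assumed) input file exist.
if command -v convert >/dev/null 2>&1 && [ -f paper.pdf ]; then
    # One PNG per page: paper-000.png, paper-001.png, ...
    convert paper.pdf "$(page_pattern paper.pdf)"
    # First page only; quoted so the shell does not treat [0] as a glob.
    convert 'paper.pdf[0]' cover.png
fi
```

Page numbers in the selector start at 0, like the numbers in the generated file names.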

If you need larger dimensions, say a width of 1000 px, you can use -scale:

convert -scale 1000 -adjoin paper.pdf paper.png

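But the image is rather fuzzy, because the PDF is rasterized at the low default resolution of 72 dpi and only then scaled up. So you need to increase the precision when the PDF is read, with -density. The -quality setting additionally controls how the PNG is written (for PNG it selects the compression level, not a lossy quality):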

convert -scale 1000 \
        -adjoin \
        -density 600x600 \
        -quality 90 \
        paper.pdf paper.png

That works, but only for rather small PDF documents. As soon as you have a whole paper, you will notice a substantial slowdown of your machine as convert is sucking up memory.
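
A rough back-of-the-envelope estimate shows why the memory use explodes at 600 dpi. The sketch below assumes US Letter pages (612 x 792 points) and 8 bytes per pixel, which is what a 16-bit, four-channel ImageMagick build would need; both numbers are assumptions and your build may differ. The function names are made up for this example.

```shell
#!/bin/sh
# raster_px POINTS DPI: pixels along one side of the page.
# One PostScript point is 1/72 inch.
raster_px() {
    echo $(( $1 * $2 / 72 ))
}

# page_mib WIDTH_PT HEIGHT_PT DPI: approximate MiB for one rasterized page,
# assuming 8 bytes per pixel (an assumption, see the text above).
page_mib() {
    w=$(raster_px "$1" "$3")
    h=$(raster_px "$2" "$3")
    echo $(( w * h * 8 / 1048576 ))
}

page_mib 612 792 72     # prints 3
page_mib 612 792 600    # prints 256
```

So under these assumptions every single page costs about a quarter of a gigabyte before scaling, and a 20-page paper adds up to several gigabytes.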

This is where professional software can distinguish itself, and ImageMagick does so: you can limit the amount of memory allocated for the image data:

convert -scale 1000 \
        -adjoin \
        -density 600x600 \
        -quality 90 \
        -limit memory 256mb -limit map 256mb \
        paper.pdf paper.png

convert now uses the disk (the directory named by TMPDIR, to be exact) instead of memory as soon as the limits are reached. Of course, that is much slower, but you cannot have everything.
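
The same limits can probably be set once per shell session instead of on every command line. This is a sketch under the assumption that your ImageMagick version honours the MAGICK_MEMORY_LIMIT and MAGICK_MAP_LIMIT environment variables and accepts size suffixes such as MiB; check the output of the resource listing to see whether the values took effect.

```shell
#!/bin/sh
# Assumption: ImageMagick reads these variables as defaults for the
# corresponding -limit settings.
export MAGICK_MEMORY_LIMIT=256MiB   # heap memory for the pixel cache
export MAGICK_MAP_LIMIT=256MiB      # memory-mapped pixel cache

# Show the limits ImageMagick actually applies, if it is installed.
if command -v convert >/dev/null 2>&1; then
    convert -list resource
fi
```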

It so happened that my /tmp/ partition was too small to hold everything. Luckily (OK, not luck but forethought) there is another environment variable to move the cache somewhere else:

mkdir /home/rho/projects/deepzoom/xxx/
export MAGICK_TMPDIR=/home/rho/projects/deepzoom/xxx/
convert -limit memory 256mb -limit map 256mb .....
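
Before starting a long conversion it can be worth checking that the cache directory really has room. The following is a minimal sketch, not something ImageMagick provides: free_kb is a made-up helper, the 2 GiB threshold is an arbitrary example value, and the lookup order MAGICK_TMPDIR, then TMPDIR, then /tmp is my assumption about where the cache ends up.

```shell
#!/bin/sh
# free_kb DIR: print the free space (in KiB) on the filesystem holding DIR.
# Relies on the POSIX "df -P" layout, where column 4 is "Available".
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Hypothetical threshold: require about 2 GiB before starting a conversion.
need_kb=$((2 * 1024 * 1024))
cache_dir=${MAGICK_TMPDIR:-${TMPDIR:-/tmp}}

if [ "$(free_kb "$cache_dir")" -lt "$need_kb" ]; then
    echo "warning: less than 2 GiB free in $cache_dir" >&2
fi
```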

Very nice. Me likez.
