Tag Archives: pdf2png

Nicer pdf2png, with poppler

Updated: so poppler now includes pdftocairo which does this. No need to do this anymore! Blog post here for reference.

I’ve been using convert from ImageMagick to convert PDF-files to png files. However, they’re butt ugly, or rather fugly. So I created a pdf2png script / python program to do it better.

Just look at the text here from convert:

Rather ugly. Look at the kerning. It’s truly horrible.

Not to say poppler doesn’t have its share of problems, but it looks rather much better, don’t you agree?

So, since I had to manually edit a presentation I had to use some time making a PDF-to-PNG converter since I couldn’t find another pdf2png.

So without further ado, here is pdf2png.py (updated to use cairo as Raimund posted code for in comments, also updated with width+height extra params):

#!/usr/bin/env python
 
import poppler
import cairo
import gtk
import urllib
import sys, os
 
width = height = 0
 
if len(sys.argv) != 2 and len(sys.argv) != 4:
    print("Usage: %s <filename> [width height]")
    sys.exit()
 
if len(sys.argv) == 4:
    width = sys.argv[2]
    height = sys.argv[3]
 
input_filename = os.path.abspath(sys.argv[1])
output_filename = os.path.splitext(os.path.basename(sys.argv[1]))[0] + '-%.2d.png'
 
doc = poppler.document_new_from_file('file://%s' % \
            urllib.pathname2url(input_filename), password=None)
 
for i in xrange(doc.get_n_pages()):
    page = doc.get_page(i)
 
    if width and height:
        surface = cairo.ImageSurface(cairo.FORMAT_ARGB32, int(width), int(height))
        ctx = cairo.Context(surface)
    else:
        surface = cairo.ImageSurface(cairo.FORMAT_ARGB32,
                int(page.get_size()[0] * 2), int(page.get_size()[1] * 2))
        ctx = cairo.Context(surface)
        ctx.scale(2, 2)
 
    page.render(ctx)
    ctx.set_operator(cairo.OPERATOR_DEST_OVER)
    ctx.set_source_rgb(1, 1, 1)
    ctx.paint()
    surface.write_to_png(output_filename % i)

It’s very far from perfect. Note the hard coded height and width, all of these things are possible fixes. Not anymore! The default should be sensible now, or you can force it with arguments. I didn’t find any python-poppler documentation, but I used C++-docs instead, they were helpful enough.

If you do any improvements (thanks Raimund) or just use it, it’d make me happy if you told me in a comment. :-)