Nicer pdf2png, with poppler

Update: poppler now includes pdftocairo, which does this, so there is no need for this script anymore! The blog post stays here for reference.
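For reference, here is a minimal sketch of driving pdftocairo from Python. It assumes the pdftocairo binary from poppler's command-line tools is on the PATH. The helper names and the 150 dpi default are my own choices for illustration, not part of poppler.

```python
import subprocess


def pdftocairo_command(pdf_path, output_prefix, dpi=150):
    """Build the argument list for a PNG conversion with pdftocairo.

    -png selects PNG output and -r sets the resolution in dpi.
    """
    return ["pdftocairo", "-png", "-r", str(dpi), pdf_path, output_prefix]


def convert(pdf_path, output_prefix, dpi=150):
    """Run pdftocairo, which writes one numbered PNG per page."""
    subprocess.check_call(pdftocairo_command(pdf_path, output_prefix, dpi))
```

Calling convert("slides.pdf", "slides") should leave numbered PNG files whose names start with the given prefix, one per page.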

I’ve been using convert from ImageMagick to convert PDF files to PNG files. However, the results are butt ugly, or rather fugly. So I created a pdf2png script / Python program to do it better.

Just look at the text here from convert:

Rather ugly. Look at the kerning. It’s truly horrible.

Not to say poppler doesn’t have its share of problems, but its output looks much better, don’t you agree?

I had to edit a presentation by hand, and since I couldn’t find an existing pdf2png, I spent some time writing a PDF-to-PNG converter.

So without further ado, here is pdf2png.py (updated to use cairo, based on the code Raimund posted in the comments, and updated with optional width and height parameters):

#!/usr/bin/env python
 
import poppler
import cairo
import gtk
import urllib
import sys, os
 
width = height = 0
 
if len(sys.argv) not in (2, 4):
    # Fill in the script name, which the usage string was missing.
    print("Usage: %s <filename> [width height]" % sys.argv[0])
    sys.exit(1)
 
if len(sys.argv) == 4:
    width = sys.argv[2]
    height = sys.argv[3]
 
input_filename = os.path.abspath(sys.argv[1])
output_filename = os.path.splitext(os.path.basename(sys.argv[1]))[0] + '-%.2d.png'
 
doc = poppler.document_new_from_file('file://%s' % \
            urllib.pathname2url(input_filename), password=None)
 
for i in xrange(doc.get_n_pages()):
    page = doc.get_page(i)
 
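    if width and height:
        # Stretch the page to fill the requested output size.
        page_width, page_height = page.get_size()
        surface = cairo.ImageSurface(cairo.FORMAT_ARGB32, int(width), int(height))
        ctx = cairo.Context(surface)
        ctx.scale(int(width) / page_width, int(height) / page_height)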
    else:
        surface = cairo.ImageSurface(cairo.FORMAT_ARGB32,
                int(page.get_size()[0] * 2), int(page.get_size()[1] * 2))
        ctx = cairo.Context(surface)
        ctx.scale(2, 2)
 
    page.render(ctx)
    ctx.set_operator(cairo.OPERATOR_DEST_OVER)
    ctx.set_source_rgb(1, 1, 1)
    ctx.paint()
    surface.write_to_png(output_filename % i)

It’s very far from perfect. The width and height used to be hard-coded, but not anymore! The default should be sensible now (twice the page size), or you can force a size with arguments. I didn’t find any python-poppler documentation, so I used the C++ docs instead; they were helpful enough.
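The sizing logic is easier to see in isolation. The sketch below assumes that page.get_size() reports the page size in PDF points (1/72 inch), which is what lets a target resolution be turned into a scale factor. The function names are hypothetical and not part of the script above.

```python
POINTS_PER_INCH = 72.0  # PDF user space: one point is 1/72 inch


def scale_for_dpi(dpi):
    """Scale factor to pass to ctx.scale() for a target resolution."""
    return dpi / POINTS_PER_INCH


def surface_size(page_width_pt, page_height_pt, dpi):
    """Pixel dimensions of the cairo surface for a page at the given dpi."""
    scale = scale_for_dpi(dpi)
    return int(round(page_width_pt * scale)), int(round(page_height_pt * scale))
```

Under that assumption, the script’s default scale of 2 corresponds to 144 dpi, so a US Letter page (612 x 792 points) comes out as 1224 x 1584 pixels.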

If you do any improvements (thanks Raimund) or just use it, it’d make me happy if you told me in a comment. :-)

6 thoughts on “Nicer pdf2png, with poppler”

  1. Bruno: Thank you, no one had responded on this one, so I guess it was impossible to find on the intarwebs. I couldn’t find it myself when I was searching around, so maybe the same happened here.

    Comments like these always make me want to write about any possible solutions I find. :-)

  2. Thanks for this — I had used convert from ImageMagick, but as you note, the results were less than satisfactory. And, it’s nice to see it in python!

  3. Seems like there is no ‘render_to_pixbuf’ anymore since poppler>=0.17.

    It is said this should be done via cairo instead:

    Here is how I did it, using scale of 4.591 which in my case approximates a resolution of about 300dpi.

    import cairo

    now the loop-part, instead of the code above:

    for i in xrange(doc.get_n_pages()):
        page = doc.get_page(i)
        surface=cairo.ImageSurface(cairo.FORMAT_ARGB32, int(page.get_size()[0] * 4.591), int(page.get_size()[1] * 4.591))
        ctx=cairo.Context(surface)
        ctx.scale(4.591, 4.591)
        page.render(ctx)
        surface.write_to_png(output_filename % i)

    I hope someone might find it useful :-)

  4. Thanks a lot Raimund! I updated the script and the post.

    It’s much better (well, being broken on modern distros is not very helpful!) now.

    The values were actually hardcoded because that’s what I needed then and there, although the flexible way you’re doing it is much better. So I copied that and the script will do both forms now.

    I actually have this as a pdf2png in my .local/bin/ on my computer, and I wondered why it didn’t work a few weeks ago when I tried using it. Was in a real hurry so couldn’t look at it.
