Mail Merge database + PDF for free in Linux (w/o big overhead)

Update: go directly to solution

I’ve lost many hours of high quality chair sleep over a certain PDF-problem that I’ve been tasked to work on these last months. We’re actually trying to do something extremely easy, this list summarizes it:

  • mail merge a PDF

And that’s about it. Oh, there’s also this tiny would-be-nice requirement; it has to work.

There are proprietary solutions, they are surprisingly extremely expensive. This is something we’d do often so can’t pay us out of it. If something had a one-time cost we could easily do it. But alas.

I got a ready PDF with picture of a boat and some text from our graphics dude, and a big list of names. I was tasked to merge the boat and the list. So armed with my CSV-file and the PDF I started thinking how this could be done…

And there goes months of my life

I tried many things. First using my youth searching the intertubes for clues on how to best proceed. I was bogged down by a number of commercial tools, and the Java library iText. I found FPDF for PHP creation of PDF’s and a whole lot more. No half-easy method of doing this short of writing my own program.

I wasn’t interested in that.

I then tried the new OpenOffice 3.0 and the PDF-plugin. Put the PDF as a background and did mail merge on top of that. No such luck. The new 64bit ubuntu computer with a dogpile of RAM just died.

And so I gave up. I printed 6.000, I then lined up the fields from my CSV and did a mail merge in OpenOffice, so I put the 6.000 printed pages and put them into the printer again. Of course, we now paid for 12.000 pages, and it took a long time.

And there we go again, 30.000 this time

Just when I had finally forgotten the whole thing. … Let’s do it again! Only this time, no cheating.

I found pdftk (why didn’t I last time!?). I think I found a way. So I started making fields on the PDF’s. Then using FDF’s to fill the fields automatically. Formfiller.

So, just it’s just to write a script to make 30.000 PDF-files!

PROBLEM: Each file is easily 2 MB. That’ll be 60.000 MB. No way. It’ll take forever. The printer won’t like it, the computer won’t like it, the network won’t like it. They’ll all hate it.

So, solution; pdftk + background

I had only been reading “relevant” parts of the pdftk manual. I only saw what I wanted to see, and that was not background.

The merging I’ve been looking for was right there. And so the years I spent on this problem were finished! Here’s the procedure:

  1. Use OpenOffice to mail merge all the names just like blank pages.
  2. Export as PDF.
    You’ll get a big, blank document with names on each page.
  3. Open a terminal, and add the boat as a background to your 30.000 page PDF:
    pdftk names.pdf background boat_background.pdf output out.pdf

And there you go. The sweet deal about this, is that the background is only saved once, and referenced on all the other pages. Nice, just like I wanted.

Update: I trimmed this blog post a bit. Also, in the end I wrote a python-program that creates the PDF-to-be-merged with reportlab and does the nice pdftk-trick in the end. It went from being a full-day job to taking about 2 minutes to do a 30.000 page merge.

7 thoughts on “Mail Merge database + PDF for free in Linux (w/o big overhead)”

  1. Hi,

    Thanks for the writeup.

    I’ve been trying (on and off) to do the exact same thing for about a year and have tried many ideas but memory usage seems to be the killer – I’ll try the background option and see how I go.

  2. Just finished using pdftk for the first time thanks to your entry here. Thank you for doing the bulk of the work. I can’t tell you how much I appreciate your posting. I am an Ubuntu fan and working to deploy it in the enterprise I work in but this morning I got your solution to work on a Windows platform. Super results! Thank you!

  3. Thank you so so much for your instructions. I’ve downloaded pdftk for windows and it saved my day. I was going to buy a software for $150 for our company.

  4. Ray, Mike Kalua, Happy Man: Thank you all for the good comments! I really love getting these infrequent emails of new “thank you” comments :D

    Update: Reading through the post now, I see that it’s in fact kinda old. I also got tired of OpenOffice being so unbearably slow when making/saving the 30.000+ page PDF, that I in the end succumbed to writing my own small Python program to take a CSV-file and make it into a PDF.

    Time spent went from 4+ hours using OpenOffice (now LibreOffice) to under 2 seconds doing the same thing. And OOo was also kinda crashprone when dealing with that much data IIRC, not so with the python script.

  5. Thanks for the post, you saved me having to work all that out by myself. Now if you’re looking to write something else for your blog, may I suggest an article covering python, csv files and generating PDFs ;-)

Leave a Reply to Mike Kalua Cancel reply

Your email address will not be published.