Update: go directly to solution
I’ve lost many hours of high quality chair sleep over a certain PDF problem that I’ve been tasked with these last few months. We’re actually trying to do something extremely easy; this list summarizes it:
- mail merge a PDF
And that’s about it. Oh, there’s also this tiny would-be-nice requirement; it has to work.
There are proprietary solutions, but they are surprisingly expensive. This is something we’d do often, so we can’t buy our way out of it. If it were a one-time cost we could easily pay it. But alas.
I got a finished PDF with a picture of a boat and some text from our graphics dude, and a big list of names. I was tasked to merge the boat and the list. So armed with my CSV file and the PDF I started thinking about how this could be done…
And there goes months of my life
I tried many things. First I spent my youth searching the intertubes for clues on how best to proceed. I got bogged down in a number of commercial tools and the Java library iText. I found FPDF for creating PDFs from PHP, and a whole lot more. But there was no half-easy method of doing this short of writing my own program.
I wasn’t interested in that.
I then tried the new OpenOffice 3.0 and the PDF plugin. I put the PDF in as a background and ran the mail merge on top of that. No such luck. The new 64-bit Ubuntu computer with a dogpile of RAM just died.
And so I gave up. I printed 6.000 copies of the boat page, lined up the fields from my CSV in an OpenOffice mail merge, and fed the 6.000 printed pages through the printer a second time. Of course, we paid for 12.000 pages, and it took a long time.
And there we go again, 30.000 this time
Just when I had finally forgotten the whole thing. … Let’s do it again! Only this time, no cheating.
I found pdftk (why didn’t I last time!?) and thought I had found a way: make form fields on the PDF, then fill them in automatically from FDF files. A form filler.
So it’s just a matter of writing a script to make 30.000 PDF files!
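For the curious, the form-filling idea could be scripted roughly like this. It is only a sketch under assumptions: the field name `name` and the file names are made up for illustration, and it only handles plain ASCII values (names with other characters would need extra encoding work that is not shown here).

```python
def fdf_escape(value):
    """Escape the characters that are special inside a PDF literal string."""
    return value.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def make_fdf(fields):
    """Return a minimal FDF document that sets each form field to its value."""
    entries = "\n".join(
        f"<< /T ({fdf_escape(name)}) /V ({fdf_escape(value)}) >>"
        for name, value in fields.items()
    )
    return (
        "%FDF-1.2\n"
        "1 0 obj\n"
        f"<< /FDF << /Fields [\n{entries}\n] >> >>\n"
        "endobj\n"
        "trailer\n"
        "<< /Root 1 0 R >>\n"
        "%%EOF\n"
    )


def fill_form_command(form_pdf, fdf_path, out_pdf):
    """Build the pdftk call that fills form_pdf from one FDF file.

    The trailing "flatten" turns the filled fields into ordinary page content.
    """
    return ["pdftk", form_pdf, "fill_form", fdf_path, "output", out_pdf, "flatten"]
```

One FDF file and one pdftk call per name gives one output PDF per name, which is exactly where the trouble starts.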
PROBLEM: Each file is easily 2 MB. That’ll be 60.000 MB. No way. It’ll take forever. The printer won’t like it, the computer won’t like it, the network won’t like it. They’ll all hate it.
So, the solution: pdftk + background
I had only been reading the “relevant” parts of the pdftk manual. I only saw what I wanted to see, and that did not include the background operation.
The merging I’ve been looking for was right there. And so the years I spent on this problem were finished! Here’s the procedure:
- Use OpenOffice to mail merge all the names onto otherwise blank pages.
- Export as PDF.
You’ll get a big, blank document with names on each page.
- Open a terminal, and add the boat as a background to your 30.000 page PDF:
pdftk names.pdf background boat_background.pdf output out.pdf
And there you go. The sweet deal about this is that the background is only saved once and referenced on all the other pages. Nice, just like I wanted.
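If you want to run that step from a script instead of typing it, something like the following should do. This is a minimal sketch, not a polished tool: the function names are my own, and it assumes pdftk is installed and on your PATH. As far as I know, pdftk takes the first page of the background file and puts it behind every page of the input.

```python
import shutil
import subprocess


def background_command(names_pdf, background_pdf, out_pdf):
    """Build the pdftk call that puts background_pdf behind every page of names_pdf."""
    return ["pdftk", names_pdf, "background", background_pdf, "output", out_pdf]


def add_background(names_pdf, background_pdf, out_pdf):
    """Run pdftk and raise an error if it is missing or exits with a failure."""
    if shutil.which("pdftk") is None:
        raise RuntimeError("pdftk was not found on PATH")
    subprocess.run(background_command(names_pdf, background_pdf, out_pdf), check=True)


# Example, with the same file names as the command above:
# add_background("names.pdf", "boat_background.pdf", "out.pdf")
```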
Update: I trimmed this blog post a bit. Also, in the end I wrote a Python program that creates the PDF to be merged with ReportLab and finishes with the nice pdftk trick. It went from being a full-day job to taking about 2 minutes for a 30.000 page merge.
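That program is not reproduced here, but the ReportLab half could look something like the sketch below. Treat everything specific in it as an assumption: the CSV column called `name`, the A4 page size, the font and the text position are placeholders you would adjust to fit your own background.

```python
import csv


def read_names(lines, column="name"):
    """Return the values of one column from CSV lines (an open file works too)."""
    return [row[column] for row in csv.DictReader(lines)]


def write_names_pdf(names, out_path, x_mm=30, y_mm=60):
    """Write one otherwise blank page per name, ready for the pdftk background step."""
    # Imported here so the CSV helper works even without ReportLab installed.
    from reportlab.lib.pagesizes import A4
    from reportlab.lib.units import mm
    from reportlab.pdfgen import canvas

    pdf = canvas.Canvas(out_path, pagesize=A4)
    for name in names:
        pdf.setFont("Helvetica", 14)  # set per page, the font resets after showPage()
        pdf.drawString(x_mm * mm, y_mm * mm, name)  # measured from the bottom-left corner
        pdf.showPage()  # finish this page and start the next one
    pdf.save()


# Example with a hypothetical names.csv that has a "name" column:
# with open("names.csv", newline="", encoding="utf-8") as handle:
#     write_names_pdf(read_names(handle), "names.pdf")
```

The resulting names.pdf then goes through the same pdftk background command as before.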