PDF issues

One could argue that Templater's weak point is PDF output. Because you can't natively convert to PDF without a full-blown display engine with its own set of features, we don't even consider adding PDF conversion as a core feature.

Naturally, our users often require PDF output, so we need to convert populated templates to PDF somehow.

This is the story of how we finally settled on a method for PDF conversion.

  • Word automation - our app was running on Windows, so our first choice was the Word API: Word was already installed, and Word is the reference for how a Word document should look, even in PDF. Being aware of Office Automation problems,
    we were prepared to move on to the next method soon, but still wanted to check it out for ourselves. We concluded that you can't create an instance of Word when running in an IIS service as a non-privileged user. Moving along.
  • PDF printers - PDF printers are useless without an application that knows how to process a Word document. Without Word or some other office suite that groks docx, they can't do anything useful with the document. Nothing to see here.
  • Third party libraries - there are free, reasonably priced and expensive libraries that offer PDF conversion. Every one we tried (even the expensive ones) failed to convert docx to PDF without major differences for non-trivial documents. Most can convert basic styles, and some are even pretty good, but our users were annoyed that the PDF "we" produced didn't look the way they had prepared it. Onto the next one.
  • OpenOffice.org - having failed to get a "pixel perfect" PDF even with third party libraries, we turned to the OpenOffice/LibreOffice path. Our trust in OpenOffice over the expensive third party libraries was well justified: it did a pretty good job of converting. Unfortunately, templates soon became even more complicated and we had to look for another solution.
  • LibreOffice - Fortunately, LibreOffice was in active development while OpenOffice stagnated, so when we tried our most complicated document it was converted as expected. But LibreOffice was only part of the story.

We built a small .NET wrapper around the UNO component for conversion and used that. But we wanted to use Linux for PDF conversion, since most of our systems run on Linux. While we could have written those few lines of code around UNO ourselves, we used JodConverter, which already does that. But our troubles were only starting...

JodConverter 2 doesn’t have an automated method of starting LibreOffice in headless mode, so we upgraded to JodConverter 3. Interestingly, JodConverter has downgraded its conversion API from byte[] to File. While annoying, that’s something we can live with for now.
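A file-only conversion API can still be used from code that works with bytes in memory by going through a temporary directory. The following is a minimal sketch of that idea in Python rather than JodConverter's own Java API; `convert_bytes` and the `convert_file(src_path, dst_path)` callable are hypothetical names that stand in for whatever file-based converter is actually in use.

```python
import tempfile
from pathlib import Path


def convert_bytes(docx_bytes, convert_file):
    """Convert docx bytes to PDF bytes through a file-based converter.

    convert_file(src_path, dst_path) is a hypothetical callable that reads
    the document at src_path and writes the resulting PDF to dst_path.
    """
    with tempfile.TemporaryDirectory() as workdir:
        src = Path(workdir) / "input.docx"
        dst = Path(workdir) / "output.pdf"
        src.write_bytes(docx_bytes)
        convert_file(src, dst)
        # Read the result before the temporary directory is removed.
        return dst.read_bytes()
```

The cost is one extra write and one extra read per document, which is usually small compared to the conversion itself.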

What was causing us headaches was that LibreOffice didn’t create the PDF as expected when run in headless mode. We investigated a little and concluded that the GUI loaded some LibreOffice fonts which were not available in headless mode. A quick installation of mscorefonts fixed the problem.
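One way to catch this kind of problem before converting is to check that the fonts a template relies on are visible on the conversion server. The sketch below assumes a Linux machine with fontconfig's `fc-list` tool installed; the function names are hypothetical, and the list of required fonts is something you would have to supply for your own templates.

```python
import subprocess


def installed_families(fc_list_output):
    """Parse `fc-list : family` output into a set of lower-cased family names."""
    families = set()
    for line in fc_list_output.splitlines():
        # One font per line; alternative family names are comma-separated.
        for name in line.split(","):
            name = name.strip().lower()
            if name:
                families.add(name)
    return families


def missing_fonts(required, fc_list_output):
    """Return the required families that fontconfig does not report."""
    available = installed_families(fc_list_output)
    return [name for name in required if name.lower() not in available]


def query_fontconfig():
    """Ask fontconfig for the installed families (needs the fc-list tool)."""
    return subprocess.run(
        ["fc-list", ":", "family"], capture_output=True, text=True, check=True
    ).stdout
```

Calling `missing_fonts(["Arial", "Times New Roman"], query_fontconfig())` on the server would list which of those two families still need to be installed.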

In the end, PDF conversion via command line is as simple as running:
soffice -norestore -nofirststartwizard -nologo -headless -convert-to pdf input-document.docx
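
When the conversion has to be triggered from code, the same command can be wrapped in a small helper. This is a minimal sketch, not the wrapper we actually use: the function names and the timeout are assumptions, it assumes `soffice` is on the PATH, and it uses the double-dash option spelling documented by current LibreOffice releases plus `--outdir` to control where the PDF is written.

```python
import subprocess
from pathlib import Path


def build_convert_command(source, outdir, soffice="soffice"):
    """Return the argument list for a headless docx-to-PDF conversion."""
    return [
        soffice,
        "--norestore",
        "--nofirststartwizard",
        "--nologo",
        "--headless",
        "--convert-to", "pdf",
        "--outdir", str(outdir),
        str(source),
    ]


def convert_to_pdf(source, outdir, timeout=120):
    """Run the conversion and return the path of the PDF that was written."""
    source, outdir = Path(source), Path(outdir)
    subprocess.run(build_convert_command(source, outdir), check=True, timeout=timeout)
    pdf = outdir / (source.stem + ".pdf")
    # Check the output explicitly instead of relying only on the exit status.
    if not pdf.exists():
        raise RuntimeError(f"no PDF produced for {source}")
    return pdf
```

For example, `convert_to_pdf("input-document.docx", "out")` would be expected to leave the result in `out/input-document.pdf`.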

TL;DR

  • Users prepare templates in Word.
  • Templater populates docx with data.
  • LibreOffice in headless mode is used to convert docx to PDF.

