You can use [[pdftk]] to fill out [[PDF]] forms (thanks for the
inspiration, [Joe Rothweiler][JR]). The syntax is simple:

    $ pdftk input.pdf fill_form data.fdf output output.pdf

where `input.pdf` is the input PDF containing the form, `data.fdf` is
an [FDF][] or [XFDF][] file containing your data, and `output.pdf` is
the name of the PDF you're creating. The tricky part is figuring out
what to put in `data.fdf`. There's a useful comparison of the Forms
Data Format (FDF) and its XML version (XFDF) in the
[XFDF specification][XFDF-specs]. XFDF only covers a subset of FDF,
so I won't worry about it here. FDF is defined in section 12.7.7 of
[ISO 32000-1:2008][ISO32000], the PDF 1.7 specification, and it has
been in PDF specifications since version 1.2.

Forms Data Format (FDF)
=======================

FDF files are basically stripped-down PDFs (§12.7.7.1). A simple FDF
file will look something like:

    %FDF-1.2
    1 0 obj<</FDF<</Fields[
    <</T(FIELD1_NAME)/V(FIELD1_VALUE)>>
    <</T(FIELD2_NAME)/V(FIELD2_VALUE)>>
    …
    ] >> >>
    endobj
    trailer
    <</Root 1 0 R>>
    %%EOF

Broken down into the lingo of ISO 32000, we have a header
(§12.7.7.2.2):

    %FDF-1.2

followed by a body with a single object (§12.7.7.2.3):

    1 0 obj<</FDF<</Fields[
    <</T(FIELD1_NAME)/V(FIELD1_VALUE)>>
    <</T(FIELD2_NAME)/V(FIELD2_VALUE)>>
    …
    ] >> >>
    endobj

followed by a trailer (§12.7.7.2.4):

    trailer
    <</Root 1 0 R>>
    %%EOF

Despite the claims in §12.7.7.2.1 that the trailer is optional, pdftk
choked on files without it:

    $ cat no-trailer.fdf
    %FDF-1.2
    1 0 obj<</FDF<</Fields[
    <</T(FIELD1_NAME)/V(FIELD1_VALUE)>>
    <</T(FIELD2_NAME)/V(FIELD2_VALUE)>>
    ] >> >>
    endobj
    $ pdftk input.pdf fill_form no-trailer.fdf output output.pdf
    Error: Failed to open form data file: data.fdf
    No output created.

Trailers are easy to add, since all they require is a reference to
the root of the FDF catalog dictionary. If you only have one
dictionary, you can always use the simple trailer I gave above.

FDF Catalog
-----------

The meat of the FDF file is the catalog (§12.7.7.3). Let's take a
closer look at the catalog structure:

    1 0 obj<</FDF<</Fields[
    …
    ] >> >>

This defines a new object (the FDF catalog) which contains one key
(the `/FDF` dictionary). The FDF dictionary contains one key
(`/Fields`) and its associated array of fields.
Then we close the `/Fields` array (`]`), close the FDF dictionary
(`>>`) and close the FDF catalog (`>>`).

There are a number of interesting entries that you can add to the FDF
dictionary (§12.7.7.3.1, table 243), some of which require a more
advanced FDF version. You can use the `/Version` key to the FDF
catalog (§12.7.7.3.1, table 242) to specify the of data in the
dictionary:

    1 0 obj<> <> … ] >> >>
    endobj

pdftk understands raw text in the specified encoding (`(…)`), raw
UTF-16 strings starting with a [BOM][] (`(\xFE\xFF…)`), or UTF-16BE
strings encoded as ASCII hex (`<FEFF…>`). You can use
[[pdf-merge.py|PDF_bookmarks_with_Ghostscript/pdf-merge.py]] and its
`--unicode` option to find the latter.

Support for the `/utf_8` encoding in pdftk is new. I mailed a
[[patch|0001-Add-support-for-Encoding-utf_8-to-the-FDF-reader.patch]]
to pdftk's Sid Steward and posted a [patch request][utf-8-patch] to
the underlying iText library. Until those get accepted, you're stuck
with the less convenient encodings.

Fonts
-----

Say you fill in some Unicode values, but your PDF reader is having
trouble rendering some funky glyphs. Maybe it doesn't have access to
the right font? You can see which fonts are embedded in a given PDF
using [pdffonts][].

    $ pdffonts input.pdf
    name                                 type              emb sub uni object ID
    ------------------------------------ ----------------- --- --- --- ---------
    MMXQDQ+UniversalStd-NewswithCommPi   CID Type 0C       yes yes yes   1738  0
    MMXQDQ+ZapfDingbatsStd               CID Type 0C       yes yes yes   1749  0
    MMXQDQ+HelveticaNeueLTStd-Roman      Type 1C           yes yes no    1737  0
    CPZITK+HelveticaNeueLTStd-BlkCn      Type 1C           yes yes no    1739  0
    …

If you don't have the right font for your new data, you can add it
[using current versions of iText][TextFieldFonts.java]. However,
pdftk uses an older version, so I'm not sure how to translate this
idea for pdftk.
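
If you'd rather not write the ASCII hex form by hand, the encoding is
easy to script. What follows is a rough Python sketch, not something
taken from pdftk: it assumes the minimal header, single-object body
and trailer layout described earlier, and writes every field name and
value as a BOM-prefixed UTF-16BE hex string. The names
`fdf_hex_string` and `build_fdf`, and the sample field names and
values, are made up for illustration.

```python
import codecs


def fdf_hex_string(text):
    """Return text as a PDF hex string: UTF-16BE with a leading BOM."""
    data = codecs.BOM_UTF16_BE + text.encode("utf-16-be")
    return "<" + data.hex().upper() + ">"


def build_fdf(fields):
    """Return FDF source (pure ASCII) with one /T, /V pair per field."""
    lines = ["%FDF-1.2", "1 0 obj<</FDF<</Fields["]
    for name, value in fields.items():
        # /T is the field name, /V the value to fill in.
        lines.append(f"<</T{fdf_hex_string(name)}/V{fdf_hex_string(value)}>>")
    # Close the /Fields array, the FDF dictionary and the catalog, then
    # add the trailer that points back at object 1.
    lines += ["] >> >>", "endobj", "trailer", "<</Root 1 0 R>>", "%%EOF", ""]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical field names and values, for illustration only.
    print(build_fdf({"Name": "Zoë", "Date": "2012-09-20"}), end="")
```

Redirect the output to `data.fdf` and hand it to the `fill_form`
command from the top of this post. Hex strings also sidestep the
escaping of parentheses and backslashes that the raw `(…)` form would
need.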

FDF templates and field names
-----------------------------

You can use pdftk itself to create an FDF template, which it does with
embedded UTF-16BE (you can see the `FE FF` BOMs at the start of each
string value).

    $ pdftk input.pdf generate_fdf output template.fdf
    $ hexdump -C template.fdf | head
    00000000  25 46 44 46 2d 31 2e 32  0a 25 e2 e3 cf d3 0a 31  |%FDF-1.2.%.....1|
    00000010  20 30 20 6f 62 6a 20 0a  3c 3c 0a 2f 46 44 46 20  | 0 obj .<<./FDF |
    00000020  0a 3c 3c 0a 2f 46 69 65  6c 64 73 20 5b 0a 3c 3c  |.<<./Fields [.<<|
    00000030  0a 2f 56 20 28 fe ff 29  0a 2f 54 20 28 fe ff 00  |./V (..)./T (...|
    00000040  50 00 6f 00 73 00 74 00  65 00 72 00 4f 00 72 00  |P.o.s.t.e.r.O.r.|
    …

You can also dump a more human-friendly version of the PDF's fields
(without any default data):

    $ pdftk input.pdf dump_data_fields_utf8 output data.txt
    $ cat data.txt
    ---
    FieldType: Text
    FieldName: Name
    FieldNameAlt: Name:
    FieldFlags: 0
    FieldJustification: Left
    ---
    FieldType: Text
    FieldName: Date
    FieldNameAlt: Date:
    FieldFlags: 0
    FieldJustification: Left
    ---
    FieldType: Text
    FieldName: Advisor
    FieldNameAlt: Advisor:
    FieldFlags: 0
    FieldJustification: Left
    ---
    …

If the fields are poorly named, you may have to fill the entire form
with unique values and then see which values appeared where in the
output PDF (for an example, see codehero's
[identify_pdf_fields.js][]).

Conclusions
===========

This would be so much easier if people just used [YAML][] or [JSON][]
instead of bothering with PDFs ;).
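
In that spirit, the field dump from the previous section is simple
enough to turn into JSON. This is a hedged sketch, assuming the
`---`-separated `Key: value` layout shown in `data.txt`;
`parse_field_dump` and `json_skeleton` are hypothetical helper names,
and if a key repeats within one field, the later value simply
overwrites the earlier one.

```python
import json


def parse_field_dump(text):
    """Parse `---`-separated `Key: value` records into a list of dicts."""
    fields, current = [], {}
    for line in text.splitlines():
        if line.strip() == "---":
            # A separator closes the record collected so far (if any).
            if current:
                fields.append(current)
            current = {}
        elif ":" in line:
            # Split on the first colon only, so "FieldNameAlt: Name:"
            # keeps the trailing colon in its value.
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    if current:
        fields.append(current)
    return fields


def json_skeleton(fields):
    """Return a JSON object mapping each FieldName to an empty string."""
    names = [f["FieldName"] for f in fields if "FieldName" in f]
    return json.dumps({name: "" for name in names}, indent=2,
                      ensure_ascii=False)


if __name__ == "__main__":
    # Two records copied from the data.txt listing in this post.
    sample = (
        "---\nFieldType: Text\nFieldName: Name\nFieldNameAlt: Name:\n"
        "FieldFlags: 0\nFieldJustification: Left\n"
        "---\nFieldType: Text\nFieldName: Date\nFieldNameAlt: Date:\n"
        "FieldFlags: 0\nFieldJustification: Left\n"
    )
    print(json_skeleton(parse_field_dump(sample)))
```

Fill in the empty strings, and the resulting mapping of field names to
values is all you need to generate the `/T` and `/V` entries of an FDF
file.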

[JR]: http://www.myown1.com/linux/pdf_formfill.shtml
[FDF]: http://en.wikipedia.org/wiki/Forms_Data_Format#Forms_Data_Format_.28FDF.29
[XFDF]: http://en.wikipedia.org/wiki/Forms_Data_Format#XML_Forms_Data_Format_.28XFDF.29
[XFDF-specs]: http://partners.adobe.com/public/developer/en/xml/xfdf_2.0.pdf
[ISO32000]: http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf
[UTF-8]: http://en.wikipedia.org/wiki/UTF-8
[BOM]: http://en.wikipedia.org/wiki/Byte_order_mark
[utf-8-patch]: https://sourceforge.net/p/itext/patches/101/
[pdffonts]: http://poppler.freedesktop.org/
[TextFieldFonts.java]: http://itextpdf.com/examples/iia.php?id=158
[identify_pdf_fields.js]: https://github.com/codehero/OpenTaxFormFiller/blob/master/script/identify_pdf_fields.js
[YAML]: http://www.yaml.org/
[JSON]: http://www.json.org/

[[!tag tags/tools]]