You can use [[pdftk]] to fill out [[PDF]] forms (thanks for the
inspiration, [Joe Rothweiler][JR]). The syntax is simple:

    $ pdftk input.pdf fill_form data.fdf output output.pdf

where `input.pdf` is the input PDF containing the form, `data.fdf` is
an [FDF][] or [XFDF][] file containing your data, and `output.pdf` is
the name of the PDF you're creating. The tricky part is figuring out
what to put in `data.fdf`. There's a useful comparison of the Forms
Data Format (FDF) and its XML version (XFDF) in the [XFDF
specification][XFDF-specs]. XFDF only covers a subset of FDF, so I
won't worry about it here. FDF is defined in section 12.7.7 of [ISO
32000-1:2008][ISO32000], the PDF 1.7 specification, and it has been in
PDF specifications since version 1.2.

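
If you fill forms from a script, it can help to build the pdftk command in one place. The following is only a sketch: the helper names `fill_form_command` and `fill_form` are made up for this example, and it assumes `pdftk` is on your `PATH`.

```python
import subprocess


def fill_form_command(input_pdf, fdf, output_pdf, pdftk="pdftk"):
    """Build the argument list for `pdftk IN fill_form FDF output OUT`."""
    return [pdftk, input_pdf, "fill_form", fdf, "output", output_pdf]


def fill_form(input_pdf, fdf, output_pdf):
    """Run pdftk; check=True raises CalledProcessError on a non-zero exit."""
    subprocess.run(fill_form_command(input_pdf, fdf, output_pdf), check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.
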

Forms Data Format (FDF)
=======================

FDF files are basically stripped-down PDFs (§12.7.7.1). A simple FDF
file will look something like:

    1 0 obj<</FDF<</Fields[
    <</T(FIELD1_NAME)/V(FIELD1_VALUE)>>
    <</T(FIELD2_NAME)/V(FIELD2_VALUE)>>

Broken down into the lingo of ISO 32000, we have a header
followed by a body with a single object (§12.7.7.2.3):

    1 0 obj<</FDF<</Fields[
    <</T(FIELD1_NAME)/V(FIELD1_VALUE)>>
    <</T(FIELD2_NAME)/V(FIELD2_VALUE)>>

followed by a trailer (§12.7.7.2.4):

Despite the claims in §12.7.7.2.1 that the trailer is optional, pdftk
choked on files without it:

    1 0 obj<</FDF<</Fields[
    <</T(Name)/V(Trevor)>>
    <</T(Date)/V(2012-09-20)>>

    $ pdftk input.pdf fill_form no-trailer.fdf output output.pdf
    Error: Failed to open form data file:

Trailers are easy to add, since all they require is a reference to the
root of the FDF catalog dictionary. If you only have one dictionary,
you can always use the simple trailer I gave above.

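
To make the header, body, and trailer layout concrete, here is a minimal Python sketch that assembles a complete FDF file from a mapping of field names to values. Treat it as an illustration under assumptions rather than a reference implementation: the function names `escape_pdf_string` and `build_fdf` are invented for this example, the header bytes are copied from the hexdump of pdftk's own template, the `endobj` and `trailer` layout is my reading of §12.7.7.2 and has not been checked against every pdftk version, and names and values are assumed to be plain ASCII.

```python
def escape_pdf_string(text):
    """Backslash-escape characters that are special in a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def build_fdf(fields):
    """Return the bytes of a minimal FDF file for a {name: value} mapping."""
    # Header: version line plus a binary comment line, as seen in the
    # output of `pdftk ... generate_fdf`.
    header = b"%FDF-1.2\n%\xe2\xe3\xcf\xd3\n"
    # Body: a single object (the FDF catalog) holding the /Fields array.
    body = ["1 0 obj<</FDF<</Fields["]
    for name, value in fields.items():
        body.append("<</T({})/V({})>>".format(
            escape_pdf_string(name), escape_pdf_string(value)))
    # Close the /Fields array, the FDF dictionary, and the catalog.
    body += ["]>>>>", "endobj"]
    # Trailer (assumed layout): point /Root at object 1, the catalog.
    trailer = ["trailer", "<</Root 1 0 R>>", "%%EOF"]
    # ASCII-only sketch: other characters raise UnicodeEncodeError here.
    return header + "\n".join(body + trailer).encode("ascii") + b"\n"


if __name__ == "__main__":
    with open("data.fdf", "wb") as handle:
        handle.write(build_fdf({"Name": "Trevor", "Date": "2012-09-20"}))
```

Escaping matters because an unbalanced parenthesis in a value would otherwise end the string early.
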
The meat of the FDF file is the catalog (§12.7.7.3). Let's take a
closer look at the catalog structure:

    1 0 obj<</FDF<</Fields[

This defines a new object (the FDF catalog) which contains one key
(the `/FDF` dictionary). The FDF dictionary contains one key
(`/Fields`) and its associated array of fields. Then we close the
`/Fields` array (`]`), close the FDF dictionary (`>>`) and close the

There are a number of interesting entries that you can add to the FDF
dictionary (§12.7.7.3.1, table 243), some of which require a more
advanced FDF version. You can use the `/Version` key to the FDF
catalog (§12.7.7.3.1, table 242) to specify the version of data in the

    1 0 obj<</Version/1.3/FDF<</Fields[…

Now you can extend the dictionary using table 244. Let's set things up
to use [UTF-8][] for the field values (`/V`) or options (`/Opt`):

    1 0 obj<</Version/1.3/FDF<</Encoding/utf_8/Fields[
    <</T(FIELD1_NAME)/V(FIELD1_VALUE)>>
    <</T(FIELD2_NAME)/V(FIELD2_VALUE)>>

pdftk understands raw text in the specified encoding (`(…)`), raw
UTF-16 strings starting with a [BOM][] (`(\xFE\xFF…)`), or UTF-16BE
strings encoded as ASCII hex (`<FEFF…>`). You can use
[[pdf-merge.py|PDF_bookmarks_with_Ghostscript/pdf-merge.py]] and its
`--unicode` option to find the latter. Support for the `/utf_8`
encoding in pdftk is new. I mailed a
[[patch|0001-Add-support-for-Encoding-utf_8-to-the-FDF-reader.patch]]
to pdftk's Sid Steward and posted a [patch request][utf-8-patch] to
the underlying iText library. Until those get accepted, you're stuck
with the less convenient encodings.

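
If you just need the ASCII hex form for a handful of values, a few lines of Python will produce it. This is a small sketch, not a replacement for `pdf-merge.py`; the function name `pdf_hex_string` is made up for this example.

```python
def pdf_hex_string(text):
    """Encode text as a PDF hex string: the FEFF BOM followed by UTF-16BE."""
    return "<FEFF{}>".format(text.encode("utf-16-be").hex().upper())


print(pdf_hex_string("é"))  # prints <FEFF00E9>
```

The result replaces the parenthesized literal, so a field would look like `<</T(FIELD1_NAME)/V<FEFF00E9>>>`.
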
Say you fill in some Unicode values, but your PDF reader is having
trouble rendering some funky glyphs. Maybe it doesn't have access to
the right font? You can see which fonts are embedded in a given PDF

    name                                 type              emb sub uni object ID
    ------------------------------------ ----------------- --- --- --- ---------
    MMXQDQ+UniversalStd-NewswithCommPi   CID Type 0C       yes yes yes   1738  0
    MMXQDQ+ZapfDingbatsStd               CID Type 0C       yes yes yes   1749  0
    MMXQDQ+HelveticaNeueLTStd-Roman      Type 1C           yes yes no    1737  0
    CPZITK+HelveticaNeueLTStd-BlkCn      Type 1C           yes yes no    1739  0

If you don't have the right font for your new data, you can add it
[using current versions of iText][TextFieldFonts.java]. However,
pdftk uses an older version, so I'm not sure how to translate this


FDF templates and field names
-----------------------------

You can use pdftk itself to create an FDF template, which it does with
embedded UTF-16BE (you can see the FE FF BOMs at the start of each

    $ pdftk input.pdf generate_fdf output template.fdf
    $ hexdump -C template.fdf | head
    00000000  25 46 44 46 2d 31 2e 32  0a 25 e2 e3 cf d3 0a 31  |%FDF-1.2.%.....1|
    00000010  20 30 20 6f 62 6a 20 0a  3c 3c 0a 2f 46 44 46 20  | 0 obj .<<./FDF |
    00000020  0a 3c 3c 0a 2f 46 69 65  6c 64 73 20 5b 0a 3c 3c  |.<<./Fields [.<<|
    00000030  0a 2f 56 20 28 fe ff 29  0a 2f 54 20 28 fe ff 00  |./V (..)./T (...|
    00000040  50 00 6f 00 73 00 74 00  65 00 72 00 4f 00 72 00  |P.o.s.t.e.r.O.r.|

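
Reading field names out of a hexdump gets old quickly. As a sketch, the following Python decodes the raw bytes of one template string; the function name `decode_pdf_text` is invented for this example, and falling back to Latin-1 for strings without a BOM is my simplification rather than anything pdftk promises.

```python
import codecs


def decode_pdf_text(raw):
    """Decode the bytes between the parentheses of a PDF string."""
    if raw.startswith(codecs.BOM_UTF16_BE):
        # Strip the FE FF marker and decode the rest as UTF-16BE.
        return raw[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    # Assumption: treat anything without a BOM as Latin-1.
    return raw.decode("latin-1")


# The start of the first field name in the hexdump above.
raw = bytes.fromhex("feff0050006f0073007400650072004f0072")
print(decode_pdf_text(raw))  # prints PosterOr
```
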
You can also dump a more human-friendly version of the PDF's fields
(without any default data):

    $ pdftk input.pdf dump_data_fields_utf8 output data.txt

    FieldJustification: Left
    FieldJustification: Left
    FieldNameAlt: Advisor:
    FieldJustification: Left

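
The field dump is easy to post-process. The sketch below assumes that each field record starts with a `---` line and consists of `Key: Value` lines, which matches the dumps I have seen; the function name `parse_data_fields` and the sample record are hypothetical.

```python
def parse_data_fields(text):
    """Split a pdftk field dump into a list of {key: value} dictionaries."""
    records = []
    current = None
    for line in text.splitlines():
        if line.strip() == "---":
            # Assumed record separator: start a new field.
            current = {}
            records.append(current)
        elif current is not None and ": " in line:
            # Split on the first ": " only, so values may contain colons.
            # A key repeated within one record keeps its last value.
            key, _, value = line.partition(": ")
            current[key] = value
    return records


# Hypothetical sample in the assumed format.
sample = """---
FieldType: Text
FieldName: advisor
FieldNameAlt: Advisor:
FieldJustification: Left
"""
for record in parse_data_fields(sample):
    print(record["FieldName"], "->", record.get("FieldNameAlt", ""))
```
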
If the fields are poorly named, you may have to fill the entire form
with unique values and then see which values appeared where in the
output PDF (for an example, see codehero's
[identify_pdf_fields.js][]).

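
The same trick is only a few lines of Python. This sketch is not a port of codehero's script; the function names `unique_field_values` and `fdf_field_lines` are made up for this example, and it assumes field names contain no parentheses or backslashes.

```python
def unique_field_values(field_names, prefix="F"):
    """Map each field name to a short, unique marker such as F001."""
    return {name: "{}{:03d}".format(prefix, index)
            for index, name in enumerate(field_names, start=1)}


def fdf_field_lines(values):
    """Format a {name: value} mapping as FDF field dictionaries."""
    # Assumption: names and values need no PDF string escaping.
    return ["<</T({})/V({})>>".format(name, value)
            for name, value in values.items()]


markers = unique_field_values(["Name", "Date"])
print("\n".join(fdf_field_lines(markers)))
```

Paste the printed lines into the `/Fields` array, fill the form, and look up each marker you see in the output PDF to find the field it came from.
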
This would be so much easier if people just used [YAML][] or [JSON][]
instead of bothering with PDFs ;).

[JR]: http://www.myown1.com/linux/pdf_formfill.shtml
[FDF]: http://en.wikipedia.org/wiki/Forms_Data_Format#Forms_Data_Format_.28FDF.29
[XFDF]: http://en.wikipedia.org/wiki/Forms_Data_Format#XML_Forms_Data_Format_.28XFDF.29
[XFDF-specs]: http://partners.adobe.com/public/developer/en/xml/xfdf_2.0.pdf
[ISO32000]: http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf
[UTF-8]: http://en.wikipedia.org/wiki/UTF-8
[BOM]: http://en.wikipedia.org/wiki/Byte_order_mark
[utf-8-patch]: https://sourceforge.net/p/itext/patches/101/
[pdffonts]: http://poppler.freedesktop.org/
[TextFieldFonts.java]: http://itextpdf.com/examples/iia.php?id=158
[identify_pdf_fields.js]: https://github.com/codehero/OpenTaxFormFiller/blob/master/script/identify_pdf_fields.js
[YAML]: http://www.yaml.org/
[JSON]: http://www.json.org/