--- /dev/null
+You can use [[pdftk]] to fill out [[PDF]] forms (thanks for the
+inspiration, [Joe Rothweiler][JR]). The syntax is simple:
+
+    $ pdftk input.pdf fill_form data.fdf output output.pdf
+
+where `input.pdf` is the input PDF containing the form, `data.fdf` is
+an [FDF][] or [XFDF][] file containing your data, and `output.pdf` is
+the name of the PDF you're creating. The tricky part is figuring out
+what to put in `data.fdf`. There's a useful comparison of the Forms
+Data Format (FDF) and its XML version (XFDF) in the [XFDF
+specification][XFDF-spec]. XFDF only covers a subset of FDF, so I
+won't worry about it here. FDF is defined in section 12.7.7 of [ISO
+32000-1:2008][ISO32000], the PDF 1.7 specification, and it has been in
+PDF specifications since version 1.2.
+
+Forms Data Format (FDF)
+=======================
+
+FDF files are basically stripped down PDFs (§12.7.7.1). A simple FDF
+file will look something like:
+
+    %FDF-1.2
+    1 0 obj<</FDF<</Fields[
+    <</T(FIELD1_NAME)/V(FIELD1_VALUE)>>
+    <</T(FIELD2_NAME)/V(FIELD2_VALUE)>>
+    …
+    ] >> >>
+    endobj
+    trailer
+    <</Root 1 0 R>>
+    %%EOF
+
+Broken down into the lingo of ISO 32000, we have a header
+(§12.7.7.2.2):
+
+    %FDF-1.2
+
+followed by a body with a single object (§12.7.7.2.3):
+
+    1 0 obj<</FDF<</Fields[
+    <</T(FIELD1_NAME)/V(FIELD1_VALUE)>>
+    <</T(FIELD2_NAME)/V(FIELD2_VALUE)>>
+    …
+    ] >> >>
+    endobj
+
+followed by a trailer (§12.7.7.2.4):
+
+    trailer
+    <</Root 1 0 R>>
+    %%EOF
+
+Despite the claims in §12.7.7.2.1 that the trailer is optional, pdftk
+choked on files without it:
+
+    $ cat no-trailer.fdf
+    %FDF-1.2
+    1 0 obj<</FDF<</Fields[
+    <</T(Name)/V(Trevor)>>
+    <</T(Date)/V(2012-09-20)>>
+    ] >> >>
+    endobj
+    $ pdftk input.pdf fill_form no-trailer.fdf output output.pdf
+    Error: Failed to open form data file:
+    data.fdf
+    No output created.
+
+Trailers are easy to add, since all they require is a reference to the
+root of the FDF catalog dictionary. If you only have one dictionary,
+you can always use the simple trailer I gave above.
+
+FDF Catalog
+-----------
+
+The meat of the FDF file is the catalog (§12.7.7.3). Let's take a
+closer look at the catalog structure:
+
+    1 0 obj<</FDF<</Fields[
+    …
+    ] >> >>
+
+This defines a new object (the FDF catalog) which contains one key
+(the `/FDF` dictionary). The FDF dictionary contains one key
+(`/Fields`) and its associated array of fields. Then we close the
+`/Fields` array (`]`), close the FDF dictionary (`>>`) and close the
+FDF catalog (`>>`).
+
+There are a number of interesting entries that you can add to the FDF
+dictionary (§12.7.7.3.1, table 243), some of which require a more
+advanced FDF version. You can add the `/Version` key to the FDF
+catalog (§12.7.7.3.1, table 242) to specify the version of the data
+in the dictionary:
+
+    1 0 obj<</Version/1.3/FDF<</Fields[…
+
+Now you can extend the dictionary using table 243. Let's set things up
+to use [UTF-8][] for the field values (`/V`) or options (`/Opt`):
+
+    1 0 obj<</Version/1.3/FDF<</Encoding/utf_8/Fields[
+    <</T(FIELD1_NAME)/V(FIELD1_VALUE)>>
+    <</T(FIELD2_NAME)/V(FIELD2_VALUE)>>
+    …
+    ] >> >>
+    endobj
+
+pdftk understands raw text in the specified encoding (`(…)`), raw
+UTF-16 strings starting with a [BOM][] (`(\xFE\xFF…)`), or UTF-16BE
+strings encoded as ASCII hex (`<FEFF…>`). You can use
+[[pdf-merge.py|PDF_bookmarks_with_Ghostscript/pdf-merge.py]] and its
+`--unicode` option to find the latter. Stock pdftk does not support
+the `/utf_8` encoding yet. I mailed a
+[[patch|0001-Add-support-for-Encoding-utf_8-to-the-FDF-reader.patch]]
+to pdftk's Sid Steward and posted a [patch request][utf-8-patch] to
+the underlying iText library. Until those get accepted, you're stuck
+with the less convenient encodings.
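+
+To make the three string forms concrete, here is a rough sketch of one
+value written each way. The field name `Name` and the value `Zoë` are
+made up for illustration, and `\xNN` stands for a single raw byte (as
+in the notation above), not for literal text in the file:
+
+    <</T(Name)/V(Zoë)>>
+    <</T(Name)/V(\xFE\xFF\x00Z\x00o\x00\xEB)>>
+    <</T(Name)/V<FEFF005A006F00EB>>>
+
+The first form assumes `/Encoding/utf_8` in the FDF dictionary, a file
+saved as UTF-8, and a pdftk that understands that encoding. The other
+two spell out the same UTF-16BE code units (`005A`, `006F`, `00EB`)
+after the `FEFF` BOM.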
+
+Fonts
+-----
+
+Say you fill in some Unicode values, but your PDF reader is having
+trouble rendering some funky glyphs. Maybe it doesn't have access to
+the right font? You can see which fonts are embedded in a given PDF
+using [pdffonts][].
+
+    $ pdffonts input.pdf
+    name                                 type              emb sub uni object ID
+    ------------------------------------ ----------------- --- --- --- ---------
+    MMXQDQ+UniversalStd-NewswithCommPi   CID Type 0C       yes yes yes   1738  0
+    MMXQDQ+ZapfDingbatsStd               CID Type 0C       yes yes yes   1749  0
+    MMXQDQ+HelveticaNeueLTStd-Roman      Type 1C           yes yes no    1737  0
+    CPZITK+HelveticaNeueLTStd-BlkCn      Type 1C           yes yes no    1739  0
+    …
+
+If you don't have the right font for your new data, you should
+complain to whoever generated the PDF that you're trying to fill out,
+because I can't figure out how to attach a new font to an
+already-generated PDF for use with your new data.
+
+FDF templates and field names
+-----------------------------
+
+You can use pdftk itself to create an FDF template, which it writes
+with embedded UTF-16BE strings (you can see the FE FF BOMs at the
+start of each string value).
+
+    $ pdftk input.pdf generate_fdf output template.fdf
+    $ hexdump -C template.fdf | head
+    00000000  25 46 44 46 2d 31 2e 32  0a 25 e2 e3 cf d3 0a 31  |%FDF-1.2.%.....1|
+    00000010  20 30 20 6f 62 6a 20 0a  3c 3c 0a 2f 46 44 46 20  | 0 obj .<<./FDF |
+    00000020  0a 3c 3c 0a 2f 46 69 65  6c 64 73 20 5b 0a 3c 3c  |.<<./Fields [.<<|
+    00000030  0a 2f 56 20 28 fe ff 29  0a 2f 54 20 28 fe ff 00  |./V (..)./T (...|
+    00000040  50 00 6f 00 73 00 74 00  65 00 72 00 4f 00 72 00  |P.o.s.t.e.r.O.r.|
+    …
+
+You can also dump a more human-friendly version of the PDF's fields
+(without any default data):
+
+    $ pdftk input.pdf dump_data_fields_utf8 output data.txt
+    $ cat data.txt
+    ---
+    FieldType: Text
+    FieldName: Name
+    FieldNameAlt: Name:
+    FieldFlags: 0
+    FieldJustification: Left
+    ---
+    FieldType: Text
+    FieldName: Date
+    FieldNameAlt: Date:
+    FieldFlags: 0
+    FieldJustification: Left
+    ---
+    FieldType: Text
+    FieldName: Advisor
+    FieldNameAlt: Advisor:
+    FieldFlags: 0
+    FieldJustification: Left
+    ---
+    …
+
+If the fields are poorly named, you may have to fill the entire form
+with unique values and then see which values appear where in the
+output PDF (for an example, see codehero's
+[identify_pdf_fields.js][]).
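+
+A minimal sketch of that approach, assuming the three field names from
+the dump above: use each field's own name as its value, so every
+filled-in box is labeled in the output PDF. This is only an
+illustration, not necessarily what codehero's script does, and the
+`names.fdf` and `names.pdf` file names are made up:
+
+    $ cat names.fdf
+    %FDF-1.2
+    1 0 obj<</FDF<</Fields[
+    <</T(Name)/V(Name)>>
+    <</T(Date)/V(Date)>>
+    <</T(Advisor)/V(Advisor)>>
+    ] >> >>
+    endobj
+    trailer
+    <</Root 1 0 R>>
+    %%EOF
+    $ pdftk input.pdf fill_form names.fdf output names.pdf
+
+Opening `names.pdf` should then show which field name lands in which
+box on the page.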
+
+Conclusions
+===========
+
+This would be so much easier if people just used [YAML][] or [JSON][]
+instead of bothering with PDFs ;).
+
+
+[JR]: http://www.myown1.com/linux/pdf_formfill.shtml
+[FDF]: http://en.wikipedia.org/wiki/Forms_Data_Format#Forms_Data_Format_.28FDF.29
+[XFDF]: http://en.wikipedia.org/wiki/Forms_Data_Format#XML_Forms_Data_Format_.28XFDF.29
+[XFDF-spec]: http://partners.adobe.com/public/developer/en/xml/xfdf_2.0.pdf
+[ISO32000]: http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/PDF32000_2008.pdf
+[UTF-8]: http://en.wikipedia.org/wiki/UTF-8
+[BOM]: http://en.wikipedia.org/wiki/Byte_order_mark
+[utf-8-patch]: https://sourceforge.net/p/itext/patches/101/
+[pdffonts]: http://poppler.freedesktop.org/
+[identify_pdf_fields.js]: https://github.com/codehero/OpenTaxFormFiller/blob/master/script/identify_pdf_fields.js
+[YAML]: http://www.yaml.org/
+[JSON]: http://www.json.org/
+
+[[!tag tags/tools]]
--- /dev/null
+From fe83cb12c2c275ccd922adc63d54b5b6c0604a2d Mon Sep 17 00:00:00 2001
+Message-Id: <fe83cb12c2c275ccd922adc63d54b5b6c0604a2d.1348172464.git.wking@tremily.us>
+From: "W. Trevor King" <wking@tremily.us>
+Date: Thu, 20 Sep 2012 16:01:19 -0400
+Subject: [PATCH] Add support for /Encoding/utf_8 to the FDF reader.
+
+From PDF 32000-1:2008, section 12.7.7.3.1, table 243 (Entries in the
+FDF dictionary), on page 459:
+
+ Key: Encoding
+ Type: name
+ Value:
+
+ (Optional; PDF 1.3) The encoding that shall be used for any FDF
+ field value or option (V or Opt in the field dictionary; see Table
+ 246) or field name that is a string and does not begin with the
+ Unicode prefix U+FEFF.
+
+ Default value: PDFDocEncoding.
+
+ Other allowed values include Shift_JIS, BigFive, GBK, UHC, utf_8,
+ utf_16
+---
+ java/com/lowagie/text/pdf/FdfReader.java | 2 ++
+ java/com/lowagie/text/pdf/PdfName.java | 2 ++
+ 2 files changed, 4 insertions(+)
+
+diff --git a/java/com/lowagie/text/pdf/FdfReader.java b/java/com/lowagie/text/pdf/FdfReader.java
+index f8776ab..94b432e 100644
+--- a/java/com/lowagie/text/pdf/FdfReader.java
++++ b/java/com/lowagie/text/pdf/FdfReader.java
+@@ -188,6 +188,8 @@ public class FdfReader extends PdfReader {
+ return new String(b, "GBK");
+ else if (encoding.equals(PdfName.BIGFIVE))
+ return new String(b, "Big5");
++ else if (encoding.equals(PdfName.UTF_8))
++ return new String(b, "UTF8");
+ }
+ catch (Exception e) {
+ }
+diff --git a/java/com/lowagie/text/pdf/PdfName.java b/java/com/lowagie/text/pdf/PdfName.java
+index bd4aaeb..50d3704 100644
+--- a/java/com/lowagie/text/pdf/PdfName.java
++++ b/java/com/lowagie/text/pdf/PdfName.java
+@@ -903,6 +903,8 @@ public class PdfName extends PdfObject implements Comparable{
+ /** A name */
+ public static final PdfName USETHUMBS = new PdfName("UseThumbs");
+ /** A name */
++ public static final PdfName UTF_8 = new PdfName("utf_8");
++ /** A name */
+ public static final PdfName V = new PdfName("V");
+ /** A name */
+ public static final PdfName VERISIGN_PPKVS = new PdfName("VeriSign.PPKVS");
+--
+1.7.12.176.g3fc0e4c.dirty
+