Handle Unicode strings in pdf-merge.py.
For information on Unicode strings in PDFs, see `§7.3.4 String
Objects` and `§7.9.2.2 Text String Type` in the PDF reference [1] and
`Table 2.3 (p21)`, `Table 2.5 (p25)`, etc. in the pdfmark reference
[2].
Note that there are Ghostscript bugs [3] that can lead to errors like:
Entity: line 5: parser error : xmlParseCharRef: invalid xmlChar value 1
<rdf:Description rdf:about='
e7674657-8a09-11ec-0000-
cfd67fe5d10' xmlns:pdf='
and:
Entity: line 9: parser error : Input is not proper UTF-8, indicate encoding !
Bytes: 0xAC 0x26 0x23 0x32
dobe.com/xap/1.0/mm/' xapMM:DocumentID='uuid:
3792d85d-8a0b-11ec-0000-
cfd67fe5d10
when you open your generated PDF in `evince`. These bugs were fixed
in Ghostscript 9.06.
The encoding of command-line arguments are not well standardized [4],
so I supply an `--argv-encoding` option to override the locale if
necessary.
[1] Document management — Portable document format — Part 1: PDF 1.7
(PDF 32000-1:2008, July 2008
http://www.adobe.com/devnet/pdf/pdf_reference.html)
[2] pdfmark Reference
(Edition 1.0, November 2006
http://www.adobe.com/devnet/acrobat/pdfs/pdfmark_reference.pdf)
[3] http://bugs.ghostscript.com/show_bug.cgi?id=692422
[4] http://stackoverflow.com/questions/
5408730/what-is-the-encoding-of-argv