Remove `[:-len('&#0;')]` from Unicode parsing of pdftk output in pdf-merge.py.
author: W. Trevor King <wking@drexel.edu>
Thu, 9 Feb 2012 11:54:34 +0000 (06:54 -0500)
committer: W. Trevor King <wking@drexel.edu>
Thu, 9 Feb 2012 11:59:08 +0000 (06:59 -0500)
Thanks to Larry Cai <larry.caiyu@gmail.com> for pointing this out:

On Thu, Feb 09, 2012 at 03:25:09PM +0800, Larry Cai wrote:
> …
> When I just remove "[:-len('&#0;')]", it seem works!!
> …

I had thought that pdftk always appended a trailing null byte to
Unicode strings, but that appears to be incorrect.
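
To illustrate why the unconditional slice was harmful, here is a minimal sketch. It is not the code in pdf-merge.py: it assumes Python 3 (the script itself uses Python 2's `unicode`), and `UNICODE_REGEXP`, `unicode_replace`, and `decode_title` are hypothetical stand-ins for the script's `_UNICODE_REGEXP` and `_unicode_replace`, whose real definitions are not shown here. The sketch also shows an alternative to this commit's fix: stripping a trailing `&#0;` only when one is actually present, which may be useful if some pdftk versions do emit it.

```python
import re

# Hypothetical stand-in for the script's _UNICODE_REGEXP: matches decimal
# numeric character references such as '&#945;'.
UNICODE_REGEXP = re.compile(r'&#(\d+);')


def unicode_replace(value):
    """Decode decimal character references, e.g. '&#945;' becomes 'α'."""
    return UNICODE_REGEXP.sub(lambda m: chr(int(m.group(1))), value)


def decode_title(value):
    """Decode a title, dropping a trailing '&#0;' only if it is present."""
    if value.endswith('&#0;'):
        value = value[:-len('&#0;')]
    return unicode_replace(value)


title = '&#945;&#946;&#947;'

# The old unconditional slice removes the last four characters even when
# there is no trailing '&#0;', cutting into the final reference.
truncated = title[:-len('&#0;')]
```

With this sketch, `truncated` is `'&#945;&#946;&#'`, so the final character can no longer be decoded, while `decode_title` handles the title both with and without a trailing `&#0;`.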

posts/PDF_bookmarks_with_Ghostscript/pdf-merge.py

index 77c32c39ad135bc960ab6f7dedd74b48ce365f3d..cb6bec05aca2c1f3f056c77ac8b4ac82f165b739 100755 (executable)
@@ -102,7 +102,7 @@ class BookmarkedPDF (object):
         ...     'BookmarkTitle: Section 1.1.2',
         ...     'BookmarkLevel: 3',
         ...     'BookmarkPageNumber: 4',
-        ...     'BookmarkTitle: &#945;&#946;&#947;&#0;',
+        ...     'BookmarkTitle: &#945;&#946;&#947;',
         ...     'BookmarkLevel: 4',
         ...     'BookmarkPageNumber: 4',
         ...     'BookmarkTitle: Section 1.2',
@@ -146,7 +146,7 @@ class BookmarkedPDF (object):
                     value = int(value)
                 elif k == 'title':
                     if self._UNICODE_REGEXP.search(value):
-                        value = self._unicode_replace(value[:-len('&#0;')])
+                        value = self._unicode_replace(value)
                     else:
                         value = unicode(value)
                 bookmark_info[k] = value