W. Trevor King [Fri, 7 Nov 2014 08:58:34 +0000 (00:58 -0800)]
ssoma-mda: Add compatibility decoding for the '\x10\x10' encoding
I'm not sure if this was a user error or a Git send-email error, but
notmuch has a message with:
Subject: =?\x10\x10?q?=5BPATCH=20v7=203/3=5D=20Use=20the=20structured=20formatters=20in=20notmuch-search=2Ec=2E?=
...
Message-Id: <1342766173-1344-4-git-send-email-craven@gmx.net>
X-Mailer: git-send-email 1.7.11.2
Without this patch, that raises:
LookupError: unknown encoding:
W. Trevor King [Fri, 7 Nov 2014 08:32:25 +0000 (00:32 -0800)]
ssoma-mda: Add compatibility decoding for the 'no' encoding
I'm not sure if this was a user error or a Git send-email error, but
notmuch has a message with:
Subject: =?no?q?=5BPATCH=203/3=5D=20Add=20=27compose=27=20command?=
...
Message-Id: <1291933972-7186-4-git-send-email-felipe.contreras@gmail.com>
X-Mailer: git-send-email 1.7.3.2
Without this patch, that raises:
LookupError: unknown encoding: no
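A minimal sketch of one way to survive unknown charset labels such as 'no' or '\x10\x10'. It is not necessarily what the patch does: the helper name decode_chunk and the ASCII-with-replacement fallback are assumptions for illustration.

```python
import email.header


def decode_chunk(data, charset, fallback='ascii'):
    """Decode one (data, charset) pair from email.header.decode_header.

    Hypothetical helper: if Python does not recognize the charset label,
    fall back to `fallback` and replace undecodable bytes rather than
    raising LookupError.
    """
    if isinstance(data, str):
        return data  # already text, nothing to decode
    try:
        return data.decode(charset or fallback)
    except LookupError:
        # Unknown label (for example 'no'); guess instead of failing.
        return data.decode(fallback, errors='replace')


subject = '=?no?q?=5BPATCH=203/3=5D=20Add=20=27compose=27=20command?='
decoded = ''.join(
    decode_chunk(data, charset)
    for data, charset in email.header.decode_header(subject))
```

The fallback only guesses at what the sender meant, so a lookup table of known-bad labels may be preferable when the intended charset is known.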
W. Trevor King [Fri, 7 Nov 2014 08:12:58 +0000 (00:12 -0800)]
ssoma-mda: Fix RFC-2047 decoding for partially-encoded strings
Avoid:
TypeError: sequence item 0: expected str instance, bytes found
when an RFC-2047-encoded string contains both an unencoded and an
encoded section. For example:
>>> import email.header
>>> email.header.decode_header('Keld =?ISO-8859-1?Q?J=F8rn_Simonsen?=')
[(b'Keld ', None), (b'J\xf8rn Simonsen', 'iso-8859-1')]
returns the decoded string in bytes but no charset information for the
first chunk. I'm not sure what the default charset for header values
is, but RFC 2047 sets itself up to deal with non-ASCII header values
[1], so I'm guessing it's ASCII ;).
[1]: http://tools.ietf.org/html/rfc2047#section-1
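The fix described above might look roughly like the following sketch. The function name decode_mixed and the default_charset parameter are hypothetical, and ASCII is the guess made in the commit message, not something email.header guarantees.

```python
import email.header


def decode_mixed(string, default_charset='ascii'):
    """Join the chunks of a partially-encoded header into one str.

    Chunks that come back as bytes without a charset are decoded with
    default_charset (an assumption, following the guess above).
    """
    parts = []
    for data, charset in email.header.decode_header(string):
        if isinstance(data, bytes):
            data = data.decode(charset or default_charset)
        parts.append(data)
    return ''.join(parts)


name = decode_mixed('Keld =?ISO-8859-1?Q?J=F8rn_Simonsen?=')
```

Decoding every bytes chunk before joining is what avoids mixing bytes and str in the final join.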
W. Trevor King [Fri, 7 Nov 2014 07:56:47 +0000 (23:56 -0800)]
ssoma-mda: Decode RFC-2047-encoded From: headers
The examples in RFC 2047 have an encoded name, but an ASCII email
address [1]. Make sure we handle that appropriately.
[1]: http://tools.ietf.org/html/rfc2047#section-8
W. Trevor King [Fri, 7 Nov 2014 07:32:03 +0000 (23:32 -0800)]
ssoma-mda: Handle Subject:s that aren't RFC-2047-encoded too
If the string is not RFC-2047 encoded, the charset from decode_header
is None:
>>> import email.header
>>> email.header.decode_header('hello')
[('hello', None)]
so str(decoded, charset) will fail with:
TypeError: str() argument 2 must be str, not None
Avoid that by checking charset before attempting to decode with
charset. Since that's a bit awkward, pull it out into its own
_decode_header function.
The Simonsen example is from RFC 2047 [1].
[1]: http://tools.ietf.org/html/rfc2047#section-8
W. Trevor King [Fri, 7 Nov 2014 06:59:47 +0000 (22:59 -0800)]
ssoma-mda: Decode RFC-2047-encoded Subject: headers
For example, get [1]:
'If you can read this you understand the example.'
from a message with:
Subject: =?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?=
=?ISO-8859-2?B?dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==?=
[1]: http://tools.ietf.org/html/rfc2047.html#section-8
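The decoding step can be sketched with the standard library alone. The helper name decode_subject is hypothetical, and the continuation line is written with the leading space that header folding requires.

```python
import email.header

# The RFC 2047 example: two encoded words split across a folded header.
subject = ('=?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?=\n'
           ' =?ISO-8859-2?B?dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==?=')


def decode_subject(value):
    """Decode each encoded word with its own charset and join the results."""
    return ''.join(
        str(data, charset)
        for data, charset in email.header.decode_header(value))


text = decode_subject(subject)
```

This form assumes every chunk carries a charset, which holds for fully-encoded subjects like this one but not for plain or partially-encoded ones.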
W. Trevor King [Fri, 7 Nov 2014 06:18:13 +0000 (22:18 -0800)]
ssoma-mda: Unfold folding whitespace in message subjects
For the RFC specs on folding whitespace, see [1] and [2]. For the
decision to preserve the original folding in Python headers, here's
R. David Murray's comment from 2011-04-18 [3]:
I have, by the way, come around to the view that we should never be
introducing or deleting whitespace except when RFC 2047
encoding/decoding...we are still deleting it in a couple places, and
I will address that by and by
[1]: https://tools.ietf.org/html/rfc5322#section-2.2.3
[2]: https://tools.ietf.org/html/rfc5322#section-3.2.2
[3]: http://bugs.python.org/issue1372770#msg133974
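Unfolding, as RFC 5322 describes it, removes any line break that is immediately followed by whitespace and keeps the whitespace itself. A minimal sketch, with a hypothetical helper name and with bare LF accepted as well as CRLF:

```python
import re


def unfold(value):
    """Drop the line break of each folding-whitespace sequence.

    The space or tab after the break is kept, so a folded subject
    reads as a single line.
    """
    return re.sub(r'\r?\n(?=[ \t])', '', value)
```

For example, unfold('a long\r\n subject') gives 'a long subject'.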
W. Trevor King [Fri, 7 Nov 2014 06:10:45 +0000 (22:10 -0800)]
ssoma-mda: Support UTC dates
For example:
>>> datetime = email.utils.parsedate_to_datetime('Tue, 21 Dec 2010 03:52:23 -0000')
>>> datetime.utcoffset() is None
True
so without this case, we'll get:
AttributeError: 'NoneType' object has no attribute 'seconds'
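A sketch of guarding against the missing offset. The function name is hypothetical, and it uses total_seconds() where the traceback above refers to the .seconds attribute, since total_seconds() also gives the expected sign for offsets west of UTC.

```python
import email.utils


def utc_offset_seconds(date_header):
    """Return the UTC offset of a Date: header value in seconds.

    A '-0000' zone parses to a naive datetime whose utcoffset() is
    None; treat that as UTC.
    """
    date = email.utils.parsedate_to_datetime(date_header)
    offset = date.utcoffset()
    if offset is None:
        return 0
    return int(offset.total_seconds())
```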
W. Trevor King [Fri, 7 Nov 2014 06:08:53 +0000 (22:08 -0800)]
ssoma-mda: Support name-less From: headers
For example:
>>> email.utils.parseaddr('alice@example.net')
('', 'alice@example.net')
Now folks with:
From: alice@example.net
will be stored in the Git commit as:
"alice@example.net" <alice@example.net>
W. Trevor King [Tue, 28 Oct 2014 04:43:14 +0000 (21:43 -0700)]
WIP: ssoma-mda: Translate to Python3.3+, using pygit2 (~0.21.3+)
I'll test with older versions of pygit2 to figure out where the cutoff
is. And without pygit2 in Debian (even Debian testing), it gets a bit
harder to get this going. You should be able to use:
$ apt-get install libgit2-dev
$ pip install cffi
$ pip install pygit2
but I haven't fired up a Debian image to test that yet.
W. Trevor King [Sat, 18 Oct 2014 22:05:55 +0000 (15:05 -0700)]
Remove everything not needed for ssoma-mda
In preparation for the Python translation.
W. Trevor King [Sat, 18 Oct 2014 20:02:12 +0000 (13:02 -0700)]
ssoma-mda: Use the email subject as the commit message
This is more interesting than just using 'mda' all the time, but it's
harder to set up proper quoting around the message without using
third-party Perl modules (e.g. IPC::Run or String::ShellQuote). This
proof-of-concept patch just assumes the subject doesn't contain
single-quotes ('). This patch also doesn't handle the empty/missing
subject case, which should probably fall back to '<no subject>' or
some such.
I'm fine dropping support for older Gits here, and just using the -m
option to commit-tree. That landed with 96b8d93a (commit-tree: teach
-m/-F options to read logs from elsewhere, 2011-11-09) in Git v1.7.9,
which was released over 2.5 years ago on 2012-01-27.
It would also be useful (I think) to set the GIT_AUTHOR_NAME,
GIT_AUTHOR_EMAIL, and GIT_AUTHOR_DATE environment variables from the
message header before committing. I know how to do that using
Python's subprocess module, but I don't know the Perl incantation.
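For reference, the Python approach mentioned above could be sketched as follows. The function names, the headers mapping, and the tree argument are all hypothetical; passing the subject as a list element also sidesteps the shell-quoting problem, since no shell is involved.

```python
import email.utils
import os
import subprocess


def author_env(headers, base_env=None):
    """Build an environment carrying the message author for Git.

    `headers` is any mapping with 'From' and 'Date' entries.
    """
    env = dict(os.environ if base_env is None else base_env)
    name, address = email.utils.parseaddr(headers['From'])
    env['GIT_AUTHOR_NAME'] = name or address  # name-less From: headers
    env['GIT_AUTHOR_EMAIL'] = address
    env['GIT_AUTHOR_DATE'] = headers['Date']  # Git accepts RFC 2822 dates
    return env


def commit_tree(tree, subject, headers):
    """Run `git commit-tree` with the subject as the commit message."""
    return subprocess.check_output(
        ['git', 'commit-tree', tree, '-m', subject],
        env=author_env(headers)).decode('ascii').strip()
```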
Eric Wong [Wed, 2 Jul 2014 19:30:24 +0000 (19:30 +0000)]
doc: remove HTML
Even with txt2pre, the maintenance/discoverability burden is too
high and lynx still uses too much memory. Unfortunately, we'll have
to keep our INSTALL.html for a while longer on the server since it's
linked, but not index.html!
Eric Wong [Mon, 5 May 2014 20:21:13 +0000 (20:21 +0000)]
ssoma: cleanup IMAP password warnings
password may be an empty string, so we must check length.
Eric Wong [Mon, 5 May 2014 05:31:37 +0000 (05:31 +0000)]
ssoma 0.1.0
* doc: describe public-inbox dedupe
* ssoma: lock against concurrent fetch/remote add
* ssoma: avoid redundant slash for expand_path
* extractor: filter out non-message paths
Eric Wong [Thu, 1 May 2014 20:38:39 +0000 (20:38 +0000)]
extractor: filter out non-message paths
We may allow files like "README" to appear in ssoma
repositories to reduce confusion.
Eric Wong [Wed, 30 Apr 2014 02:10:45 +0000 (02:10 +0000)]
ssoma: avoid redundant slash for expand_path
This makes our error messages look ugly.
Eric Wong [Sat, 26 Apr 2014 03:16:06 +0000 (03:16 +0000)]
ssoma: lock against concurrent fetch/remote add
A user may manually run ssoma while cron is running,
so avoid any potential synchronization problems in this
case.
Eric Wong [Sat, 26 Apr 2014 00:29:57 +0000 (00:29 +0000)]
doc: describe public-inbox dedupe
Duplicate Message-IDs are uncommon enough to drop.
Eric Wong [Mon, 21 Apr 2014 17:26:13 +0000 (17:26 +0000)]
INSTALL: add tarball link
Some users expect and prefer tarballs.
Eric Wong [Mon, 21 Apr 2014 09:42:14 +0000 (09:42 +0000)]
ssoma: --cron implies --quiet
Cron jobs should be quiet, since cron defaults to emailing the user
on output.
Eric Wong [Mon, 21 Apr 2014 08:56:22 +0000 (08:56 +0000)]
doc: various fixes and URL changes
We don't need a specific list for ssoma, yet, just use the
meta@public-inbox.org list to avoid fragmentation.
Eric Wong [Mon, 21 Apr 2014 08:42:00 +0000 (08:42 +0000)]
ssoma: add --since option for time-limiting imports
This should make it easier to avoid duplicating mail if
you're coming from being a normal mailing list subscriber
and switching to ssoma.
Eric Wong [Sun, 20 Apr 2014 23:49:17 +0000 (23:49 +0000)]
workaround older git without "commit-tree -m"
We need to support older git versions lying around.
Some versions broke argument ordering, too.
Eric Wong [Sun, 20 Apr 2014 19:47:34 +0000 (19:47 +0000)]
mda: keep Status: header when doing injection
Non-public-inbox users may want to archive their personal email
with ssoma, so preserve the Status: line if it exists. public-inbox
already kills the Status: header.
Eric Wong [Sun, 20 Apr 2014 19:46:11 +0000 (19:46 +0000)]
some minor documentation tweaks
Hopefully clarify things for folks coming from public-inbox.
Eric Wong [Sun, 20 Apr 2014 19:25:51 +0000 (19:25 +0000)]
documentation improvements, HTML page
Eric Wong [Sun, 20 Apr 2014 19:18:19 +0000 (19:18 +0000)]
use Git.pm for efficient cat_blob if available
This reduces the amount of fork+exec and should improve performance
for large imports.
Eric Wong [Sun, 20 Apr 2014 19:07:14 +0000 (19:07 +0000)]
Git*.pm: allow code improvements to flow back to git
By using GPLv2+, we are compatible with AGPLv3+ while still
allowing improvements to flow back into the git-svn modules
distributed with git.
Eric Wong [Wed, 16 Apr 2014 19:45:51 +0000 (19:45 +0000)]
ssoma: add --cron option to sync
Encourages users to add "ssoma sync --cron" to their crontabs
and reduce load spikes.
Eric Wong [Wed, 16 Apr 2014 19:43:25 +0000 (19:43 +0000)]
ssoma: use implicit $_ for simpler arg generation
This makes the loop shorter and judicious use of $_ is OK.
Eric Wong [Tue, 15 Apr 2014 01:03:55 +0000 (01:03 +0000)]
extractor: clarify naming for message delivery
We'll be supporting multiple refs.
Eric Wong [Sat, 12 Apr 2014 12:11:17 +0000 (12:11 +0000)]
README: share list with public-inbox
No reason to separate communities this early on.
Eric Wong [Sat, 12 Apr 2014 04:12:47 +0000 (04:12 +0000)]
use flock instead of fcntl locking
We do not need range locking of fcntl locks, so using flock removes
a dependency, hopefully making us easier to install. Also keep in
mind Ruby (and perhaps other scripting languages) support flock
out-of-the-box as well, so it seems flock is easier to support
although fcntl locks offer superior functionality.
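To illustrate the whole-file locking that flock provides, here is a sketch in Python (the project itself was Perl at this point). The context manager name and the lock-file path are hypothetical, and it is Unix-only.

```python
import contextlib
import fcntl


@contextlib.contextmanager
def locked(path):
    """Hold an exclusive flock on `path` for the duration of the block."""
    with open(path, 'a') as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until available
        try:
            yield lock_file
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

Unlike fcntl record locks, this locks the file as a whole, which is all that serializing access to a repository needs.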
Eric Wong [Thu, 10 Apr 2014 07:26:08 +0000 (07:26 +0000)]
cleanup globbing
Calling the glob function explicitly seems to be favored nowadays.
Eric Wong [Thu, 10 Apr 2014 06:09:37 +0000 (06:09 +0000)]
INSTALL: fix misnamed Debian package
While we're at it, sort Makefile.PL so it's harder to
miss things.
Eric Wong [Wed, 9 Apr 2014 18:12:49 +0000 (18:12 +0000)]
Makefile.PL: add parallel tests
These tests are intended to run in parallel.
Eric Wong [Wed, 9 Apr 2014 18:12:06 +0000 (18:12 +0000)]
t/all: fixup test for missing IPC::Run
I forgot to re-enable the test once I ensured things passed without
IPC::Run.
Eric Wong [Tue, 8 Apr 2014 23:48:31 +0000 (23:48 +0000)]
mid2path ignores leading '<' and trailing '>'
This simplifies our code a bit, and hopefully in public-inbox, too.
There is little practical danger of a Message-ID not having '<>',
and having '<>' in all URLs is annoying.
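The normalization can be sketched as below, in Python for illustration only. Hashing the Message-ID with SHA-1 and splitting off a two-character directory are assumptions about the path layout that this log does not confirm; the point is that both spellings of the Message-ID map to the same path.

```python
import hashlib


def mid2path(message_id):
    """Map a Message-ID to a path, ignoring a surrounding '<' and '>'."""
    mid = message_id.strip()
    if mid.startswith('<'):
        mid = mid[1:]
    if mid.endswith('>'):
        mid = mid[:-1]
    # Assumed layout: hex digest split as 2 characters, '/', the rest.
    digest = hashlib.sha1(mid.encode('utf-8')).hexdigest()
    return digest[:2] + '/' + digest[2:]
```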
This breaks compatibility. Fortunately, this project is not
publicly announced, yet.
Eric Wong [Tue, 8 Apr 2014 08:37:21 +0000 (08:37 +0000)]
lib/Ssoma/Git*: clarify copyright on original git code
I cannot change the license of git proper, of course.
Eric Wong [Tue, 8 Apr 2014 07:43:50 +0000 (07:43 +0000)]
INSTALL: update documentation
public-inbox (server daemon) is a separate project now
and ssoma is fairly generic.
Eric Wong [Tue, 8 Apr 2014 07:37:50 +0000 (07:37 +0000)]
t/all: IPC::Run is optional in tests
We do not force users to install libraries only needed for testing.
Eric Wong [Tue, 8 Apr 2014 01:44:32 +0000 (01:44 +0000)]
use "Message-ID" capitalization consistently
Technically it's case-insensitive; but "ID" is short for
"identifier" or "identification", and not a fish or a part
of a person's psyche.
Eric Wong [Tue, 8 Apr 2014 01:36:53 +0000 (01:36 +0000)]
ssoma-mda: duplicate prevention
This is mainly for public-inbox, as duplicate message IDs are
usually evidence something is suspicious or a misconfigured SMTP
server/client.
Eric Wong [Thu, 27 Mar 2014 20:38:26 +0000 (20:38 +0000)]
initial commit