ssoma-mda.git
ssoma-mda: Add compatibility decoding for the '\x10\x10' encoding (refs: python)
W. Trevor King [Fri, 7 Nov 2014 08:58:34 +0000 (00:58 -0800)]
ssoma-mda: Add compatibility decoding for the '\x10\x10' encoding

I'm not sure if this was a user error or a Git send-email error, but
notmuch has a message with:

  Subject: =?\x10\x10?q?=5BPATCH=20v7=203/3=5D=20Use=20the=20structured=20formatters=20in=20notmuch-search=2Ec=2E?=
  ...
  Message-Id: <1342766173-1344-4-git-send-email-craven@gmx.net>
  X-Mailer: git-send-email 1.7.11.2

Without this patch, that raises:

  LookupError: unknown encoding:
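
A minimal sketch of such compatibility decoding in Python, assuming a hypothetical COMPAT_CHARSETS table that maps the bogus labels from this message and the 'no' message to us-ascii (both Q-encoded payloads are plain ASCII).  email.header.decode_header lowercases the label but otherwise passes it through, so the LookupError above really ends with two invisible \x10 characters.

```python
import codecs
import email.header

# Hypothetical compatibility table: charset labels found in real messages
# that Python's codec registry does not recognise.  Both payloads seen so
# far are plain ASCII once the Q-encoding has been undone.
COMPAT_CHARSETS = {
    '\x10\x10': 'us-ascii',
    'no': 'us-ascii',
}


def compat_charset(charset):
    """Return charset if Python knows it, else a compatibility substitute."""
    try:
        codecs.lookup(charset)
    except LookupError:
        if charset in COMPAT_CHARSETS:
            return COMPAT_CHARSETS[charset]
        raise  # a genuinely unknown charset is still an error
    return charset


subject = ('=?\x10\x10?q?=5BPATCH=20v7=203/3=5D=20Use=20the=20structured'
           '=20formatters=20in=20notmuch-search=2Ec=2E?=')
[(payload, charset)] = email.header.decode_header(subject)
print(payload.decode(compat_charset(charset)))
```

Only labels that the codec registry rejects are substituted, so real charsets are untouched and unlisted garbage still raises.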

W. Trevor King [Fri, 7 Nov 2014 08:32:25 +0000 (00:32 -0800)]
ssoma-mda: Add compatibility decoding for the 'no' encoding

I'm not sure if this was a user error or a Git send-email error, but
notmuch has a message with:

  Subject: =?no?q?=5BPATCH=203/3=5D=20Add=20=27compose=27=20command?=
  ...
  Message-Id: <1291933972-7186-4-git-send-email-felipe.contreras@gmail.com>
  X-Mailer: git-send-email 1.7.3.2

Without this patch, that raises:

  LookupError: unknown encoding: no
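
To see where the error comes from: email.header.decode_header undoes the Q-encoding without complaint and returns the label 'no' alongside the bytes; the LookupError only appears once that label reaches bytes.decode.  A small reproduction:

```python
import email.header

subject = "=?no?q?=5BPATCH=203/3=5D=20Add=20=27compose=27=20command?="
[(payload, charset)] = email.header.decode_header(subject)
print(payload)   # the Q-encoded text itself decodes fine
print(charset)   # the bogus label is handed back as-is

try:
    payload.decode(charset)
except LookupError as error:
    print(error)  # this is the failure the commit works around
```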

W. Trevor King [Fri, 7 Nov 2014 08:12:58 +0000 (00:12 -0800)]
ssoma-mda: Fix RFC-2047 decoding for partially-encoded strings

Avoid:

  TypeError: sequence item 0: expected str instance, bytes found

This happens when an RFC-2047-encoded string contains both an
unencoded and an encoded section.  For example:

  >>> import email.header
  >>> email.header.decode_header('Keld =?ISO-8859-1?Q?J=F8rn_Simonsen?=')
  [(b'Keld ', None), (b'J\xf8rn Simonsen', 'iso-8859-1')]

decode_header returns both chunks as bytes, but gives no charset for
the first one.  I'm not sure what the default charset for header
values is, but RFC 2047 exists to carry non-ASCII text in headers
[1], so I'm guessing it's ASCII ;).

[1]: http://tools.ietf.org/html/rfc2047#section-1

W. Trevor King [Fri, 7 Nov 2014 07:56:47 +0000 (23:56 -0800)]
ssoma-mda: Decode RFC-2047-encoded From: headers

The examples in RFC 2047 have an encoded name, but an ASCII email
address [1].  Make sure we handle that appropriately.

[1]: http://tools.ietf.org/html/rfc2047#section-8
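
A sketch of that ordering, assuming the display name is decoded only after email.utils.parseaddr has split it from the address, so decoded text never goes through address parsing.  decode_from is a hypothetical name; the sample value is the Keith Moore example from RFC 2047 section 8.

```python
import email.header
import email.utils


def decode_from(value):
    """Return (name, address) with an RFC 2047-decoded display name."""
    name, address = email.utils.parseaddr(value)
    parts = []
    for chunk, charset in email.header.decode_header(name):
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or 'us-ascii')
        parts.append(chunk)
    return ''.join(parts), address


print(decode_from('=?US-ASCII?Q?Keith_Moore?= <moore@cs.utk.edu>'))
```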

W. Trevor King [Fri, 7 Nov 2014 07:32:03 +0000 (23:32 -0800)]
ssoma-mda: Handle Subject:s that aren't RFC-2047-encoded too

If the string is not RFC-2047 encoded, the charset from decode_header
is None:

  >>> import email.header
  >>> email.header.decode_header('hello')
  [('hello', None)]

so str(decoded, charset) will fail with:

  TypeError: str() argument 2 must be str, not None

Avoid that by checking charset before attempting to decode with
charset.  Since that's a bit awkward, pull it out into its own
_decode_header function.

The Simonsen example is from RFC 2047 [1].

[1]: http://tools.ietf.org/html/rfc2047#section-8

W. Trevor King [Fri, 7 Nov 2014 06:59:47 +0000 (22:59 -0800)]
ssoma-mda: Decode RFC-2047-encoded Subject: headers

For example, get [1]:

  'If you can read this you understand the example.'

from a message with:

  Subject: =?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?=
    =?ISO-8859-2?B?dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==?=

[1]: http://tools.ietf.org/html/rfc2047.html#section-8
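
That example is a useful test because one sentence is split across two encoded words with different charsets.  decode_header returns one (bytes, charset) pair per charset, and since RFC 2047 says whitespace between adjacent encoded words is ignored, the decoded pieces are joined with an empty string.  A minimal check, writing the folded value with a bare newline (an assumption about how it arrives):

```python
import email.header

# The Subject: value from RFC 2047 section 8, still folded across two lines.
subject = ('=?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?=\n'
           '    =?ISO-8859-2?B?dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==?=')

chunks = email.header.decode_header(subject)
# One (bytes, charset) pair per encoded word here, since the charsets differ.
text = ''.join(chunk.decode(charset) for chunk, charset in chunks)
print(text)
```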

W. Trevor King [Fri, 7 Nov 2014 06:18:13 +0000 (22:18 -0800)]
ssoma-mda: Unfold folding whitespace in message subjects

For the RFC specs on folding whitespace, see [1] and [2].  For the
decision to preserve the original folding in Python headers, here's
R. David Murray's comment from 2011-04-18 [3]:

  I have, by the way, come around to the view that we should never be
  introducing or deleting whitespace except when RFC 2047
  encoding/decoding...we are still deleting it in a couple places, and
  I will address that by and by

[1]: https://tools.ietf.org/html/rfc5322#section-2.2.3
[2]: https://tools.ietf.org/html/rfc5322#section-3.2.2
[3]: http://bugs.python.org/issue1372770#msg133974
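
RFC 5322 section 2.2.3 defines unfolding as simply removing any CRLF that is immediately followed by whitespace, which fits Murray's "never introduce or delete whitespace" stance.  A minimal sketch with a hypothetical unfold helper that also accepts bare LF line endings (an assumption about what the parser hands back):

```python
import re

# A line break (CRLF, or bare LF) immediately followed by a space or tab
# marks a fold; the whitespace after it stays part of the value.
_FOLD = re.compile(r'\r?\n(?=[ \t])')


def unfold(value):
    """Undo header folding without adding or removing any other whitespace."""
    return _FOLD.sub('', value)


print(repr(unfold('[PATCH] a long\r\n subject line')))
```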

W. Trevor King [Fri, 7 Nov 2014 06:10:45 +0000 (22:10 -0800)]
ssoma-mda: Support UTC dates

For example:

  >>> import email.utils
  >>> date = email.utils.parsedate_to_datetime('Tue, 21 Dec 2010 03:52:23 -0000')
  >>> date.utcoffset() is None
  True

so without this case, we'll get:

  AttributeError: 'NoneType' object has no attribute 'seconds'
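
A sketch of the missing case, assuming the result feeds something like pygit2.Signature, whose offset argument is in minutes, and that a '-0000' date should be recorded as UTC (RFC 5322 uses '-0000' for a UTC time whose local zone is unknown).  date_and_offset is a hypothetical name.

```python
import datetime
import email.utils


def date_and_offset(date_header):
    """Return (POSIX timestamp, UTC offset in minutes) for a Date: value."""
    date = email.utils.parsedate_to_datetime(date_header)
    offset = date.utcoffset()
    if offset is None:
        # '-0000' yields a naive datetime: the time is UTC, but the
        # sender's own zone is unknown.  Record it as UTC.
        date = date.replace(tzinfo=datetime.timezone.utc)
        offset = datetime.timedelta(0)
    # total_seconds() keeps the sign; timedelta.seconds is never negative.
    return int(date.timestamp()), int(offset.total_seconds()) // 60
```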

W. Trevor King [Fri, 7 Nov 2014 06:08:53 +0000 (22:08 -0800)]
ssoma-mda: Support name-less From: headers

For example:

  >>> import email.utils
  >>> email.utils.parseaddr('alice@example.net')
  ('', 'alice@example.net')

Now folks with:

  From: alice@example.net

will be stored in the Git commit as:

  "alice@example.net" <alice@example.net>

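A sketch of the fallback, assuming the address simply doubles as the author name whenever parseaddr finds none (author_identity is a hypothetical name):

```python
import email.utils


def author_identity(from_header):
    """Return a (name, email) pair for the Git author signature."""
    name, address = email.utils.parseaddr(from_header)
    # Avoid an empty author name by reusing the address.
    return (name or address), address
```

A From: header that already has a display name is unaffected.
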
W. Trevor King [Tue, 28 Oct 2014 04:43:14 +0000 (21:43 -0700)]
WIP: ssoma-mda: Translate to Python3.3+, using pygit2 (~0.21.3+)

I'll test with older versions of pygit2 to figure out where the cutoff
is.  Since pygit2 isn't packaged in Debian (not even in testing),
getting this going is a bit harder.  You should be able to use:

  $ apt-get install libgit2-dev
  $ pip install cffi
  $ pip install pygit2

but I haven't fired up a Debian image to test that yet.

W. Trevor King [Sat, 18 Oct 2014 22:05:55 +0000 (15:05 -0700)]
Remove everything not needed for ssoma-mda

In preparation for the Python translation.

ssoma-mda: Use the email subject as the commit message (refs: master)
W. Trevor King [Sat, 18 Oct 2014 20:02:12 +0000 (13:02 -0700)]
ssoma-mda: Use the email subject as the commit message

This is more interesting than just using 'mda' all the time, but it's
harder to set up proper quoting around the message without using
third-party Perl modules (e.g. IPC::Run or String::ShellQuote).  This
proof-of-concept patch just assumes the subject doesn't contain
single-quotes (').  This patch also doesn't handle the empty/missing
subject case, which should probably fall back to '<no subject>' or
some such.

I'm fine dropping support for older Gits here, and just using the -m
option to commit-tree.  That landed with 96b8d93a (commit-tree: teach
-m/-F options to read logs from elsewhere, 2011-11-09) in Git v1.7.9,
which was released over 2.5 years ago on 2012-01-27.

It would also be useful (I think) to set the GIT_AUTHOR_NAME,
GIT_AUTHOR_EMAIL, and GIT_AUTHOR_DATE environment variables from the
message header before committing.  I know how to do that using
Python's subprocess module, but I don't know the Perl incantation.
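
For comparison, a Python sketch of both ideas: the subject travels as its own argv entry, so no shell quoting is involved, and the GIT_AUTHOR_* variables come from the message headers.  commit_tree_call and its parameters are hypothetical; commit-tree's -p and -m options and the GIT_AUTHOR_NAME, GIT_AUTHOR_EMAIL, and GIT_AUTHOR_DATE variables are standard Git.  The '<no subject>' fallback follows the suggestion above.

```python
import os


def commit_tree_call(tree, subject, author_name, author_email, author_date,
                     parent=None):
    """Build the argv and environment for `git commit-tree` (not run here)."""
    args = ['git', 'commit-tree']
    if parent:
        args += ['-p', parent]
    # Each list entry reaches git as one argument and no shell is involved,
    # so quotes in the subject need no escaping.
    args += ['-m', subject or '<no subject>', tree]
    env = dict(os.environ,
               GIT_AUTHOR_NAME=author_name,
               GIT_AUTHOR_EMAIL=author_email,
               GIT_AUTHOR_DATE=author_date)
    return args, env
```

Running it would then be subprocess.run(args, env=env, check=True, stdout=subprocess.PIPE); git commit-tree prints the new commit's ID on stdout.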

Eric Wong [Wed, 2 Jul 2014 19:30:24 +0000 (19:30 +0000)]
doc: remove HTML

Even with txt2pre, the maintenance/discoverability burden is too
high and lynx still uses too much memory.  Unfortunately, we'll have
to keep our INSTALL.html for a while longer on the server since it's
linked, but not index.html!

Eric Wong [Mon, 5 May 2014 20:21:13 +0000 (20:21 +0000)]
ssoma: cleanup IMAP password warnings

password may be an empty string, so we must check length.

ssoma 0.1.0 (refs: v0.1.0)
Eric Wong [Mon, 5 May 2014 05:31:37 +0000 (05:31 +0000)]
ssoma 0.1.0

* doc: describe public-inbox dedupe
* ssoma: lock against concurrent fetch/remote add
* ssoma: avoid redundant slash for expand_path
* extractor: filter out non-message paths

Eric Wong [Thu, 1 May 2014 20:38:39 +0000 (20:38 +0000)]
extractor: filter out non-message paths

We may allow files like "README" to appear in ssoma
repositories to reduce confusion.
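
A sketch of such a filter, assuming message paths use a Git-object-store-style fan-out (two hex digits, a slash, then 38 more, as from a SHA-1 of the Message-ID); that layout is an assumption here, not something this log states.  Anything else, such as a top-level README, is skipped.

```python
import re

# Assumed layout: two hex digits, '/', then the other 38 hex digits of a
# SHA-1, mirroring how Git fans out its own object store.
MESSAGE_PATH = re.compile(r'[0-9a-f]{2}/[0-9a-f]{38}')


def message_paths(paths):
    """Yield only the repository paths that look like stored messages."""
    for path in paths:
        if MESSAGE_PATH.fullmatch(path):
            yield path
```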

Eric Wong [Wed, 30 Apr 2014 02:10:45 +0000 (02:10 +0000)]
ssoma: avoid redundant slash for expand_path

The redundant slash made our error messages look ugly.

Eric Wong [Sat, 26 Apr 2014 03:16:06 +0000 (03:16 +0000)]
ssoma: lock against concurrent fetch/remote add

A user may manually run ssoma while cron is running,
so avoid any potential synchronization problems in this
case.

Eric Wong [Sat, 26 Apr 2014 00:29:57 +0000 (00:29 +0000)]
doc: describe public-inbox dedupe

Duplicate Message-IDs are uncommon enough to drop.

INSTALL: add tarball link (refs: v0.0.0)
Eric Wong [Mon, 21 Apr 2014 17:26:13 +0000 (17:26 +0000)]
INSTALL: add tarball link

Some users expect and prefer tarballs.

Eric Wong [Mon, 21 Apr 2014 09:42:14 +0000 (09:42 +0000)]
ssoma: --cron implies --quiet

Cron jobs should be quiet, since cron defaults to emailing the user
any output.

Eric Wong [Mon, 21 Apr 2014 08:56:22 +0000 (08:56 +0000)]
doc: various fixes and URL changes

We don't need a specific list for ssoma yet; just use the
meta@public-inbox.org list to avoid fragmentation.

Eric Wong [Mon, 21 Apr 2014 08:42:00 +0000 (08:42 +0000)]
ssoma: add --since option for time-limiting imports

This should make it easier to avoid duplicating mail if
you're coming from being a normal mailing list subscriber
and switching to ssoma.

Eric Wong [Sun, 20 Apr 2014 23:49:17 +0000 (23:49 +0000)]
workaround older git without "commit-tree -m"

We need to support older git versions lying around.
Some versions broke argument ordering, too.

Eric Wong [Sun, 20 Apr 2014 19:47:34 +0000 (19:47 +0000)]
mda: keep Status: header when doing injection

Non-public-inbox users may want to archive their personal email
with ssoma, so preserve the Status: line if it exists.  public-inbox
already kills the Status: header.

Eric Wong [Sun, 20 Apr 2014 19:46:11 +0000 (19:46 +0000)]
some minor documentation tweaks

Hopefully clarify things for folks coming from public-inbox.

Eric Wong [Sun, 20 Apr 2014 19:25:51 +0000 (19:25 +0000)]
documentation improvements, HTML page

Eric Wong [Sun, 20 Apr 2014 19:18:19 +0000 (19:18 +0000)]
use Git.pm for efficient cat_blob if available

This reduces the amount of fork+exec and should improve performance
for large imports.
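
The saving comes from keeping a single git cat-file --batch process alive and feeding it object names, rather than spawning one git process per blob; Git.pm's cat_blob works that way.  A rough Python equivalent, with a hypothetical CatBlob class:

```python
import subprocess


class CatBlob:
    """Read objects through one persistent `git cat-file --batch` process."""

    def __init__(self, git_dir):
        self._proc = subprocess.Popen(
            ['git', '--git-dir', git_dir, 'cat-file', '--batch'],
            stdin=subprocess.PIPE, stdout=subprocess.PIPE)

    def cat_blob(self, object_name):
        """Return the object's contents as bytes, or None if it is missing."""
        self._proc.stdin.write(object_name.encode('ascii') + b'\n')
        self._proc.stdin.flush()
        # Each reply starts with "<oid> <type> <size>\n" or "<name> missing\n".
        header = self._proc.stdout.readline()
        if header.endswith(b' missing\n'):
            return None
        size = int(header.split()[2])
        data = self._proc.stdout.read(size)
        self._proc.stdout.read(1)  # discard the LF that follows the contents
        return data

    def close(self):
        self._proc.stdin.close()
        self._proc.wait()
        self._proc.stdout.close()
```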

Eric Wong [Sun, 20 Apr 2014 19:07:14 +0000 (19:07 +0000)]
Git*.pm: allow code improvements to flow back to git

By using GPLv2+, we are compatible with AGPLv3+ while still
allowing improvements to flow back into the git-svn modules
distributed with git.

Eric Wong [Wed, 16 Apr 2014 19:45:51 +0000 (19:45 +0000)]
ssoma: add --cron option to sync

Encourages users to add "ssoma sync --cron" to their crontabs
and reduce load spikes.

Eric Wong [Wed, 16 Apr 2014 19:43:25 +0000 (19:43 +0000)]
ssoma: use implicit $_ for simpler arg generation

This makes the loop shorter and judicious use of $_ is OK.

Eric Wong [Tue, 15 Apr 2014 01:03:55 +0000 (01:03 +0000)]
extractor: clarify naming for message delivery

We'll be supporting multiple refs.

Eric Wong [Sat, 12 Apr 2014 12:11:17 +0000 (12:11 +0000)]
README: share list with public-inbox

No reason to separate communities this early on.

Eric Wong [Sat, 12 Apr 2014 04:12:47 +0000 (04:12 +0000)]
use flock instead of fcntl locking

We do not need range locking of fcntl locks, so using flock removes
a dependency, hopefully making us easier to install.  Also keep in
mind that Ruby (and perhaps other scripting languages) supports flock
out-of-the-box as well, so it seems flock is easier to support
although fcntl locks offer superior functionality.
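
Whole-file flock(2) locking is also one import away in Python's standard library on Unix.  A sketch of a lock guarding a fetch, with a hypothetical locked helper; the lock-file path is up to the caller, and blocking=False makes a second run fail fast instead of waiting:

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def locked(lock_path, blocking=True):
    """Hold an exclusive flock(2) lock on lock_path for the with-block."""
    flags = fcntl.LOCK_EX if blocking else fcntl.LOCK_EX | fcntl.LOCK_NB
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Raises BlockingIOError when non-blocking and the lock is held.
        fcntl.flock(fd, flags)
        yield fd
    finally:
        os.close(fd)  # closing the descriptor releases the lock
```

Closing the descriptor, or the process exiting, releases the lock, so a crashed run does not leave it held.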

Eric Wong [Thu, 10 Apr 2014 07:26:08 +0000 (07:26 +0000)]
cleanup globbing

Calling the glob function explicitly seems to be favored nowadays.

Eric Wong [Thu, 10 Apr 2014 06:09:37 +0000 (06:09 +0000)]
INSTALL: fix misnamed Debian package

While we're at it, sort Makefile.PL so it's harder to
miss things.

Eric Wong [Wed, 9 Apr 2014 18:12:49 +0000 (18:12 +0000)]
Makefile.PL: add parallel tests

These tests are intended to run in parallel.

Eric Wong [Wed, 9 Apr 2014 18:12:06 +0000 (18:12 +0000)]
t/all: fixup test for missing IPC::Run

I forgot to re-enable the test once I ensured things passed without
IPC::Run.

Eric Wong [Tue, 8 Apr 2014 23:48:31 +0000 (23:48 +0000)]
mid2path ignores leading '<' and trailing '>'

This simplifies our code a bit, and hopefully public-inbox's, too.
There is little practical danger of a Message-ID not having '<>',
and having '<>' in all URLs is annoying.
This breaks compatibility.  Fortunately, this project is not
publicly announced yet.
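
A sketch of the new behaviour, assuming paths are the SHA-1 hex digest of the Message-ID split 2/38 like Git's object store (an assumption, not stated in this log); the point is only that '<id>' and 'id' now land on the same path.  This is illustrative Python, not ssoma's Perl implementation.

```python
import hashlib


def mid2path(message_id):
    """Map a Message-ID to a storage path, ignoring surrounding '<' and '>'."""
    mid = message_id
    if mid.startswith('<'):
        mid = mid[1:]
    if mid.endswith('>'):
        mid = mid[:-1]
    # Message-IDs are ASCII in practice; UTF-8 is an assumption for the rest.
    digest = hashlib.sha1(mid.encode('utf-8')).hexdigest()
    return digest[:2] + '/' + digest[2:]
```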

Eric Wong [Tue, 8 Apr 2014 08:37:21 +0000 (08:37 +0000)]
lib/Ssoma/Git*: clarify copyright on original git code

I cannot change the license of git proper, of course.

Eric Wong [Tue, 8 Apr 2014 07:43:50 +0000 (07:43 +0000)]
INSTALL: update documentation

public-inbox (server daemon) is a separate project now
and ssoma is fairly generic.

Eric Wong [Tue, 8 Apr 2014 07:37:50 +0000 (07:37 +0000)]
t/all: IPC::Run is optional in tests

We do not force users to install libraries only needed for testing.

Eric Wong [Tue, 8 Apr 2014 01:44:32 +0000 (01:44 +0000)]
use "Message-ID" capitalization consistently

Technically it's case-insensitive; but "ID" is short for
"identifier" or "identification", and not a fish or a part
of a person's psyche.

Eric Wong [Tue, 8 Apr 2014 01:36:53 +0000 (01:36 +0000)]
ssoma-mda: duplicate prevention

This is mainly for public-inbox, as duplicate message IDs are
usually evidence of something suspicious or of a misconfigured SMTP
server/client.

Eric Wong [Thu, 27 Mar 2014 20:38:26 +0000 (20:38 +0000)]
initial commit