git.tremily.us Git - notmuch.git/log

Carl Worth [Fri, 23 Oct 2009 13:08:22 +0000 (06:08 -0700)]

notmuch restore: Print names of tags that cannot be applied

This helps the user gauge the severity of the error.

For example, when restoring my sup tags I see a bunch of tags missing
for message IDs of the form "sup-faked-...". That's not surprising
since I know that sup generates these with the md5sum of the message
header while notmuch uses the sha-1 of the entire message. But how
much will this hurt?

Well, now that I can see that most of the missing tags are just
"attachment", then I'm not concerned, (I'll be automatically creating
that tag in the future based on the message contents). But if a
missing tag is "inbox" then that's more concerning because that's data
that I can't easily regenerate outside of sup.

Carl Worth [Fri, 23 Oct 2009 13:06:20 +0000 (06:06 -0700)]

notmuch_tags_has_more: Fix to use string.empty rather than string.size

I'm really interested in the length of the data here, not the size
of the storage.

Carl Worth [Fri, 23 Oct 2009 13:04:57 +0000 (06:04 -0700)]

Fix notmuch_message_get_message_id to never return NULL.

With the recent improvements to the handling of message IDs we
"know" that a NULL message ID is impossible, (so we simply
abort if the impossible happens).

Carl Worth [Fri, 23 Oct 2009 13:00:10 +0000 (06:00 -0700)]

add_message: Fix to not add multiple documents with the same message ID

Here's the second big fix to message-ID handling, (the first was to
generate message IDs when an email contained none). Now, with no
document missing a message ID, and no two documents having the same
message ID, we have a nice consistent database where the message ID
can be used as a unique key.
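
To illustrate what a unique key provides, here is a minimal sketch in C++. It is not the notmuch implementation: `Index` and `add_file` are hypothetical names, and a std::map stands in for the Xapian database. A second file carrying an already-known message ID attaches to the existing entry rather than creating a second document.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the database: one entry ("document") per
// message ID, each remembering the files that carried that ID.
struct Index {
    std::map<std::string, std::vector<std::string>> docs;
};

// Returns true if a new document was created, false if the message ID
// was already present and the file was attached to the existing one.
bool add_file(Index &index, const std::string &message_id,
              const std::string &filename)
{
    bool is_new = (index.docs.find(message_id) == index.docs.end());
    index.docs[message_id].push_back(filename);
    return is_new;
}
```

With this shape, the number of entries in `docs` always equals the number of distinct message IDs, no matter how many duplicate files are added.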

Carl Worth [Fri, 23 Oct 2009 12:53:52 +0000 (05:53 -0700)]

Add _notmuch_message_create_for_message_id

This is the last piece needed for add_message to be able to properly
support a message with a duplicate message ID. This function creates
a new notmuch_message_t object but one that may reference an existing
document in the database.

Carl Worth [Fri, 23 Oct 2009 12:45:29 +0000 (05:45 -0700)]

Fix _notmuch_message_create to catch Xapian DocNotFoundError.

This function is only supposed to be called with a doc_id that
was queried from the database already. So there's an internal
error if no document with that doc_id can be found in the database.

In that case, return NULL.

Carl Worth [Fri, 23 Oct 2009 12:41:17 +0000 (05:41 -0700)]

Add internal functions for manipulating a new notmuch_message_t

This will support the add_message function in incrementally creating
state in a new notmuch_message_t. The new functions are

      _notmuch_message_set_filename
      _notmuch_message_add_thread_id
      _notmuch_message_ensure_thread_id
      _notmuch_message_set_date
      _notmuch_message_sync

Carl Worth [Fri, 23 Oct 2009 12:38:13 +0000 (05:38 -0700)]

Add notmuch_message_get_filename

This is a new public function to find the filename of the original
email message for a message-object that was found in the database.

We may change this function in the future to support returning a
list of filenames, (for messages with duplicate message IDs).

Carl Worth [Fri, 23 Oct 2009 12:30:37 +0000 (05:30 -0700)]

add_message: Re-order the code a bit (find message-id first).

We're preparing for being able to deal with files with duplicate
message IDs here. The plan is to create a notmuch_message_t object in
add_message that may or may not reference a document that exists in
the database. So to do this, we have to find the message ID before we
do any manipulation of the doc.

Carl Worth [Fri, 23 Oct 2009 12:25:58 +0000 (05:25 -0700)]

Move thread_id generation code from database.cc to message.cc

It's really up to the message to decide how to generate these.

Carl Worth [Fri, 23 Oct 2009 12:18:35 +0000 (05:18 -0700)]

Move the _notmuch_message_sync from public to private interfaces

The idea here is to allow internal users to see a non-synced message
object, (for example, while parsing a message file and incrementally
adding terms, etc.). We're willing to take the care to get the
improved performance.

But for the public interface, keeping everything synced will be much
less confusing, (reference lots of sup bugs that happen due to
message state being altered by the user but not synced to the database).

Carl Worth [Fri, 23 Oct 2009 12:13:42 +0000 (05:13 -0700)]

add_message: Rename message to message_file

I still don't like the name message_file at all, but we're about
to start using a notmuch_message_t in this function so we need
to do something to keep the identifiers separate for now.

Eventually, it probably makes sense to push the message-parsing
code from database.cc to message.cc.

Carl Worth [Thu, 22 Oct 2009 22:46:22 +0000 (15:46 -0700)]

Prevent that last bug from reoccurring.

It's easy enough to check if a "missing" header was accidentally
left off the list in the call to restrict_headers. (And it's
cheap since we only check in case no such header was found in the
message.)

Carl Worth [Thu, 22 Oct 2009 22:34:47 +0000 (15:34 -0700)]

Don't forget the "to" header when restricting parsing to certain headers

We recently started discarding files as "not email" if they have none
of Subject, From, or To. Apparently, my mail collection contains a
number of messages that I sent, that are saved without Subject and
From, (perhaps these were drafts?).

Anyway, it's fortunate I had those since they alerted me to this bug,
where we were not parsing the "To" header in some cases.
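
The "not email" test described above can be sketched in C++ as follows. This is an illustration under assumptions, not the notmuch code: `looks_like_email` is a hypothetical name, and headers are assumed to be stored in a map under lowercase names.

```cpp
#include <map>
#include <string>

// Hypothetical helper: a file counts as email if at least one of
// Subject, From, or To is present with a non-empty value.
bool looks_like_email(const std::map<std::string, std::string> &headers)
{
    static const char *const names[] = {"subject", "from", "to"};
    for (const char *name : names) {
        auto it = headers.find(name);
        if (it != headers.end() && !it->second.empty())
            return true;
    }
    return false;
}
```

A message saved with only a To header passes this check, which is why failing to parse "To" would wrongly discard such a file.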

Carl Worth [Thu, 22 Oct 2009 22:33:56 +0000 (15:33 -0700)]

Fix missing error check.

The notmuch_message_file_open function is perfectly capable of
returning NULL. So check for it.

Carl Worth [Thu, 22 Oct 2009 22:31:56 +0000 (15:31 -0700)]

Generate message ID (using SHA1) when a mail message contains none.

This is important as we're using the message ID as the unique key
in our database. So previously, all messages with no message ID
would be treated as the same message---not good at all.

Carl Worth [Thu, 22 Oct 2009 06:25:58 +0000 (23:25 -0700)]

Rename sha1.c to libsha1.c

This way both the .c and .h files have the same name, and all of the
code imported from the "libsha1" implementation is in filenames
matching libsha1.*.

This also gives me room to make my own notmuch_sha1 wrapper functions
in sha1.c.

Carl Worth [Thu, 22 Oct 2009 06:23:32 +0000 (23:23 -0700)]

Merge branch from fixing up bugs after bisecting.

I'm glad that when I implemented "notmuch restore" I went through the
extra effort to split the code I had written in one sitting into over a
dozen commits. Sure enough, I hadn't tested well enough and had
totally broken "notmuch setup", (segfaults and bogus thread_id
values).

With the little commits I had made, git bisect saved the day, and I
went back to make the fixes right on top of the commits that
introduced the bugs. So now we octopus merge those in.

Carl Worth [Thu, 22 Oct 2009 06:10:19 +0000 (23:10 -0700)]

Bring back the insert_thread_id function.

We deleted this in favor of our fancy new thread_ids iterator
from the message object. But one of the previous callers of
insert_thread_id isn't using notmuch_message_t yet. I made
the mistake of thinking I could just call g_hash_table_insert
directly, but the problem was that nobody was splitting
up the thread_id string at its commas.

So with this, we were inserting bogus comma-separated IDs
into the hash table, so thread_id values were ballooning
out of control. Should be much better now.
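
The splitting step that was missing can be sketched as below. This is a C++ illustration only: the real function works on a glib hash table, whereas this hypothetical `insert_thread_ids` uses a std::set to show the same idea.

```cpp
#include <set>
#include <string>

// Hypothetical helper: insert each comma-separated ID into the set
// individually, so "a,b" contributes "a" and "b" rather than the single
// bogus key "a,b".
void insert_thread_ids(std::set<std::string> &ids, const std::string &list)
{
    std::string::size_type start = 0;
    while (start <= list.size()) {
        std::string::size_type comma = list.find(',', start);
        if (comma == std::string::npos)
            comma = list.size();
        if (comma > start)  // skip empty pieces
            ids.insert(list.substr(start, comma - start));
        start = comma + 1;
    }
}
```

Inserting "a,b" and then "b,c" leaves three entries in the set. Inserting the unsplit strings would have left two entries that match no real thread.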

Carl Worth [Thu, 22 Oct 2009 06:01:17 +0000 (23:01 -0700)]

Fix lifetime-maintenance bug with std::string and c_str()

Here's more evidence that C++ is a nightmare to program---or that
I'm smart enough to realize that C++ is more clever than I will
ever be.

Most of my issues with C++ have to do with it hiding things from
me that I'd really like to and expect to be aware of as a C
programmer.

For example, the specific problem here is that there's a
short-lived std::string, from which I just want to copy
the C string. I try to do that on the next line, but before
I can, C++ has already called the destructor on the std::string.

Now, C++ isn't alone in doing garbage collecting like this.
But in a *real* garbage-collecting system, everything would
work that way. For example, here, I'm still holding a pointer
to the C string contents, so if the garbage collector were
aware of that reference, then it might clean up the std::string
container and leave the data I'm still using.

But that's not what we get with C++. Instead, some things are
reference counted and collected, (like the std::string), and
some things just aren't (like the C string it contains). The
end result is that it's very fragile. It forces me to be aware
of the timing of hidden functions. In a "real" system I wouldn't
have to be aware of that timing, and in C the function just
wouldn't be hidden.
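
A minimal sketch of this kind of bug, with hypothetical names (`make_id`, `copy_id`) rather than the actual notmuch code: a temporary std::string is destroyed at the end of the full expression that created it, so a pointer taken from c_str() dangles by the next statement. Naming the string keeps it alive long enough to copy from.

```cpp
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical function returning a short-lived std::string.
std::string make_id()
{
    return std::string("thread-") + "0123456789abcdef";
}

// Broken pattern (shown only as a comment, since running it is undefined
// behavior): the temporary dies at the end of the first statement, so
// the pointer already dangles when the second statement reads it.
//
//     const char *p = make_id().c_str();
//     char *copy = strdup(p);
//
// Safer pattern: give the std::string a name so it outlives the copy.
// The caller owns the result and releases it with free().
char *copy_id()
{
    std::string id = make_id();
    char *copy = static_cast<char *>(std::malloc(id.size() + 1));
    if (copy)
        std::memcpy(copy, id.c_str(), id.size() + 1);
    return copy;
}
```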

Carl Worth [Thu, 22 Oct 2009 04:29:18 +0000 (21:29 -0700)]

List a few more co-conspirators.

Keith's name already shows up in the git log, so it would be
wrong to not mention him. And Martin and Jamey have been
helpful in discussions about what an ideal mail system
would look like.

Carl Worth [Thu, 22 Oct 2009 04:26:01 +0000 (21:26 -0700)]

Add an AUTHORS file.

Now that I've copied in another source file from someone else, I
want to be sure I'm keeping a good list of everyone who has helped.

Mikhail Gusarov [Thu, 22 Oct 2009 04:07:43 +0000 (21:07 -0700)]

Add sha1.c and libsha1.h for doing SHA-1-based message-ID generation.

This code comes courtesy of Brian Gladman and Mikhail Gusarov.

Both files are available under the GPL and were downloaded as
version 0.2 of libsha1 from git://github.com/dottedmag/libsha1.git
with the following commit:

commit d0f0e7e0dc5ce2d58972cb5a492183c0d4e58433
Author: Mikhail Gusarov <dottedmag@dottedmag.net>
Date: Mon Oct 20 22:38:47 2008 +0700

Version bump.

Signed-off-by: Mikhail Gusarov <dottedmag@dottedmag.net>

Carl Worth [Wed, 21 Oct 2009 23:25:08 +0000 (16:25 -0700)]

Add copy of GNU General Public License (version 3).

All the files were already advertising the license, but we didn't
actually have a copy of the license in the repository until now.

Carl Worth [Wed, 21 Oct 2009 23:12:53 +0000 (16:12 -0700)]

Add notmuch_status_to_string function.

Be kind and let the user print error messages, not just error
codes.

Carl Worth [Wed, 21 Oct 2009 23:03:03 +0000 (16:03 -0700)]

Implement "notmuch restore".

It's pretty easy to do with all the right infrastructure in place.
Now that I can get my tags from sup to notmuch, maybe I'll be able
to start reading mail again.

Carl Worth [Wed, 21 Oct 2009 22:59:11 +0000 (15:59 -0700)]

Pull out a chomp_newline function from "notmuch setup"

We'll want this same thing with "notmuch restore", (and really
anything using getline).
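
A helper of this kind might look like the sketch below. The signature is an assumption, since the commit message does not show it. The helper strips one trailing newline in place, which is what a caller of getline typically wants.

```cpp
#include <cstring>

// Assumed signature: strip a single trailing newline in place, if any.
void chomp_newline(char *str)
{
    if (str == nullptr)
        return;
    std::size_t len = std::strlen(str);
    if (len > 0 && str[len - 1] == '\n')
        str[len - 1] = '\0';
}
```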

Carl Worth [Wed, 21 Oct 2009 22:53:38 +0000 (15:53 -0700)]

Add notmuch_message_add_tag and notmuch_message_remove_tag

With these two added, we now have enough functionality in the
library to implement "notmuch restore".

Carl Worth [Wed, 21 Oct 2009 22:51:13 +0000 (15:51 -0700)]

notmuch-private.h: Move NOTMUCH_BEGIN_DECLS earlier

We actually need this before the include of xutil.h, but
it was previously stuck randomly among various system
includes. Instead, put it at the top, right after include
the notmuch.h header that defines it.

Carl Worth [Wed, 21 Oct 2009 22:46:46 +0000 (15:46 -0700)]

notmuch_query_search: Clarify the documentation.

This is where we wanted to put the note to recommend the user
call notmuch_message_destroy if the lifetime of the message
is much shorter than the lifetime of the query. (Somehow this
had ended up in the documentation of notmuch_message_get_tags
before.)

Carl Worth [Wed, 21 Oct 2009 22:45:34 +0000 (15:45 -0700)]

notmuch.h: Fix some copy-paste errors in the documentation.

In several places we had "results" where "tags" was intended.
It actually read fine in some cases, but this is still better.

Carl Worth [Wed, 21 Oct 2009 22:42:54 +0000 (15:42 -0700)]

notmuch_message_get_message_id: Fix to cache result

Previously, this would allocate new memory with every call. That
was with talloc, of course, so there wasn't any leaking (eventually).
But since we're now calling this internally we want to be a little
less wasteful. It's easy enough to just stash the result into the
message on the first call, and then just return that on subsequent
calls.

Carl Worth [Wed, 21 Oct 2009 22:37:51 +0000 (15:37 -0700)]

database: Add new notmuch_database_find_message

With this function, and the recently added support for
notmuch_message_get_thread_ids, we now recode the find_thread_ids
function to work just the way we expect a user of the public
notmuch API to work. Not too bad really.

Carl Worth [Wed, 21 Oct 2009 22:23:08 +0000 (15:23 -0700)]

Add notmuch_message_get_thread_ids function

Along with all of the notmuch_thread_ids_t iterator functions.
Using a consistent idiom seems better here rather than returning
a comma-separated string and forcing the user to parse it.

Carl Worth [Wed, 21 Oct 2009 22:06:52 +0000 (15:06 -0700)]

Add wrappers for regcomp and regexec to xutil.c.

These will be handy for some parsing.

Carl Worth [Wed, 21 Oct 2009 21:10:00 +0000 (14:10 -0700)]

Rename NOTMUCH_MAX_TERM to NOTMUCH_TERM_MAX

Just better consistency with our naming schemes.

Carl Worth [Wed, 21 Oct 2009 21:07:40 +0000 (14:07 -0700)]

Move find_prefix function from database.cc to message.cc

It's definitely a better fit there for now, (and can likely
eventually be made static as add_term moves from database
to message as well).

Carl Worth [Wed, 21 Oct 2009 21:02:51 +0000 (14:02 -0700)]

notmuch dump: Fix to print spaces between tags.

Simple little bug here made all the tags run together.

Carl Worth [Wed, 21 Oct 2009 21:00:37 +0000 (14:00 -0700)]

Convert notmuch_database_t to start using talloc.

This will be handy as we can hang future talloc allocations off
of the database now.

Carl Worth [Wed, 21 Oct 2009 20:57:02 +0000 (13:57 -0700)]

Move declarations for xutil.c from notmuch-private to new xutil.h.

The motivation here is that our top-level notmuch.c main program
wants to start using these, but we don't want it to see into
notmuch-private.h, (since our main program is a test vehicle
for the "public" notmuch interface in notmuch.h).

Carl Worth [Wed, 21 Oct 2009 17:12:11 +0000 (10:12 -0700)]

notmuch dump: Fix buffer overrun in error message.

Just a little bug I noticed while editing nearby code.

Carl Worth [Wed, 21 Oct 2009 17:07:34 +0000 (10:07 -0700)]

notmuch setup: Collapse internal whitespace within message-id

I'm too lazy to see what the RFC says, but I know that having
whitespace inside a message-ID is sure to confuse things. And
besides, this makes things more compatible with sup so that
I have some hope of importing sup labels.
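
The commit message does not say whether "collapse" means dropping the whitespace or reducing each run to a single space. The sketch below takes the first reading, so that no whitespace survives inside the ID. `collapse_message_id` is a hypothetical name, not the notmuch function.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: remove spaces, tabs, and line breaks from inside
// a message ID.
std::string collapse_message_id(std::string id)
{
    auto is_space = [](char c) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n';
    };
    id.erase(std::remove_if(id.begin(), id.end(), is_space), id.end());
    return id;
}
```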

Carl Worth [Wed, 21 Oct 2009 07:35:56 +0000 (00:35 -0700)]

notmuch dump: Fix the sorting of results.

To properly support sorting in notmuch_query we now use an
Enquire object. We also throw in a QueryParser too, so we're
really close to being able to support arbitrary full-text
searches.

I took a look at the supported QueryParser syntax and chose
a set of flags for everything I like, (such as supporting
Boolean operators in either case ("AND" or "and"), supporting
phrase searching, supporting + and - to include/preclude terms,
and supporting a trailing * on any term as a wildcard).

Carl Worth [Wed, 21 Oct 2009 07:34:36 +0000 (00:34 -0700)]

add_message: Add a type:mail ("Kmail") term to all documents.

This gives us an easy way to specify "all mail messages" in a search
query. We simply look for this term.

Carl Worth [Wed, 21 Oct 2009 07:32:30 +0000 (00:32 -0700)]

notmuch setup: Print a few protecting spaces after progress reports.

This is to help keep the report looking clean when a new report
is shorter than a previous report, (say, when crossing the
boundary from over one minute remaining to less than one minute
remaining).

This used to be here, but I must have accidentally dropped it
when reformatting the progress report recently.

Carl Worth [Wed, 21 Oct 2009 06:13:28 +0000 (23:13 -0700)]

.gitignore: Ignore generated file Makefile.dep

Forgot to add this when I first added dependency checking to the
Makefile.

Carl Worth [Wed, 21 Oct 2009 06:12:53 +0000 (23:12 -0700)]

database: Remove two little bits of dead code.

Carl Worth [Wed, 21 Oct 2009 05:40:37 +0000 (22:40 -0700)]

query: Remove the magic NOTMUCH_QUERY_ALL

Using the address of a static char* was clever, but really
unnecessary. An empty string is much less magic, and even
easier to understand as the way to query everything from
the database.

Carl Worth [Wed, 21 Oct 2009 05:27:56 +0000 (22:27 -0700)]

notmuch dump: Free each message as it's used.

Previously we were leaking[*] memory in that the memory footprint of
a "notmuch dump" run would continue to grow until the output was
complete, and then finally all the memory would be freed.

Now, the memory footprint is small and constant, O(1) rather than
O(n) in the number of messages.

[*] Not leaking in a valgrind sense---every byte was still carefully
being accounted for and freed eventually.

Carl Worth [Wed, 21 Oct 2009 05:24:59 +0000 (22:24 -0700)]

Add destroy functions for results, message, and tags.

None of these are strictly necessary, (everything was leak-free
without them), but notmuch_message_destroy can actually be useful
for when one query has many message results, but only one is needed
to be live at a time.

The destroy functions for results and tags are fairly gratuitous, as
there's unlikely to be any benefit from calling them. But they're all
easy to add, (all of these functions are just wrappers for talloc_free),
and we do so for consistency and completeness.

Carl Worth [Wed, 21 Oct 2009 05:08:31 +0000 (22:08 -0700)]

Rename our talloc destructor functions to _destructor.

I want to reserve the _destroy names for some public functions
I'm about to add.

Carl Worth [Wed, 21 Oct 2009 04:03:30 +0000 (21:03 -0700)]

Implement 'notmuch dump'.

This is a fairly big milestone for notmuch. It's our first command
to do anything besides building the index, so it proves we can
actually read valid results out from the index.

It also puts in place almost all of the API and infrastructure we
will need to allow searching of the database.

Finally, with this change we are now using talloc inside of notmuch
which is truly a delight to use. And now that I figured out how
to use C++ objects with talloc allocation, (it requires grotty
parts of C++ such as "placement new" and "explicit destructors"),
we are valgrind-clean for "notmuch dump", (as in "no leaks are
possible").
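
The "placement new" and "explicit destructor" combination mentioned above can be sketched as follows. This is an illustration only: it uses std::malloc and std::free in place of talloc, and the `Holder` type and `placement_demo` function are hypothetical.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>

// Hypothetical C++ object that has to live in memory obtained from a
// C-style allocator.
struct Holder {
    std::string text;
    explicit Holder(const char *s) : text(s) {}
};

// Builds a Holder inside malloc'ed memory, uses it, and tears it down.
// Returns the length of the stored text, or 0 if allocation failed.
std::size_t placement_demo(const char *s)
{
    void *raw = std::malloc(sizeof(Holder));
    if (raw == nullptr)
        return 0;
    Holder *h = new (raw) Holder(s);  // placement new: construct in place
    std::size_t n = h->text.size();
    h->~Holder();                     // explicit destructor: frees the string
    std::free(raw);                   // then release the raw storage
    return n;
}
```

With talloc, the explicit destructor call would presumably sit inside a talloc destructor callback, so that freeing the parent context also runs the C++ cleanup.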

Carl Worth [Tue, 20 Oct 2009 22:09:51 +0000 (15:09 -0700)]

Rename private notmuch_message_t to notmuch_message_file_t

This is in preparation for a new, public notmuch_message_t.

Eventually, the public notmuch_message_t is going to grow enough
features to need to be file-backed and will likely need everything
that's now in message-file.c. So we may fold these back into one
object/implementation in the future.

Carl Worth [Tue, 20 Oct 2009 22:08:03 +0000 (15:08 -0700)]

Makefile: Add automatic dependency tracking to the Makefile.

With this, I really don't miss anything from automake.

Carl Worth [Tue, 20 Oct 2009 20:16:16 +0000 (13:16 -0700)]

notmuch: Fix setup so that accepting the default mail path works.

The recent change from GIOChannel to getline, (with a semantic
change of the newline terminator now being included in the
result that setup_command sees), broke this.

Carl Worth [Tue, 20 Oct 2009 20:07:19 +0000 (13:07 -0700)]

message: Use g_hash_table_destroy instead of g_hash_table_unref

I'm trying to chase down 3 still-reachable pointers to glib hash
tables.

This change didn't help with that, but I think destroy might be a
better semantic match for what I actually want. (It shouldn't matter
though since I never take any additional references.)

Carl Worth [Tue, 20 Oct 2009 20:05:45 +0000 (13:05 -0700)]

add_message: Fix memory leak of thread_ids GPtrArray.

We were properly freeing this memory when the thread-ids list was not
empty, but leaking it when it was.

Thanks, of course, to valgrind along with the G_SLICE=always-malloc
environment variable which makes leak checking with glib almost
bearable.

Carl Worth [Tue, 20 Oct 2009 19:49:32 +0000 (12:49 -0700)]

database.cc: Document better pieces of glib that we're using.

Carl Worth [Tue, 20 Oct 2009 19:48:14 +0000 (12:48 -0700)]

message.c: Free leaked memory in notmuch_message object

We were careful to free this memory when we finished parsing the
headers, but we missed it for the case of closing the message
without ever parsing all of the headers.

Carl Worth [Tue, 20 Oct 2009 19:47:23 +0000 (12:47 -0700)]

notmuch: Use GNU libc getline() instead of glib GIOChannel

Less reliance on glib is always nice for our memory-leak testing
efforts.

Carl Worth [Tue, 20 Oct 2009 17:14:00 +0000 (10:14 -0700)]

notmuch_database_open: Fix error message for file-not-found.

I was incorrectly using the return value of stat (-1) instead of
errno (ENOENT) to try to construct the error message here.

Also, while we're here, reword the error message to not have
"stat" in it, which in spite of what a Unix programmer will
tell you, is not actually a word.
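
The distinction between the return value of stat and errno can be sketched as below. `stat_error` is a hypothetical helper, not the code in notmuch_database_open.

```cpp
#include <cerrno>
#include <sys/stat.h>

// Hypothetical helper: returns 0 if the path exists, otherwise the errno
// value that explains the failure. The return value of stat() itself is
// only ever 0 or -1, so it cannot identify the error.
int stat_error(const char *path)
{
    struct stat st;
    if (stat(path, &st) == 0)
        return 0;
    return errno;
}
```

The value returned on failure is what should be handed to strerror when building the error message. Passing the -1 from stat instead would not describe the actual problem.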

Carl Worth [Tue, 20 Oct 2009 17:07:11 +0000 (10:07 -0700)]

Add some explanation about NOTMUCH_BASE to setup_command.

Since we allow the user to enter a custom directory, we need to
let the user know how to make this persistent. Of course, a better
answer would be to take what the user entered and shove it into
a ~/.notmuch-config file or so, but for now this will have to do.

Carl Worth [Tue, 20 Oct 2009 16:56:25 +0000 (09:56 -0700)]

notmuch_database_create/open: Fix to handle NULL as documented.

When documenting these functions I described support for a
NOTMUCH_BASE environment variable to be consulted in the case
of a NULL path. Only, I had forgotten to actually write the
code.

This code exists now, with a new, exported function:

notmuch_database_default_path
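
The lookup order described above can be sketched as follows. `resolve_path` is a hypothetical helper, not the real notmuch_database_default_path, and the final fallback is passed in by the caller because the commit message does not say what the built-in default is.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: an explicit path wins; a null path falls back to
// the NOTMUCH_BASE environment variable, and then to the supplied default.
std::string resolve_path(const char *path, const std::string &fallback)
{
    if (path != nullptr)
        return path;
    const char *base = std::getenv("NOTMUCH_BASE");
    if (base != nullptr && *base != '\0')
        return base;
    return fallback;
}
```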

Carl Worth [Tue, 20 Oct 2009 16:52:01 +0000 (09:52 -0700)]

notmuch_message_get_header: Fix bogus return of NULL header.

A simple bug meant that the correct value was being inserted into
the hash table, but a NULL value would be returned in some cases.
(If the value was already in the hash table at the beginning of
the call then the correct value would be returned, but if the
function had to parse to reach it then it would return NULL.)

This was tripping up the recently-added code to ignore messages
with NULL From:, Subject:, and To: headers, (which is fortunate
since otherwise the broken parsing might have stayed hidden for
longer).

Carl Worth [Tue, 20 Oct 2009 06:41:31 +0000 (23:41 -0700)]

notmuch: Revamp help message a bit.

The big update here is the addition of the dump and restore commands
which are next on my list. Also, I've now come up with a syntax for
documenting the arguments of sub-commands.

Carl Worth [Tue, 20 Oct 2009 06:08:49 +0000 (23:08 -0700)]

notmuch: Ignore files that don't look like email messages.

This is helpful for things like indexes that other mail programs
may have left around. It also means we can make the initial
instructions much easier, (the user need not worry about moving
away auxiliary files from some other email program).

Carl Worth [Tue, 20 Oct 2009 05:34:59 +0000 (22:34 -0700)]

Protect definition of _GNU_SOURCE.

I was getting a duplicate definition of this from somewhere, so
getting compiler warnings without this protection.

Carl Worth [Tue, 20 Oct 2009 05:24:28 +0000 (22:24 -0700)]

Remove test programs, xapian-dump and notmuch-index-message

These were just little tests while getting comfortable with
GMime and xapian. I'll likely use pieces of these as notmuch
continues, but for now let's not distract anyone looking
at notmuch with these.

And the code will live on in the history if I need to look
at it.

Carl Worth [Tue, 20 Oct 2009 01:30:48 +0000 (18:30 -0700)]

notmuch: Reword the progress report slightly.

I noticed this style during a recent Debian install and I liked
how much less busy it is compared to what we had before, (while
still telling the user everything she might want).

Carl Worth [Mon, 19 Oct 2009 23:38:44 +0000 (16:38 -0700)]

Rework message parsing to use getline rather than mmap.

The line-based parsing can be a bit awkward when wanting to peek
ahead, (say, for folded header values), but it's so convenient
to be able to trust that a string terminator exists on every
line so it cleans up the code considerably.
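
A rough sketch of line-based header parsing with folded values follows. It is not the notmuch parser: it uses C++ std::getline on a stream rather than the C library getline, and `parse_headers` is a hypothetical name. Instead of peeking ahead, it remembers the most recent header name and appends any continuation line to that header.

```cpp
#include <cctype>
#include <istream>
#include <map>
#include <sstream>  // std::istringstream, handy for feeding in sample input
#include <string>

// Hypothetical parser: reads headers up to the first blank line and
// returns them keyed by lowercase name. A line that starts with a space
// or tab is treated as a folded continuation of the previous header.
std::map<std::string, std::string> parse_headers(std::istream &in)
{
    std::map<std::string, std::string> headers;
    std::string line, name;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r')
            line.pop_back();          // tolerate CRLF line endings
        if (line.empty())
            break;                    // a blank line ends the headers
        if (line[0] == ' ' || line[0] == '\t') {
            std::string::size_type i = line.find_first_not_of(" \t");
            if (!name.empty() && i != std::string::npos)
                headers[name] += " " + line.substr(i);
            continue;
        }
        std::string::size_type colon = line.find(':');
        if (colon == std::string::npos) {
            name.clear();             // not a header line; ignore it
            continue;
        }
        name = line.substr(0, colon);
        for (char &c : name)
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        std::string::size_type v = line.find_first_not_of(" \t", colon + 1);
        headers[name] = (v == std::string::npos) ? std::string() : line.substr(v);
    }
    return headers;
}
```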

Carl Worth [Mon, 19 Oct 2009 20:48:13 +0000 (13:48 -0700)]

Don't hash headers we won't end up using.

Just saving a little work here.

Carl Worth [Mon, 19 Oct 2009 20:40:56 +0000 (13:40 -0700)]

Document which pieces of glib we're still using.

Looks like we can copy in a hash-table implementation, (from cairo,
say), and then a few _ascii_ functions from glib, (we'll need to
switch a few current uses of things like isspace, etc. to
locale-independent versions as well). So not too hard to free ourselves
of glib for now, (until we add GMime back in later, of course).

Carl Worth [Mon, 19 Oct 2009 20:35:29 +0000 (13:35 -0700)]

Hook up our fancy new notmuch_parse_date function.

With all the de-glib-ification out of the way, we can now use it
to allow for date-based sorting of Xapian search results.

Carl Worth [Mon, 19 Oct 2009 20:24:12 +0000 (13:24 -0700)]

notmuch_parse_date: Handle a NULL date string gracefully.

The obvious thing to do is to treat a missing date as the beginning
of time. Also, remove a useless cast from another return of 0.

Carl Worth [Mon, 19 Oct 2009 20:21:58 +0000 (13:21 -0700)]

date.c: Rename function to notmuch_parse_date

Now completing the process of making this function "our own".

The documentation is deleted here, because we already have
the documentation we want in notmuch-private.h.

Carl Worth [Mon, 19 Oct 2009 20:19:37 +0000 (13:19 -0700)]

date.c: Add hard-coded definition of HAVE_TIMEZONE

The original code expected this to be set by running configure.
We'll just manually set it here for now. This isn't as portable
as if we were doing some compile-time examination of the current
system, but I don't need portability now.

When someone comes along that wants to port notmuch to another
system, they will already have all the #ifdefs in place and
will simply need to add the appropriate machinery to set the
defines.

Carl Worth [Mon, 19 Oct 2009 20:14:37 +0000 (13:14 -0700)]

date.c: Don't use glib's slice allocator.

This change is gratuitous. For now, notmuch is still linking
against glib, so I don't have any requirement to remove this,
(unlike the last few changes where good taste really did
require the changes).

The motivation here is two-fold:

1. I'm considering switching away from all glib-based allocation
soon so that I can more easily verify that the memory management
is solid. I want valgrind to say "no leaks are possible" not
"there is tons of memory still allocated, but probably reachable
so who knows if there are leaks or not?". And glib seems to make
that impossible.

2. I don't think there's anything performance-sensitive about the
allocation here. (In fact, if there is, then the right answer
would be to do this parsing without any allocation whatsoever.)

Carl Worth [Mon, 19 Oct 2009 20:11:57 +0000 (13:11 -0700)]

date.c: Remove occurrences of gboolean.

While this is surely one of the most innocent typedefs, it still
annoys me to have basic types like 'int' re-defined like this.
It just makes it harder to copy the code between projects, with
very little benefit in readability.

For readability, predicate functions and variables should be
obviously Boolean-natured by their actual *names*.

Carl Worth [Mon, 19 Oct 2009 20:09:19 +0000 (13:09 -0700)]

date.c: Remove all occurrences of g_return_val_if_fail

That's got to be one of the hardest macro names to read, ever,
(it's phrased with an implicit negative in the condition,
rather than something simple like "assert").

Plus, it's evil, since it's a macro with a return in it.

And finally, it's actually *longer* than just typing "if"
and "return". So what's the point of this ugly idiom?

Carl Worth [Mon, 19 Oct 2009 20:07:58 +0000 (13:07 -0700)]

date.c: Keep the comments clean.

Never know when the children might be reading over my shoulder,
for example. :-)

Carl Worth [Mon, 19 Oct 2009 20:06:55 +0000 (13:06 -0700)]

date.c: Change headers/defines to work within notmuch.

We can't rely on any gmime-internal headers, (and fortunately we
don't need to). We also aren't burdened with any autoconf machinery
so don't reference any of that.

Carl Worth [Mon, 19 Oct 2009 20:04:59 +0000 (13:04 -0700)]

date.c: Remove a bunch of undesired code.

We're only interested in the date-parsing code here.

Carl Worth [Mon, 19 Oct 2009 20:02:17 +0000 (13:02 -0700)]

date.c: Convert from LGPL-2+ to GPL-3+

As authorized by LGPL-2 term (3).

Carl Worth [Mon, 19 Oct 2009 19:57:38 +0000 (12:57 -0700)]

date.c: Add new file directly from gmime2.4-2.4.6/gmime/gmime-utils.c

We're sucking in one gmime implementation file just to get the
piece that parses an RFC 822 date, because I don't want to go
through the pain of replicating that.

Carl Worth [Mon, 19 Oct 2009 19:54:40 +0000 (12:54 -0700)]

notmuch: Switch from gmime to custom, ad-hoc parsing of headers.

Since we're currently just trying to stitch together In-Reply-To
and References headers we don't need that much sophistication.
It's when we later add full-text searching that GMime will be
useful.

So for now, even though my own code here is surely very buggy
compared to GMime it's also a lot faster. And speed is what
we're after for the initial index creation.
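
To give a feel for how little is needed for thread stitching, here is a rough sketch of pulling message IDs out of a References or In-Reply-To header value. This is not the parser from this commit: next_message_id is a hypothetical helper, and it deliberately ignores header comments, quoting, and folding rules.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, for illustration only.
 *
 * Copy the next "<...>" message ID found at or after *cursor into out
 * (without the angle brackets) and advance *cursor past it.
 * Returns 1 if an ID was copied, 0 if none remain or out is too small. */
int
next_message_id (const char **cursor, char *out, size_t out_size)
{
    const char *start, *end;
    size_t len;

    start = strchr (*cursor, '<');
    if (start == NULL)
        return 0;

    end = strchr (start, '>');
    if (end == NULL)
        return 0;

    len = (size_t) (end - start - 1);
    if (len + 1 > out_size)
        return 0;

    memcpy (out, start + 1, len);
    out[len] = '\0';
    *cursor = end + 1;

    return 1;
}
```

Calling this in a loop until it returns 0 walks every ID in the header value, which is all the thread-linkage step needs.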

commit | commitdiff | tree

Carl Worth [Mon, 19 Oct 2009 19:52:46 +0000 (12:52 -0700)]

notmuch: Ignore .notmuch when counting files.

We were correctly ignoring this when adding files, but not when
doing the initial count. Clearly we need better code sharing
here.
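
One possible shape for that code sharing, sketched under assumptions: a single predicate that both the counting pass and the adding pass consult for each directory entry. The name should_skip_entry is hypothetical and is not part of notmuch.

```c
#include <string.h>

/* Hypothetical shared helper: returns 1 for directory entries that
 * both the counting pass and the adding pass should skip. */
int
should_skip_entry (const char *name)
{
    return strcmp (name, ".") == 0 ||
           strcmp (name, "..") == 0 ||
           strcmp (name, ".notmuch") == 0;
}
```

With one helper, the two directory walks cannot drift apart the way they did here.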

commit | commitdiff | tree

Carl Worth [Mon, 19 Oct 2009 03:56:30 +0000 (20:56 -0700)]

notmuch: Start actually adding messages to the index.

This is the beginning of the notmuch library as well, with its
interface in notmuch.h. So far we've got create, open, close, and
add_message (all with a notmuch_database prefix).

The current add_message function has already been whittled down from
what we have in notmuch-index-message to add only references,
message-id, and thread-id to the index, (that is---just enough to do
thread-linkage but nothing for full-text searching).

The concept here is to do something quickly so that the user can get
some data into notmuch and start using it. (The most interesting stuff
is then thread-linkage and labels like inbox and unread.) We can
defer the full-text indexing of the body of the messages for later,
(such as in the background while the user is reading mail).

The initial thread-stitching step is still slower than I would like.
We may have to stop using libgmime for this step as its overhead is
not worth it for the simple case of just parsing the message-id,
references, and in-reply-to headers.

commit | commitdiff | tree

Carl Worth [Mon, 19 Oct 2009 03:49:43 +0000 (20:49 -0700)]

xapian-dump: Rewrite to generate C code as output.

This was for some timing tests, (to see how fast xapian could be
if we were strictly adding documents and not doing any other IO
or computation). The answer is that xapian is quite fast, (on
the order of 1000 documents per second).

commit | commitdiff | tree

Carl Worth [Sat, 17 Oct 2009 15:26:58 +0000 (08:26 -0700)]

Start a new top-level executable: notmuch.

Of course, there's not much that this program does yet. It's got
some structure for some sub-commands that don't do anything. And
it has a main command that prints some explanatory text and then
counts all the regular files in your mail archive.

commit | commitdiff | tree

Carl Worth [Fri, 16 Oct 2009 20:42:42 +0000 (13:42 -0700)]

Fix more memory leaks.

These were more significant than the previous leak because these were
in the loop and leaking memory for every message being parsed. It
turns out that g_hash_table_new should probably be named
g_hash_table_new_and_leak_memory_please. The actually useful function
is g_hash_table_new_full which lets us pass a free function, (to free
keys when inserting duplicates into the hash table). And after all,
weeding out duplicates is the only reason we are using this hash table
in the first place.

It almost goes without saying, valgrind found these leaks.

commit | commitdiff | tree

Carl Worth [Fri, 16 Oct 2009 20:41:37 +0000 (13:41 -0700)]

Fix a one-time memory leak.

This was a single object in main outside any loops, so there was
no impact on performance or anything, but obviously we still want
to patch this.

Of course, valgrind gets the credit for seeing this.

commit | commitdiff | tree

Carl Worth [Fri, 16 Oct 2009 20:38:43 +0000 (13:38 -0700)]

Avoid reading a byte just before our allocated buffer.

When looking for a trailing ':' to introduce a quotation we peek at
the last character before a newline. But for blank lines, that's not
where we want to look. And when the first line in our buffer is a
blank line, we're underrunning our buffer. The fix is easy---just
bail early on blank lines since they have no terms anyway.

Thanks to valgrind for pointing out this error.
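
A minimal sketch of the guard described above, assuming the line is identified by the index of its first character and the index of its newline. The function line_ends_with_colon is hypothetical and is not the code from this commit.

```c
#include <stddef.h>

/* Hypothetical helper, for illustration only.
 *
 * Returns 1 if the line whose newline sits at buf[newline_pos] has ':'
 * as its last character before that newline. line_start is the index
 * of the first character of the line. */
int
line_ends_with_colon (const char *buf, size_t line_start, size_t newline_pos)
{
    /* Blank line: nothing on this line precedes the newline, so there
     * is no character to peek at. Without this early return, a blank
     * first line (newline_pos == 0) would index before the start of
     * the buffer. */
    if (newline_pos == line_start)
        return 0;

    return buf[newline_pos - 1] == ':';
}
```

The early return also covers blank lines in the middle of the buffer, where the unguarded peek would stay in bounds but look at the previous line's newline.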

commit | commitdiff | tree

Carl Worth [Fri, 16 Oct 2009 20:33:39 +0000 (13:33 -0700)]

Generate random thread IDs instead of using an arbitrary Message-ID.

Previously, we used as the thread-id the message-id of the first
message in the thread that we happened to find. In fact, this is a
totally arbitrary identifier, so it might as well be random. And an
advantage of actually using a random identifier is that we now have
fixed-length thread identifiers, (and the way is open to even allow
abbreviated identifiers like git does---though we're less likely to
show these identifiers to actual users).
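
A rough sketch of a fixed-length random identifier follows. The length, the hex alphabet, and the use of rand() are all assumptions made for illustration; none of them is taken from this commit.

```c
#include <stdlib.h>

#define THREAD_ID_LEN 16  /* hypothetical length, chosen for illustration */

/* Fill id with THREAD_ID_LEN random hex digits plus a terminating NUL.
 * id must have room for THREAD_ID_LEN + 1 bytes. rand() is used only
 * to keep the sketch portable; it is not a strong source of randomness. */
void
generate_thread_id (char id[THREAD_ID_LEN + 1])
{
    static const char hex[] = "0123456789abcdef";
    int i;

    for (i = 0; i < THREAD_ID_LEN; i++)
        id[i] = hex[rand () % 16];
    id[THREAD_ID_LEN] = '\0';
}
```

Because every identifier has the same length and alphabet, a unique prefix could later serve as an abbreviation, much as git allows for commit hashes.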

commit | commitdiff | tree

Carl Worth [Thu, 15 Oct 2009 16:04:31 +0000 (09:04 -0700)]

Change progress report to show "instantaneous" rate. Also print total time.

Instead of always showing the overall rate, we wait until the end
to show that. Then, on incremental updates we show the rate over the
last increment. This makes it much easier to actually watch what's
happening, (and it's easy to see the effect of xapian's internal
10,000 document flush).
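
The difference between the two rates can be sketched as follows. The function names and the use of plain doubles for timestamps are assumptions for illustration, not the code from this commit.

```c
/* Hypothetical helpers, for illustration only. */

/* Messages per second over one interval; returns 0 when no time has
 * elapsed, to avoid dividing by zero. */
double
messages_per_second (int count, double elapsed_seconds)
{
    if (elapsed_seconds <= 0.0)
        return 0.0;

    return count / elapsed_seconds;
}

/* Rate over just the last increment, computed from the running totals
 * at the previous and current progress updates. */
double
instantaneous_rate (int total_now, double time_now,
                    int total_prev, double time_prev)
{
    return messages_per_second (total_now - total_prev,
                                time_now - time_prev);
}
```

For example, with 1000 messages done at 1 second and 2000 done at 3 seconds, the rate over the last increment is 500 per second while the overall rate is about 667 per second, so a slow stretch shows up immediately instead of being averaged away.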

commit | commitdiff | tree

Keith Packard [Thu, 15 Oct 2009 04:46:54 +0000 (21:46 -0700)]

Protect against missing message id while indexing files

commit | commitdiff | tree

Keith Packard [Thu, 15 Oct 2009 04:17:39 +0000 (21:17 -0700)]

Walk address groups and parse each address separately

Signed-off-by: Keith Packard <keithp@keithp.com>

commit | commitdiff | tree

Carl Worth [Thu, 15 Oct 2009 00:26:28 +0000 (17:26 -0700)]

Reduce the verbosity of the progress indicator.

It's fast enough that we can wait for 1000 messages before updating.

commit | commitdiff | tree

Carl Worth [Thu, 15 Oct 2009 00:25:20 +0000 (17:25 -0700)]

Add support for message-part mime parts.

We could (and probably should) reparse and index all the headers from
the embedded message, but I'm not choosing to do that now---I'm just
indexing the body of the embedded message.

commit | commitdiff | tree

Carl Worth [Thu, 15 Oct 2009 00:24:28 +0000 (17:24 -0700)]

Avoid segfault on message with no subject.

It's fun how turning a program loose on 500,000 messages will find
lots of little corner cases.

commit | commitdiff | tree

Carl Worth [Thu, 15 Oct 2009 00:10:14 +0000 (17:10 -0700)]

Add some sort of progress indicator.

It's nice to let the user know that something is happening.
