This change makes index cleanup ~4x faster by changing how we
determine whether a file mentioned by the database still exists on
disk. Previously, we'd call access(2) for each file the database
mentioned. Doing so produced a lot of system call overhead. Now, we
read the directory entries of the directories containing the files
whose existence we're checking, build a hash table from what we find,
then do the existence check against this hash table instead of
entering the kernel.
The semantics of the cleanup check do change subtly, however.
Previously, we checked whether the mentioned file was *readable*.
Now we check merely that it exists. Extant but unreadable files in
maildirs should be rare.
BEFORE:
$ time mu index --lazy-check
lazily indexing maildir /home/dancol/Mail -> store /home/dancol/.cache/mu/xapian
/ indexing messages; checked: 0; updated/new: 0; cleaned-up: 0
real 0m19.310s
user 0m1.803s
sys 0m12.999s
AFTER:
$ time mu --debug index --lazy-check
lazily indexing maildir /home/dancol/Mail -> store /home/dancol/.cache/mu/xapian
- indexing messages; checked: 0; updated/new: 0; cleaned-up: 0
real 0m4.584s
user 0m2.433s
sys 0m2.133s
Single-threaded is the build default and seems to work well enough for
1.12.7, so remove the option to turn it off.
Build options that influence such low-level/core behavior are a pain to
maintain.
In lazy mode, we were skipping directories that did not change; however,
this didn't help in the case where users received new messages in big
maildirs.
So, add another check where we compare the ctime of each message file with
the time of the last indexing operation. If the ctime is earlier, skip the
message file. This is faster than having to consult the Xapian database
for each message.
Note that this requires the following setting in mu4e:
(setq mu4e-index-lazy-check t)
or
--lazy-check
as a parameter for 'mu index'.
Try to avoid multi-threaded operations with Xapian.
This removes the worker threads used during indexing and avoids the
background indexing thread. So, mu4e has to wait once again during indexing.
We can improve upon that, but first we need to know if it avoids the
problem of issue #2756.
Add store::consume_message, which is like add_message but uses std::move
from the caller, so that the caller no longer holds a copy of the message
(with its Xapian::Document); this is to avoid threading issues.
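The difference between copying and consuming can be sketched as follows.
Message and Store here are simplified stand-ins with assumed signatures, and
a std::string takes the place of the Xapian::Document.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for a message that owns a database document.
struct Message {
    std::string path;
    std::string document; // placeholder for Xapian::Document
};

// Simplified stand-in for the store; the signatures are assumptions.
class Store {
public:
    // Copies the message: the caller keeps its own Message (and document).
    void add_message(const Message& msg) { messages_.push_back(msg); }

    // Takes ownership: the caller must pass an rvalue (e.g. std::move(msg))
    // and should not use the message afterwards.
    void consume_message(Message&& msg) {
        messages_.push_back(std::move(msg));
    }

    std::size_t size() const { return messages_.size(); }

private:
    std::vector<Message> messages_;
};
```

A call then reads store.consume_message(std::move(msg)); taking an rvalue
reference makes the hand-over explicit at the call site.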
Instead of handling transactions in the store, handle it in xapian-db.
Make the code a bit more natural / cleaner.
Handle transactions automatically (with a batch size) and add an RAII
Transaction object, which makes all database interaction transactional
for its lifetime. So, there is no more need for explicit parameters to
add_message while indexing.
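One way such batching with an RAII guard could look is sketched below. Db,
Transaction and all member names are hypothetical, and an in-memory vector
stands in for the Xapian database.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical database wrapper: commits every change immediately unless a
// Transaction is active, in which case it commits once per full batch.
class Db {
public:
    explicit Db(std::size_t batch_size) : batch_size_{batch_size} {}

    void add(const std::string& doc) {
        pending_.push_back(doc);
        if (!in_transaction_ || pending_.size() >= batch_size_)
            commit();
    }

    std::size_t commits() const { return commits_; }
    std::size_t stored() const { return stored_.size(); }

private:
    friend class Transaction;

    // Move pending changes into the "database"; no-op if nothing is pending.
    void commit() {
        if (pending_.empty())
            return;
        stored_.insert(stored_.end(), pending_.begin(), pending_.end());
        pending_.clear();
        ++commits_;
    }

    std::size_t batch_size_;
    bool in_transaction_{false};
    std::size_t commits_{0};
    std::vector<std::string> pending_;
    std::vector<std::string> stored_;
};

// RAII guard: batching is on for the guard's lifetime, and whatever is
// still pending is committed when the guard goes out of scope.
class Transaction {
public:
    explicit Transaction(Db& db) : db_{db} { db_.in_transaction_ = true; }
    ~Transaction() {
        db_.commit();
        db_.in_transaction_ = false;
    }
    Transaction(const Transaction&) = delete;
    Transaction& operator=(const Transaction&) = delete;

private:
    Db& db_;
};
```

With a batch size of 3, seven calls to add inside one Transaction scope
result in three commits: two full batches plus the remainder on scope exit.
The sketch does not handle nested transactions.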