This change makes index cleanup ~4x faster by changing how we
determine whether a file mentioned by the database still exists on
disk. Previously, we'd call access(2) for each file the database
mentioned. Doing so produced a lot of system call overhead. Now, we
read the directory entries of the directories containing the files
whose existence we're checking, build a hash table from what we find,
then do the existence check against this hash table instead of
entering the kernel.
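A minimal sketch of the idea, assuming a hypothetical `DirListingCache` class (this is not mu's actual code). Each directory is read once with opendir(3)/readdir(3). After that, every existence check is a hash-table lookup with no system call.

```cpp
#include <dirent.h>

#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical helper: caches the entry names of each directory it is asked
// about, so repeated existence checks in the same directory cost no syscalls.
class DirListingCache {
public:
    // Returns true if `dir` contains an entry named `name`.
    bool exists(const std::string& dir, const std::string& name) {
        auto it = cache_.find(dir);
        if (it == cache_.end())
            it = cache_.emplace(dir, read_dir(dir)).first;
        return it->second.count(name) > 0;
    }

private:
    static std::unordered_set<std::string> read_dir(const std::string& dir) {
        std::unordered_set<std::string> names;
        if (DIR* d = ::opendir(dir.c_str())) {
            while (const dirent* ent = ::readdir(d))
                names.emplace(ent->d_name);
            ::closedir(d);
        }
        // A missing or unreadable directory yields an empty set, so every
        // file in it is reported as missing.
        return names;
    }

    std::unordered_map<std::string, std::unordered_set<std::string>> cache_;
};
```

Like the change itself, this only checks that an entry exists, not that it is readable. The cached listings also go stale if the maildir changes while cleanup runs, which seems acceptable for a single cleanup pass.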
The semantics of the cleanup check do change subtly, however.
Previously, we checked whether the mentioned file was *readable*.
Now we check merely that it exists. Extant but unreadable files in
maildirs should be rare.
BEFORE:
$ time mu index --lazy-check
lazily indexing maildir /home/dancol/Mail -> store /home/dancol/.cache/mu/xapian
/ indexing messages; checked: 0; updated/new: 0; cleaned-up: 0
real 0m19.310s
user 0m1.803s
sys 0m12.999s
AFTER:
$ time mu --debug index --lazy-check
lazily indexing maildir /home/dancol/Mail -> store /home/dancol/.cache/mu/xapian
- indexing messages; checked: 0; updated/new: 0; cleaned-up: 0
real 0m4.584s
user 0m2.433s
sys 0m2.133s
Some messages can have an _empty_ message-id, e.g. with:
In-Reply-To: <>
which we weren't filtering out.
This would yield an _empty_ Thread-Id in mu-message.cc.
And this would make mu-query believe it had no matches in the first
query, in Query::Private::run_related, and effectively throw away the
results. (Xapian using the empty string both for a "not found" result and
for "found an empty string" doesn't help either.)
So, avoid having an empty reference. Also add a unit-test.
Fixes #2812.
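A sketch of the filtering, using a hypothetical `parse_refs` helper rather than mu's real parser. It pulls the `<...>` message-ids out of a header value and drops the empty ones, so `<>` can never turn into an empty Thread-Id.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: extract message-ids from a References/In-Reply-To
// style header value, skipping empty ids such as "<>".
std::vector<std::string> parse_refs(const std::string& header) {
    std::vector<std::string> refs;
    std::string::size_type pos = 0;
    while ((pos = header.find('<', pos)) != std::string::npos) {
        const auto end = header.find('>', pos + 1);
        if (end == std::string::npos)
            break; // unterminated id: ignore the rest
        std::string id = header.substr(pos + 1, end - pos - 1);
        if (!id.empty()) // "<>" would otherwise yield an empty Thread-Id
            refs.emplace_back(std::move(id));
        pos = end + 1;
    }
    return refs;
}
```

For example, `parse_refs("<a@x> <> <b@y>")` returns the two ids `a@x` and `b@y`. `parse_refs("<>")` returns nothing.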
Basically, make the "mu find .... --analyze" information available in
mu4e, through a function mu4e-server-last-query.
This shows the query as the server saw it, as well as the parsed
s-expressions. This can be useful to see how some query is interpreted.
The json output (for mu-find etc.) just showed the converted sexp
output, including the clumsy emacs-style tstamps (for changed/date).
Add unix timestamps as well, which are easier to work with outside
emacs.
This handles #2770.
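Emacs's classic list-style timestamps encode the seconds since the epoch as `(HIGH LOW USEC)`, where the value is HIGH * 65536 + LOW. The sketch below shows the conversion in both directions. It assumes non-negative timestamps, and the struct and function names are hypothetical.

```cpp
#include <cstdint>

// Emacs list-style timestamp: seconds = high * 65536 + low.
struct EmacsTime {
    std::int64_t high; // upper bits of the seconds count
    std::int64_t low;  // lower 16 bits of the seconds count
    std::int64_t usec; // microseconds (always 0 here)
};

// Split a (non-negative) unix timestamp into Emacs's list form.
EmacsTime to_emacs_time(std::int64_t unix_secs) {
    return EmacsTime{unix_secs >> 16, unix_secs & 0xffff, 0};
}

// Recombine an Emacs list-style timestamp into plain unix seconds.
std::int64_t to_unix_time(const EmacsTime& t) {
    return t.high * 65536 + t.low;
}
```

This recombination step is what consumers outside Emacs had to do themselves before the plain unix timestamps were added.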
Single-threaded is the build-default, and seems to work well enough for
1.12.7, so remove the option to turn it off.
This is because build-options that influence such low-level/core
behavior are a pain to maintain.
In lazy-mode, we were skipping directories that did not change; however,
this didn't help for the case where users received new messages in big
maildirs.
So, add another check where we compare the ctime of message files with
the time of the last indexing operation. If the ctime is older, ignore the
message file. This is faster than having to consult the Xapian database
for each message.
Note that this requires, in mu4e:
(setq mu4e-index-lazy-check t)
or, for 'mu index', the parameter:
--lazy-check
Try to avoid multi-threaded operations with Xapian.
This removes the thread workers during indexing, and avoids the indexing
background thread. So, mu4e has to wait once again during indexing.
We can improve upon that, but first we need to know if it avoids the
problem of issue #2756.
It's better to _not_ have auto-saves for your draft directory, but if
you do, at least ignore them in mu.
It may still trip up mbsync and friends, but not much we can do about
that.
Clean up the implementation a bit as well.
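For illustration, here is a hypothetical predicate. It assumes the files to ignore are Emacs auto-saves, which Emacs names `#<original-name>#`; mu's actual check may differ.

```cpp
#include <string>

// Hypothetical predicate: Emacs auto-save files are named "#<name>#", so
// treat any file name that starts and ends with '#' as an auto-save file
// and skip it while scanning the maildir.
bool is_emacs_auto_save(const std::string& file_name) {
    return file_name.size() >= 2 && file_name.front() == '#' &&
           file_name.back() == '#';
}
```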
In Mu::parse_date_time, when provided with an empty string, return
time_t_max instead of G_MAXINT64. For systems with a 64-bit time_t, there
is no difference. With a 32-bit time_t it caused a test to fail:
not ok /utils/date-basic - ERROR:../mu-1.12.4/lib/utils/tests/test-utils.cc:92
void test_date_basic(): assertion failed
(parse_date_time(std::get<0>(test), std::get<1>(test)).value_or(-1)
== std::get<2>(test)): (18446744073709551615 == 2147483647)
This edge case probably only affected the test, as when other parts of
the application call parse_date_time (e.g. mu-server.cc and
mu-query-processor.cc), they check if the input string is empty first.
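The core of the fix is to take the maximum from the platform's `time_t` instead of a fixed 64-bit constant. A sketch using `std::numeric_limits` follows; `parse_empty_as_max` is a hypothetical stand-in for the empty-input branch only, not mu's parse_date_time.

```cpp
#include <ctime>
#include <limits>
#include <optional>
#include <string>

// Largest value this platform's time_t can hold: 2147483647 with a 32-bit
// time_t, 9223372036854775807 with a 64-bit one. G_MAXINT64 is always the
// latter, so it does not fit a 32-bit time_t.
constexpr std::time_t time_t_max = std::numeric_limits<std::time_t>::max();

// Hypothetical stand-in: only the empty-string case is modelled here.
std::optional<std::time_t> parse_empty_as_max(const std::string& str) {
    if (str.empty())
        return time_t_max;
    return std::nullopt; // real date parsing omitted in this sketch
}
```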
Add store::consume_message, which is like add_message but std::moves the
message from the caller, so the caller no longer has copies of the
messages (with Xapian::Document); this is to avoid threading issues.
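A sketch of the ownership pattern, with hypothetical `Message` and `Store` stand-ins rather than mu's real classes. The string member stands in for the wrapped Xapian::Document. Taking an rvalue reference and moving it into the store leaves the caller without a second copy. Copies of a Xapian::Document are presumably handles to shared internals, so this avoids sharing state across threads.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a message wrapping a Xapian::Document.
struct Message {
    std::string doc; // placeholder for the wrapped document
};

// Hypothetical stand-in for the store.
class Store {
public:
    // Like an add_message(const Message&), but the message is moved in, so
    // no copy remains on the caller's side.
    void consume_message(Message&& msg) {
        pending_.emplace_back(std::move(msg));
    }

    std::size_t size() const { return pending_.size(); }

private:
    std::vector<Message> pending_;
};
```

A caller writes `store.consume_message(std::move(msg));`. After that, `msg` should not be used except to assign it a new value.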
Journal logging seems to fail on NetBSD (no surprise), but it also has some
unwanted, not-fully-understood side effects.
In any case, outside Linux there's no use in even trying to use
journald; so we don't do that anymore.
Add conditional support for syslog (requires glib 2.80).
According to the readdir(3) man-page, not all file-systems support returning the
entry's file-type in `d_type`. For example, the reprotest reproducibility tool
uses the disorderfs FUSE file-system to shuffle the order in which directory
entries are returned, and this does not set `d_type`. Therefore, in addition to
entries with type `DT_DIR` and `DT_LNK`, also process entries with type
`DT_UNKNOWN`.
Signed-off-by: Jeremy Sowden <azazel@debian.org>
Only include xapian.h in one place, so we can have consistent options.
With that in place, we can enable C++ move semantics.
We don't do anything with that yet, but we check in the meson.build file
to see if we have the required xapian version.