indexer: make lazy check even lazier
In lazy-mode, we were skipping directories that did not change; however,
this didn't help for the case where users received new messages in big
maildirs.
So, add another check where we compare the ctime of message files with
the time of the last indexing operation. If it's smaller, ignore the
message-file. This is faster than having to consult the Xapian database
for each message.
Note that this requires enabling lazy checking, either in mu4e:
(setq mu4e-index-lazy-check t)
or
--lazy-check
as a parameter for 'mu index'.
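
The per-file check described above can be sketched as a small predicate. This is only an illustration under stated assumptions: the function name can_skip_file and its parameters are hypothetical and not part of mu; the comparison itself mirrors the condition added to the indexer's file handler in this commit.

```cpp
#include <cstdint>
#include <ctime>

// Hypothetical helper (not part of mu): decide whether a message file can
// be skipped without consulting the Xapian database.
//   lazy_check: whether lazy checking is enabled
//   file_ctime: the file's st_ctime, as reported by stat(2)
//   last_index: time of the previous indexing operation (seconds since epoch)
inline bool
can_skip_file(bool lazy_check, std::time_t file_ctime, std::uint64_t last_index)
{
	// a ctime strictly smaller than the previous indexing time means the
	// file has not changed since then, so there is nothing to do
	return lazy_check && static_cast<std::uint64_t>(file_ctime) < last_index;
}
```

Because the comparison is strict, a file whose ctime equals the time of the previous indexing operation is not skipped and still goes through the regular database check.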
@@ -145,6 +145,8 @@ struct Indexer::Private {
 	std::mutex lock_, w_lock_;
 	std::atomic<time_t> completed_{};
 	bool was_empty_{};
+
+	uint64_t last_index_{};
 };
 
 bool
@@ -206,12 +208,16 @@ Indexer::Private::handler(const std::string& fullpath, struct stat* statbuf,
 
 	case Scanner::HandleType::File: {
 		++progress_.checked;
-
-		if ((size_t)statbuf->st_size > max_message_size_) {
-			mu_debug("skip {} (too big: {} bytes)", fullpath, statbuf->st_size);
+		if (conf_.lazy_check && static_cast<uint64_t>(statbuf->st_ctime) < last_index_) {
+			// in lazy mode, ignore the file if it has not changed
+			// since the last indexing op.
 			return false;
 		}
 
+		if (static_cast<size_t>(statbuf->st_size) > max_message_size_) {
+			mu_debug("skip {} (too big: {} bytes)", fullpath, statbuf->st_size);
+			return false;
+		}
 		// if the message is not in the db yet, or not up-to-date, queue
 		// it for updating/inserting.
 		if (statbuf->st_ctime <= dirstamp_ && store_.contains_message(fullpath))
@@ -414,6 +420,10 @@ Indexer::Private::start(const Indexer::Config& conf, bool block)
 	mu_debug("indexing: {}; clean-up: {}", conf_.scan ? "yes" : "no",
 		 conf_.cleanup ? "yes" : "no");
 
+	// remember the _previous_ indexing, so in lazy mode we can skip
+	// those files.
+	last_index_ = store_.config().get<Mu::Config::Id::LastIndex>();
+
 	state_.change_to(IndexState::Scanning);
 	/* kick off the first worker, which will spawn more if needed. */
 	workers_.emplace_back(std::thread([this] { item_worker(); }));
@@ -54,8 +54,8 @@ public:
 		bool ignore_noupdate{};
 		/**< ignore .noupdate files */
 		bool lazy_check{};
-		/**< whether to skip directories that don't have a changed
-		 * mtime */
+		/**< whether to skip directories or message files that haven't changed since the
+		 * previous indexing operation, based on their ctime */
 	};
 
 	/**
@@ -41,7 +41,7 @@ If there is a file called _.noupdate_ in a directory, the contents of that
 directory and all of its subdirectories will be ignored. This can be useful to
 speed up things you have some maildirs that never change.
 
-_.noupdate_ does not affect already-indexed message: you can still search for
+_.noupdate_ does not affect already-indexed messages: you can still search for
 them. _.noupdate_ is ignored when you start indexing with an empty database (such
 as directly after *mu init*).
 
@@ -58,7 +58,7 @@ the database for which there is no longer a corresponding file in the Maildir.
 If you do not want this, you can use *-n*, *--nocleanup*.
 
 When *mu index* catches one of the signals *SIGINT*, *SIGHUP* or *SIGTERM* (e.g., when
-you press Ctrl-C during the indexing process), it attempts to shutdown
+you press *Ctrl-C* during the indexing process), it attempts to shutdown
 gracefully; it tries to save and commit data, and close the database etc. If it
 receives another signal (e.g., when pressing Ctrl-C once more), *mu index* will
 terminate immediately.
@@ -67,12 +67,17 @@ terminate immediately.
 
 ** --lazy-check
 In lazy-check mode, *mu* does not consider messages for which the time-stamp
-(ctime) of the directory they reside in has not changed since the previous
-indexing run. This is much faster than the non-lazy check, but won't update
-messages that have change (rather than having been added or removed), since
-merely editing a message does not update the directory time-stamp. Of course,
-you can run *mu-index* occasionally without *--lazy-check*, to pick up such
-messages.
+(*ctime*) of the directory in which they reside, has not changed since the
+previous time this directory was checked.
+
+This is much faster than the non-lazy check, but won't update messages that have
+changed (rather than having been added or removed), since merely editing a
+message does not update the directory time-stamp. Of course, you can run
+*mu-index* occasionally without *--lazy-check*, to pick up such messages.
+
+Furthermore, in lazy-check mode, files which have a *ctime* smaller than the time
+the previous indexing operation was completed, are ignored. This helps for the
+use-case where new messages can appear in big maildirs.
 
 ** --nocleanup
 Disable the database cleanup that *mu* does by default after indexing.
@@ -185,7 +190,7 @@ ok 1 /bench/indexer/4-cores
 #+end_example
 
 Things are again a little faster, even though the index does a lot more now
-(text-normalizatian, and pre-generating message-sexps). A faster machine helps,
+(text-normalization, and pre-generating message-sexps). A faster machine helps,
 too!
 
 ** recent releases