Seems people are getting really big mails these days, so let's up the default (which is also what mu4e uses) to 500 Mb (which should be enough for everyone, always)
229 lines
8.5 KiB
Groff
229 lines
8.5 KiB
Groff
.TH MU-INDEX 1 "September 2013" "User Manuals"
|
|
|
|
.SH NAME
|
|
|
|
mu index \- index e-mail messages stored in Maildirs
|
|
|
|
.SH SYNOPSIS
|
|
|
|
.B mu index [options]
|
|
|
|
.SH DESCRIPTION
|
|
|
|
\fBmu index\fR is the \fBmu\fR command for scanning the contents of Maildir
|
|
directories and storing the results in a Xapian database. The data can then be
|
|
queried using
|
|
.BR mu-find(1)
|
|
\.
|
|
|
|
.B index
|
|
understands Maildirs as defined by Daniel Bernstein for qmail(7). In addition,
|
|
it understands recursive Maildirs (Maildirs within Maildirs), Maildir++. It
|
|
can also deal with VFAT-based Maildirs which use '!' as the separators instead
|
|
of ':' as used by \fITinymail\fR/\fIModest\fR and some other e-mail programs.
|
|
|
|
E-mail messages which are not stored in something resembling a maildir
|
|
leaf-directory (\fIcur\fR and \fInew\fR) are ignored, as are the cache
|
|
directories for \fInotmuch\fR and \fIgnus\fR.
|
|
|
|
Symlinks are not followed.
|
|
|
|
If there is a file called \fI.noindex\fR in a directory, the contents of that
|
|
directory and all of its subdirectories will be ignored. This can be useful to
|
|
exclude certain directories from the indexing process, for example directories
|
|
with spam-messages.
|
|
|
|
If there is a file called \fI.noupdate\fR in a directory, the contents of that
|
|
directory and all of its subdirectories will be ignored, unless we do a full
|
|
rebuild (with \fB--rebuild\fR). This can be useful to speed up things you have
|
|
some maildirs that never change. Note that you can still search for these
|
|
messages, this only affects updating the database.
|
|
|
|
The first run of \fBmu index\fR may take a few minutes if you have a lot of
|
|
mail (tens of thousands of messages). Fortunately, such a full scan needs to be
|
|
done only once; after that it suffices to index the changes, which goes much
|
|
faster. See the 'Note on performance' below for more information.
|
|
|
|
The optional 'phase two' of the indexing-process is the removal of messages
|
|
from the database for which there is no longer a corresponding file in the
|
|
Maildir. If you do not want this, you can use \fB\-n\fR, \fB\-\-nocleanup\fR.
|
|
|
|
When \fBmu index\fR catches one of the signals \fBSIGINT\fR, \fBSIGHUP\fR or
|
|
\fBSIGTERM\fR (e.g., when you press Ctrl-C during the indexing process), it
|
|
tries to shutdown gracefully; it tries to save and commit data, and close the
|
|
database etc. If it receives another signal (e.g., when pressing Ctrl-C once
|
|
more), \fBmu index\fR will terminate immediately.
|
|
|
|
.SH OPTIONS
|
|
|
|
Note, some of the general options are described in the \fBmu(1)\fR man-page
|
|
and not here, as they apply to multiple mu commands.
|
|
|
|
.TP
|
|
\fB\-m\fR, \fB\-\-maildir\fR=\fI<maildir>\fR
|
|
starts searching at \fI<maildir>\fR. By default, \fBmu\fR uses whatever the
|
|
\fBMAILDIR\fR environment variable is set to; if it is not set, it tries
|
|
\fI~/Maildir\fR. See the note on mixing sub-maildirs below.
|
|
|
|
.TP
|
|
\fB\-\-my-address\fR=\fI<my-email-address>\fR
|
|
|
|
specifies that some e-mail address is 'my-address' (\fB\-\-my-address\fR can
|
|
be used multiple times). This is used by \fBmu cfind\fR -- any e-mail address
|
|
found in the address fields of a message which also has
|
|
\fI<my-email-address>\fR in one of its address fields is considered a
|
|
\fIpersonal\fR e-mail address. This allows you, for example, to filter out
|
|
(\fBmu cfind --personal\fR) addresses which were merely seen in mailing list
|
|
messages.
|
|
|
|
.TP
|
|
\fB\-\-nocleanup\fR
|
|
disables the database cleanup that \fBmu\fR does by default after indexing.
|
|
|
|
.TP
|
|
\fB\-\-rebuild\fR
|
|
clear all messages from the database before indexing. \fB\-\-rebuild\fR
|
|
guarantees that after the indexing has finished, there are no 'old' messages
|
|
in the database anymore, which is not true with \fB\-\-reindex\fR when
|
|
indexing only a part of messages (using \fB\-\-maildir\fR). For this reason,
|
|
it is necessary to run \fBmu index \-\-rebuild\fR when there is an upgrade in
|
|
the database format. \fBmu index\fR will issue a warning about this.
|
|
|
|
.TP
|
|
\fB\-\-autoupgrade\fR
|
|
automatically use \fB\-y\fR, \fB\-\-empty\fR
|
|
when \fBmu\fR notices that the database version is not up-to-date. This option
|
|
is for use in cron scripts and the like, so they won't require any user
|
|
interaction, even when mu introduces a new database version.
|
|
|
|
.TP
|
|
\fB\-\-xbatchsize\fR=\fI<batch size>\fR
|
|
set the maximum number of messages to process in a single Xapian
|
|
transaction. In practice, this option is only useful if you find that \fBmu\fR
|
|
is running out of memory while indexing; in that case, you can set the batch
|
|
size to (for example) 1000, which will reduce memory consumption, but also
|
|
substantially reduce the indexing performance.
|
|
|
|
.TP
|
|
\fB\-\-max-msg-size\fR=\fI<max msg size>\fR\
|
|
set the maximum size (in bytes) for messages. The default maximum
|
|
(currently at 500Mb) should be enough in most cases, but if you
|
|
encounter warnings from \fBmu\fR about ignoring messsage because they
|
|
are too big, you may want to increase this. Note that the reason for
|
|
having a maximum size is that big messages require big memory
|
|
allocations, which may lead to problems.
|
|
|
|
.B NOTE:
|
|
It is not recommended to mix maildirs and sub-maildirs within the hierarchy
|
|
in the same database; for example, it's better not to index both with
|
|
\fB\-\-maildir\fR=~/MyMaildir and \fB\-\-maildir\fR=~/MyMaildir/foo, as this
|
|
may lead to unexpected results when searching with the 'maildir:' search
|
|
parameter (see below).
|
|
|
|
.SS A note on performance (i)
|
|
As a non-scientific benchmark, a simple test on the author's machine (a
|
|
Thinkpad X61s laptop using Linux 2.6.35 and an ext3 file system) with no
|
|
existing database, and a maildir with 27273 messages:
|
|
|
|
.nf
|
|
$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
|
|
$ time mu index --quiet
|
|
66,65s user 6,05s system 27% cpu 4:24,20 total
|
|
.fi
|
|
(about 103 messages per second)
|
|
|
|
A second run, which is the more typical use case when there is a database
|
|
already, goes much faster:
|
|
|
|
.nf
|
|
$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
|
|
$ time mu index --quiet
|
|
0,48s user 0,76s system 10% cpu 11,796 total
|
|
.fi
|
|
(more than 56818 messages per second)
|
|
|
|
Note that each test flushes the caches first; a more common use case might
|
|
be to run \fBmu index\fR when new mail has arrived; the cache may stay
|
|
quite 'warm' in that case:
|
|
|
|
.nf
|
|
$ time mu index --quiet
|
|
0,33s user 0,40s system 80% cpu 0,905 total
|
|
.fi
|
|
which is more than 30000 messages per second.
|
|
|
|
|
|
.SS A note on performance (ii)
|
|
As per June 2012, we did the same non-scientific benchmark, this time with an
|
|
Intel) i5-2500 CPU @ 3.30GHz, an ext4 file system and a maildir with 22589
|
|
messages.
|
|
|
|
.nf
|
|
$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
|
|
$ time mu index --quiet
|
|
27,79s user 2,17s system 48% cpu 1:01,47 total
|
|
.fi
|
|
(about 813 messages per second)
|
|
|
|
A second run, which is the more typical use case when there is a database
|
|
already, goes much faster:
|
|
|
|
.nf
|
|
$ sudo sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
|
|
$ time mu index --quiet
|
|
0,13s user 0,30s system 19% cpu 2,162 total
|
|
.fi
|
|
(more than 173000 messages per second)
|
|
|
|
|
|
In general, \fBmu\fR has been getting faster with each release, even with
|
|
relatively expensive new features such as text-normalization (for
|
|
case-insensitve/accent-insensitive matching). The profiles are dominated by
|
|
operations in the Xapian database now.
|
|
|
|
.SH FILES
|
|
By default, \fBmu index\fR stores its message database in \fI~/.mu/xapian\fR;
|
|
the database has an embedded version number, and \fBmu\fR will automatically
|
|
update it when it notices a different version. This allows for automatic
|
|
updating of \fBmu\fR-versions, without the need to clear out any old
|
|
databases.
|
|
|
|
However, note that versions of \fBmu\fR before 0.7 used a different scheme,
|
|
which puts the database in \fI~/.mu/xapian\-<version>\fR. These older
|
|
databases can safely be deleted. Starting from version 0.7, this manual
|
|
cleanup should no longer be needed.
|
|
|
|
\fBmu\fR stores logs of its operations and queries in \fI<muhome>/mu.log\fR
|
|
(by default, this is \fI~/.mu/mu.log\fR). Upon startup, \fBmu\fR checks the
|
|
size of this log file. If it exceeds 1 MB, it will be moved to
|
|
\fI~/.mu/mu.log.old\fR, overwriting any existing file of that name, and start
|
|
with an empty log file. This scheme allows for continued use of \fBmu\fR
|
|
without the need for any manual maintenance of log files.
|
|
|
|
.SH ENVIRONMENT
|
|
|
|
\fBmu index\fR uses \fBMAILDIR\fR to find the user's Maildir if it has not
|
|
been specified explicitly with \fB\-\-maildir\fR=\fI<maildir>\fR. If
|
|
\fBMAILDIR\fR is not set, \fBmu index\fR will try \fI~/Maildir\fR.
|
|
|
|
.SH RETURN VALUE
|
|
|
|
\fBmu index\fR return 0 upon successful completion, and any other number
|
|
greater than 0 signals an error.
|
|
|
|
.SH BUGS
|
|
|
|
Please report bugs if you find them:
|
|
.BR https://github.com/djcb/mu/issues
|
|
|
|
.SH AUTHOR
|
|
|
|
Dirk-Jan C. Binnema <djcb@djcbsoftware.nl>
|
|
|
|
.SH "SEE ALSO"
|
|
|
|
.BR maildir(5)
|
|
.BR mu(1)
|
|
.BR mu-find(1)
|
|
.BR mu-cfind(1)
|