From 5e2b7d52b24d3f3a9ea4273cef91d1170f9ccb78 Mon Sep 17 00:00:00 2001 From: "Dirk-Jan C. Binnema" Date: Sun, 5 Jan 2025 22:45:39 +0200 Subject: [PATCH] mu-query.7: update documentation In particular, regex searches. --- man/mu-query.7.org | 170 +++++++++++++++++++++++++++++++++------------ 1 file changed, 126 insertions(+), 44 deletions(-) diff --git a/man/mu-query.7.org b/man/mu-query.7.org index 0cb1cd3c..694a56eb 100644 --- a/man/mu-query.7.org +++ b/man/mu-query.7.org @@ -18,48 +18,87 @@ provide examples. As a companion to this, we recommend the *mu info fields* command to get an up-to-date list of the available fields and flags. Furthermore, *mu find* provides the *--analyze* option, which shows how *mu* -interprets your query; see the *ANALYZING QUERIES* section below. +interprets your query; similarly, mu4e has a command, mu4e-analyze-last-query. +See the *ANALYZING QUERIES* section for further details. *NOTE:* if you use queries on the command-line (say, for *mu find*), you need to quote any characters that would otherwise be interpreted by the shell, such as -*""*, *(* and *)* and whitespace. +'"', '*', '(' and ')'. The details are shell-specific. In case of doubt, the +*--analyze* option can be useful. * TERMS The basic building blocks of a query are *terms*; these are just normal words like -`banana' or `hello', or words prefixed with a field-name which makes them apply +"banana" or "hello", or words prefixed with a field-name which makes them apply to just that field. See *mu info fields* for all the available fields. Some example queries: + #+begin_example vacation subject:capybara maildir:/inbox #+end_example -Terms without an explicit field-prefix, (like `vacation' above) are interpreted -like: +Terms without an explicit field-prefix (like "vacation" above) are interpreted +as: + #+begin_example to:vacation or subject:vacation or body:vacation or ... 
#+end_example -The language is case-insensitive for terms and attempts to `flatten' diacritics, +The language is case-insensitive for terms and attempts to "flatten" diacritics, so =angtrom= matches =Ångström=. -If terms contain whitespace, they need to be quoted: +If terms contain whitespace, they need to be quoted. + #+begin_example subject:"hi there" #+end_example -This is a so-called =phrase query=, which means that we match against subjects -that contain the literal phrase "hi there". Phrase queries only work for fields -that are /indexed/, i.e., fields with *index* in the *mu info fields* search column. -Remember that you need to escape those quotes when using this from the -command-line: +This is a so-called =phrase query=, which means that we match against subjects +that contain the literal phrase "hi there". Phrase queries only work for certain +fields; they have the word *phrase* in their *mu info fields* search column. + +** Quoting queries for the shell + +Remember that you need to escape the quotes for a search query when using this +from the command-line; otherwise, the shell (or most shells) processes the quotes +and *mu* never sees them. + +In this case, that means the difference between searching for a subject "hi there" +versus a subject "hi" and some word "there" that can appear in any of the +combination fields (combination fields are discussed below). 
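The effect of shell quoting described above can be sketched outside of *mu*. The following is a minimal Python illustration, not part of *mu* itself: it uses the standard =shlex= module to mimic POSIX-style word splitting, and it assumes that the query is formed by joining the remaining command-line arguments with single spaces. The helper name =query_seen_by_mu= is hypothetical.

```python
import shlex


def query_seen_by_mu(command_line: str) -> str:
    """Return the query text left over after POSIX-style shell word splitting.

    Assumption (for illustration only): the query is the space-joined list of
    arguments after "mu find", with "--" options such as --analyze dropped.
    """
    argv = shlex.split(command_line)  # applies POSIX quoting rules
    words = [arg for arg in argv[2:] if not arg.startswith("--")]
    return " ".join(words)


# Unescaped quotes: the shell consumes them, so the quotes never reach mu.
unescaped = query_seen_by_mu('mu find subject:"hi there" --analyze')

# Escaped quotes: the backslashes are consumed, the quotes survive.
escaped = query_seen_by_mu(r'mu find subject:\"hi there\" --analyze')

print(unescaped)
print(escaped)
```

Under these assumptions, the first string no longer contains any quotes, while the second still carries them and can therefore be parsed as a phrase query.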
+ +We can use the mentioned *--analyze* option to show the difference: + #+begin_example -mu find subject:\\"hi there\\" +mu find subject:"hi there" --analyze +* query: + subject:hi there +* parsed query: + (and (subject "hi") (_ "there")) +* parsed query (expanded): + (and (subject "hi") (or (to "there") (cc "there") (bcc "there") (from "there") (subject "there") (body "there") (embed "there"))) +* Xapian query: + Query((Shi AND (Tthere OR Cthere OR Hthere OR Fthere OR Sthere OR Bthere OR Ethere))) #+end_example +And with quotes escaped: + +#+begin_example +mu find subject:\"hi there\" --analyze +* query: + subject:"hi there" +* parsed query: + (or (subject "hi there") (subject (phrase "hi there"))) +* Xapian query: + Query((Shi there OR (Shi PHRASE 2 Sthere))) +#+end_example + +We won't dwell on the details of the *--analyze* output here, but hopefully this +illustrates the difference between quoted and unquoted queries. + * LOGICAL OPERATORS We can combine terms with logical operators -- binary ones: *and*, *or*, *xor* and the @@ -81,37 +120,72 @@ subject:chip AND subject:dale #+end_example are equivalent. For readability, we recommend the second version. -Note that a =pure not= - e.g. searching for *not apples* is quite a `heavy' query. +Note that a =pure not= - e.g. searching for *not apples* is quite a "heavy" query. -* REGULAR EXPRESSIONS AND WILDCARDS +* WILDCARDS -The language supports matching basic PCRE regular expressions, see {{{man-link(pcre,3)}}}. +Wildcards are a Xapian built-in mechanism for matching. -Regular expressions are enclosed in *//*. 
Some examples: +A search term with a rightmost *** (and =only= in that position) matches any term +that starts with the part before the ***; they are less powerful than regular +expressions, but also much faster: + +An example: +#+begin_example +$ mu find "hello*" +#+end_example + +Quoting the "hello*" is recommended; some shells (but not all) would otherwise +expand the '*' to all files in the current directory. + +* REGULAR EXPRESSIONS + +The query language supports matching basic PCRE regular expressions, as per +{{{man-link(pcre,3)}}}, with some limitations. + +Regular expressions are enclosed in *//*. For example: #+begin_example subject:/h.llo/ # match hallo, hello, ... -subject:/ #+end_example -Note the difference between `maildir:/foo' and `maildir:/foo/'; the former -matches messages in the `/foo' maildir, while the latter matches all messages in -all maildirs that match `foo', such as `/foo', `/bar/cuux/foo', `/fooishbar' -etc. +Note the difference between "maildir:/foo" and "maildir:/foo/"; the former +matches messages in the "/foo" maildir, while the latter matches all messages in +all maildirs that match "foo", such as "/foo", "/bar/cuux/foo", "/fooishbar", +and so on. -Wildcards are another mechanism for matching where a term with a rightmost *** -(and =only= in that position) matches any term that starts with the part before -the ***; they are therefore less powerful than regular expressions, but also much -faster: +Regular expressions are more powerful than wildcards, but are also much slower. +Moreover, their behavior in *mu* can be a bit confusing, due to some +implementation details. See below for some of the caveats. 
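The difference between the two matching styles can be sketched in a few lines of Python. This is only an illustration of the semantics described above (prefix match for a trailing ***, match-anywhere for a regular expression), not how *mu* or Xapian implement them; the term lists and helper names are hypothetical.

```python
import re


def wildcard_match(pattern: str, term: str) -> bool:
    """Only a rightmost '*' is special: match terms starting with the prefix."""
    if pattern.endswith("*"):
        return term.startswith(pattern[:-1])
    return term == pattern


def regex_match(pattern: str, term: str) -> bool:
    """An unanchored regular expression may match anywhere in the term."""
    return re.search(pattern, term) is not None


# Hypothetical subject terms and maildir names, for illustration only.
words = ["hello", "helloworld", "othello"]
maildirs = ["/foo", "/bar/cuux/foo", "/fooishbar", "/inbox"]

by_wildcard = [w for w in words if wildcard_match("hello*", w)]
by_regex = [w for w in words if regex_match("hello", w)]
exact_maildir = [m for m in maildirs if m == "/foo"]
regex_maildir = [m for m in maildirs if regex_match("foo", m)]

print(by_wildcard)
print(by_regex)
print(exact_maildir)
print(regex_maildir)
```

In this sketch the wildcard only finds terms that begin with "hello", whereas the regular expression also finds "othello"; likewise, the exact maildir comparison finds only "/foo", while the regular expression finds every maildir containing "foo".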
+ +** Whitespace in regular expression literals + +To avoid ambiguities in the query parsing, regular expressions *must not* contain +whitespace, so to search for a message with subject "hello world", you can write #+begin_example -foo* +mu find 'subject:/hello\\040world/' #+end_example -is equivalent to +(with the \040 specifying a space in the regular expression, and an extra '\' +to escape it). In many cases, #+begin_example -/foo.*/ +mu find 'subject:/hello.world/' #+end_example +may be good enough, and easier to type. -Regular expressions can be useful, but are relatively slow. +** Anchors in regular expressions + +Since the underlying Xapian database does not support regular expressions (it +does support wildcards), *mu* implements the regular-expression search by matching +the user's regular expression against all "terms" (words or phrases) that are in the +database for a given field. + +That implementation detail explains why "anchored" regular expressions (with *^* +and *$* to mark begin/end, respectively) can get unexpected results. + +Suppose you want to match all messages whose subject starts with "pie", and you search +with *subject:/^pie/*. This /also/ matches messages with subject "apple pie", since +both those words are indexed as terms separately (as well as phrases), and thus +"^pie" matches as well for a message with subject "apple pie". * FIELDS @@ -208,10 +282,10 @@ an open range. Dates are expressed in local time and using ISO-8601 format (YYYY-MM-DD HH:MM:SS); you can leave out the right part and *mu* adds the rest, depending on whether this is the beginning or end of the range (e.g., as a lower bound, -`2015' would be interpreted as the start of that year; as an upper bound as the +"2015" would be interpreted as the start of that year; as an upper bound as the end of the year). -You can use `/' , `.', `-', `:' and `T' to make dates more human-readable. +You can use `/' , `.', `-', `:' and "T" to make dates more human-readable. 
Some examples: #+begin_example @@ -222,13 +296,13 @@ date:2015-06-01.. date:2016..2016 #+end_example -You can also use the special `dates' *now* and *today*: +You can also use the special "dates" *now* and *today*: #+begin_example date:20170505..now date:today.. #+end_example -Finally, you can use relative `ago' times which express some time before now and +Finally, you can use relative "ago" times which express some time before now and consist of a number followed by a unit, with units *s* for seconds, *M* for minutes, *h* for hours, *d* for days, *w* for week, *m* for months and *y* for years. Some examples: @@ -315,7 +389,7 @@ find it (and all the other messages in that same maildir) with: maildir:/lists/running #+end_example -Note the starting `/'. If you want to match mails in the `root' maildir, you can +Note the starting `/'. If you want to match mails in the "root" maildir, you can do with a single `/': #+begin_example maildir:/ @@ -343,7 +417,7 @@ queries using various logical operators, parentheses and so on, but in the author's experience, it's usually faster to find a message with a simple query just searching for some words. 
-Find all messages with both `bee' and `bird' (in any field) +Find all messages with both "bee" and "bird" (in any field) #+begin_example bee AND bird #+end_example @@ -353,12 +427,12 @@ Find all messages with either Frodo or Sam: Frodo OR Sam #+end_example -Find all messages with the `wombat' as subject, and `capybara' anywhere: +Find all messages with "wombat" as subject, and "capybara" anywhere: #+begin_example subject:wombat and capybara #+end_example -Find all messages in the `Archive' folder from Fred: +Find all messages in the "Archive" folder from Fred: #+begin_example from:fred and maildir:/Archive #+end_example @@ -385,7 +459,7 @@ Find a messages with the given message-id: msgid:CAE56pjGU2oNxN-wWku69@mail.gmail.com #+end_example -Find all messages written in Dutch or German with the word `hallo': +Find all messages written in Dutch or German with the word "hallo": #+begin_example hallo and (lang:nl or lang:de) #+end_example @@ -395,14 +469,18 @@ for "cld2-support*. * ANALZYING QUERIES -Despite all the excellent documentation, in some cases it can be non-obvious how -*mu* interprets your query. For that, you can ask *mu* to analyze the query -- that -is, show how *mu* interprets the query. +Despite all the excellent documentation, in some cases it can be difficult to +understand how *mu* interprets your query, especially when shell interpretation is +involved as well. + +For that, you can ask *mu* to analyze the +query -- that is, show how *mu* interprets the query. We already saw an example of +this. This uses the *--analyze* option to *mu find*. #+begin_example $ mu find subject:wombat AND date:3m.. size:..2000 --analyze -,* query: +,*query: subject:wombat AND date:3m.. size:..2000 ,* parsed query: (and (subject "wombat") (date (range "2023-05-30T06:10:09Z" "")) (size (range "" "2000"))) @@ -411,7 +489,11 @@ $ mu find subject:wombat AND date:3m.. 
size:..2000 --analyze #+end_example The ~parsed query~ is usually the most useful one for understanding how *mu* -interprets your query. +interprets your query; it shows the query as *mu* sees it, in s-expression +notation. + +In *mu4e* there is the *mu4e-analyze-last-query* command, which provides similar +information. #+include: "prefooter.inc" :minlevel 1