lib: implement new query parser

mu's query parser is the piece of software that turns your queries
into something the Xapian database can understand. So, if you query
"maildir:/inbox and subject:bla", this must be translated into a
Xapian::Query object which will retrieve the sought-after messages.

Since mu's beginning, almost a decade ago, this parser was based on
Xapian's default Xapian::QueryParser. It works okay, but wasn't really
designed for the mu use-case, and had a bit of trouble with anything
that's not A..Z (think: spaces, special characters, Unicode etc.).

Over the years, mu added quite a bit of pre-processing trickery to
deal with that. Still, there were corner cases and bugs that were
practically unfixable.

The solution to all of this is to have a custom query processor that
replaces Xapian's, and write it from the ground up to deal with the
special characters etc. I wrote one, as part of my "future, post-1.0
mu" research project, and I have now backported it to mu 0.9.19.

From a technical perspective, this is a major cleanup, and allows us
to get rid of much of the fragile preprocessing both for indexing and
querying. From an end-user perspective this (hopefully) means that
many of the little parsing issues are gone, and it opens the way for
some new features.

From an end-user perspective:
- better support for special characters.
- regexp search! yes, you can now search for regular expressions, e.g.
      subject:/h.ll?o/
  will find subjects with hallo, hello, halo, philosophy, ...

  As you can imagine, this can be a _heavy_ operation on the database,
  and might take quite a bit longer than a normal query; but it can be
  quite useful.
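
To see why "philosophy" shows up in that list: the pattern is not anchored, so it matches anywhere inside a subject. The sketch below illustrates this with the standard library's std::regex (ECMAScript syntax). It is not mu's implementation; the function name matching_subjects and the linear scan over an in-memory list are assumptions made purely for illustration. A scan like this visits every candidate, which hints at why a regexp query can be heavy.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper, not part of mu: return the subjects that contain
// a match for the pattern anywhere in the string (unanchored search).
std::vector<std::string>
matching_subjects(const std::string& pattern,
		  const std::vector<std::string>& subjects)
{
	const std::regex rx{pattern}; // ECMAScript syntax by default
	std::vector<std::string> hits;

	for (const auto& subject : subjects)
		if (std::regex_search(subject, rx)) // regex_search, not regex_match
			hits.push_back(subject);

	return hits;
}
```

Called with the pattern "h.ll?o" and the subjects hallo, hello, halo, philosophy, help and world, this returns the first four: "philosophy" contains "hilo", while "help" has no "o" after the "l".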
lib/parser/tree.hh (new file, 104 lines)
@@ -0,0 +1,104 @@
/*
** Copyright (C) 2017 Dirk-Jan C. Binnema <djcb@djcbsoftware.nl>
**
** This library is free software; you can redistribute it and/or
** modify it under the terms of the GNU Lesser General Public License
** as published by the Free Software Foundation; either version 2.1
** of the License, or (at your option) any later version.
**
** This library is distributed in the hope that it will be useful,
** but WITHOUT ANY WARRANTY; without even the implied warranty of
** MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
** Lesser General Public License for more details.
**
** You should have received a copy of the GNU Lesser General Public
** License along with this library; if not, write to the Free
** Software Foundation, 51 Franklin Street, Fifth Floor, Boston, MA
** 02110-1301, USA.
*/

#include <vector>
#include <string>
#include <iostream>

#include <parser/data.hh>

namespace Mux {

// A node in the parse tree
struct Node {
	enum class Type {
		Empty, // only for empty trees
		OpAnd,
		OpOr,
		OpXor,
		OpAndNot,
		OpNot,
		Value,
		Range,
		Invalid
	};

	Node(Type _type, std::unique_ptr<Data>&& _data):
		type{_type}, data{std::move(_data)} {}
	Node(Type _type): type{_type} {}
	Node(Node&& rhs) = default;

	Type type;
	std::unique_ptr<Data> data;

	static constexpr const char* type_name (Type t) {
		switch (t) {
		case Type::Empty: return ""; break;
		case Type::OpAnd: return "and"; break;
		case Type::OpOr: return "or"; break;
		case Type::OpXor: return "xor"; break;
		case Type::OpAndNot: return "andnot"; break;
		case Type::OpNot: return "not"; break;
		case Type::Value: return "value"; break;
		case Type::Range: return "range"; break;
		case Type::Invalid: return "<invalid>"; break;
		default:
			throw std::runtime_error ("bug");
		}
	}

	static constexpr bool is_binop(Type t) {
		return t == Type::OpAnd || t == Type::OpAndNot ||
			t == Type::OpOr || t == Type::OpXor;
	}
};

inline std::ostream&
operator<< (std::ostream& os, const Node& t)
{
	os << Node::type_name(t.type);
	if (t.data)
		os << t.data;

	return os;
}

struct Tree {
	Tree(Node&& _node): node(std::move(_node)) {}
	Tree(Tree&& rhs) = default;

	void add_child (Tree&& child) { children.emplace_back(std::move(child)); }
	bool empty() const { return node.type == Node::Type::Empty; }

	Node node;
	std::vector<Tree> children;
};

inline std::ostream&
operator<< (std::ostream& os, const Tree& tree)
{
	os << '(' << tree.node;
	for (const auto& subtree : tree.children)
		os << subtree;
	os << ')';

	return os;
}

} // namespace Mux
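
To illustrate how the Tree and Node types above fit together, here is a minimal, self-contained sketch that hand-builds the tree for the example query "maildir:/inbox and subject:bla" and renders it in the same parenthesized form. It is a simplified stand-in, not the code from this commit: SketchTree, example_tree and render are hypothetical names, a plain std::string payload replaces std::unique_ptr<Data> (parser/data.hh is not shown here), and the space between the type name and the payload is my own choice.

```cpp
#include <iostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Reduced stand-in for Mux::Node::Type (assumption: only what the example needs).
enum class Type { Empty, OpAnd, OpOr, Value };

// Stand-in for Mux::Tree with the node folded in; move-only like the original.
struct SketchTree {
	SketchTree(Type t, std::string p = {}): type{t}, payload{std::move(p)} {}
	SketchTree(SketchTree&&) = default;

	void add_child(SketchTree&& child) { children.emplace_back(std::move(child)); }

	Type                    type;
	std::string             payload;  // replaces std::unique_ptr<Data>
	std::vector<SketchTree> children;
};

const char* type_name(Type t)
{
	switch (t) {
	case Type::OpAnd: return "and";
	case Type::OpOr:  return "or";
	case Type::Value: return "value";
	default:          return "";
	}
}

// Same shape as the Tree printer: '(' node, then each subtree, then ')'.
std::ostream& operator<<(std::ostream& os, const SketchTree& tree)
{
	os << '(' << type_name(tree.type);
	if (!tree.payload.empty())
		os << ' ' << tree.payload;
	for (const auto& subtree : tree.children)
		os << subtree;
	return os << ')';
}

// Hand-built tree for the query "maildir:/inbox and subject:bla".
SketchTree example_tree()
{
	SketchTree root{Type::OpAnd};
	root.add_child(SketchTree{Type::Value, "maildir:/inbox"});
	root.add_child(SketchTree{Type::Value, "subject:bla"});
	return root;
}

// Render a tree to a string via the stream operator.
std::string render(const SketchTree& tree)
{
	std::ostringstream oss;
	oss << tree;
	return oss.str();
}
```

With these definitions, render(example_tree()) yields "(and(value maildir:/inbox)(value subject:bla))": a binary operator node with one value leaf per search term.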