message: use html-to-text scraper for html parts
We were dumping the HTML parts as-is into the Xapian indexer; however, it's better to strip the HTML markup first and pass only the text. We use the new built-in html->text scraper for that.
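As a rough illustration of what "pass only the text" means, here is a minimal sketch of a tag-stripping pass. It is not mu's built-in scraper: the function name `html_to_text_sketch` is hypothetical, only a handful of entities are decoded, and `<script>`/`<style>` contents, comments and `>` inside attribute values are not handled.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (NOT mu's actual scraper): drop everything between
// '<' and '>', decode a few common entities, and collapse whitespace runs.
std::string html_to_text_sketch(const std::string& html)
{
    // Small, deliberately incomplete entity table (assumption for the sketch).
    static const struct { const char* name; char ch; } entities[] = {
        {"&amp;", '&'}, {"&lt;", '<'}, {"&gt;", '>'},
        {"&quot;", '"'}, {"&nbsp;", ' '},
    };

    std::string out;
    bool in_tag = false;        // currently between '<' and '>'
    bool pending_space = false; // emit one space before the next word

    for (std::size_t i = 0; i < html.size(); ++i) {
        char ch = html[i];

        if (in_tag) {
            if (ch == '>') {
                in_tag = false;
                pending_space = true; // treat a tag as a word boundary
            }
            continue;
        }
        if (ch == '<') {
            in_tag = true;
            continue;
        }
        if (ch == '&') {
            for (const auto& e : entities) {
                const std::string name{e.name};
                if (html.compare(i, name.size(), name) == 0) {
                    ch = e.ch;
                    i += name.size() - 1; // skip the rest of the entity
                    break;
                }
            }
        }

        if (std::isspace(static_cast<unsigned char>(ch))) {
            pending_space = true;
        } else {
            if (pending_space && !out.empty())
                out += ' ';
            pending_space = false;
            out += ch;
        }
    }
    return out;
}
```

Treating every tag as a word boundary means inline markup in the middle of a word splits it in two; for feeding an indexer that seemed the safer default in this sketch, but a real scraper would distinguish block-level from inline elements.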
11
NEWS.org
@@ -19,9 +19,14 @@
 - what used to be the ~mu fields~ command has been merged into ~mu info~; i.e.,
   ~mu fields~ is now ~mu info fields~.
 
-- ~mu view~ gained ~--format=html~ for it to output the HTML body of the message
-  rather than the (default) plain-text body. See its updated manpage for
-  details.
+- ~mu view~ gained ~--format=html~ which compels it to output the HTML body of
+  the message rather than the (default) plain-text body. See its updated
+  manpage for details.
+
+- when encountering an HTML message part during indexing, previously (i.e.,
+  ~mu 1.10~) we would attempt to process that as-is, with HTML-tags etc.; this
+  is now improved by employing a html->text scraper which extracts the
+  human-readable text from the html.
 
 - experimental: if you build ~mu~ with [[https://github.com/CLD2Owners/cld2][CLD2]] support (available in many Linux
   distros), ~mu~ will try to detect the language of the body of e-mail