utils: implement html-to-text

Implement a crude html-to-text scraper function, to extract plain text
from html messages, so we can use it for indexing.
Dirk-Jan C. Binnema
2023-07-03 20:29:51 +03:00
parent 23ba61a650
commit 56b8fad89e
3 changed files with 624 additions and 0 deletions
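
The commit message describes a crude scraper that extracts plain text from HTML messages for indexing. The implementation itself is not shown in this excerpt, so the following is only a minimal sketch of what such a function might look like, not the code added by this commit. The name `html_to_text_sketch`, the list of block-level tags, and the small entity table are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical sketch, NOT the code added by this commit: strip tags,
// skip comments and <script>/<style> bodies, decode a few entities,
// and collapse whitespace runs into single spaces.
std::string html_to_text_sketch(const std::string& html)
{
	// Lowercased copy, so tag names can be matched case-insensitively.
	std::string lower{html};
	std::transform(lower.begin(), lower.end(), lower.begin(),
		       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

	std::string out;
	bool pending_space{false};
	// Append one character; whitespace is buffered and collapsed, and
	// leading/trailing whitespace is never emitted.
	auto append = [&](char c) {
		if (std::isspace(static_cast<unsigned char>(c))) {
			pending_space = !out.empty();
			return;
		}
		if (pending_space)
			out += ' ';
		pending_space = false;
		out += c;
	};

	// Assumed subset of tags that should separate words.
	static const char* const block_tags[] = {
		"p", "br", "div", "li", "tr", "td", "th", "h1", "h2", "h3"};
	// Assumed subset of named entities; unknown ones are left as-is.
	static const struct { const char *name, *text; } entities[] = {
		{"amp", "&"}, {"lt", "<"}, {"gt", ">"},
		{"quot", "\""}, {"apos", "'"}, {"nbsp", " "}};

	std::size_t i{0};
	while (i < html.size()) {
		if (lower.compare(i, 4, "<!--") == 0) { // comment: skip it
			const auto end = lower.find("-->", i + 4);
			if (end == std::string::npos)
				break; // unterminated comment: drop the rest
			i = end + 3;
		} else if (html[i] == '<') { // tag: drop it
			const auto close = lower.find('>', i);
			if (close == std::string::npos)
				break; // unterminated tag: drop the rest
			std::size_t j{i + 1};
			const bool closing{j < close && lower[j] == '/'};
			if (closing)
				++j;
			std::string name;
			while (j < close && std::isalnum(static_cast<unsigned char>(lower[j])))
				name += lower[j++];
			i = close + 1;
			if (!closing && (name == "script" || name == "style")) {
				// Skip the element body; its closing tag is
				// handled by the next loop iteration.
				const auto end = lower.find("</" + name, i);
				if (end == std::string::npos)
					break;
				i = end;
			}
			for (const auto* tag : block_tags) {
				if (name == tag)
					append(' ');
			}
		} else if (html[i] == '&') { // entity: decode if known
			const auto semi = html.find(';', i);
			const char* text{nullptr};
			if (semi != std::string::npos) {
				for (const auto& e : entities) {
					if (html.compare(i + 1, semi - i - 1, e.name) == 0)
						text = e.text;
				}
			}
			if (text) {
				append(*text);
				i = semi + 1;
			} else {
				append(html[i++]); // unknown entity: keep the '&'
			}
		} else {
			append(html[i++]);
		}
	}
	return out;
}
```

For instance, `html_to_text_sketch("<p>Hello,&nbsp;<b>world</b>!</p>")` yields `Hello, world!`. A scanner like this does not build a DOM and will mishandle many real-world messages (a `>` inside an attribute value, numeric entities, and so on), which is an acceptable trade-off only when the output feeds a search index rather than a display.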

@@ -265,6 +265,16 @@ std::string date_to_time_t_string(int64_t t);
 */
std::string time_to_string(const char *frm, time_t t, bool utc = false) G_GNUC_CONST;
/**
 * Crudely convert HTML to plain text. This attempts to scrape the
 * human-readable text from html-email so we can use it for indexing.
 *
 * @param html html
 *
 * @return plain text
 */
std::string html_to_text(const std::string& html);
/**
 * Hack to avoid locale crashes
 *