Hello,
I built an interface where I search for text with Xapian, and I'd like
to implement highlighting of matching terms on query results.
I could not find something to help with that in Xapian itself, so I
tried implementing my own one. It works reasonably well, except in case
of phrase searches, where it will highlight individual words even if
they are not consecutive (due to turning the query into a set of words).
Can one do better, without ending up knee-deep in yak fur?
Here's what I have so far:

    def get_highlights(self, result: ResultEntry) -> Generator[str, None, None]:
        """
        Return text with highlighted search terms
        """
        # From https://github.com/daevaorn/djapian/issues/73
        search_terms = set(self.last_query)
        for text in result.entry.iter_text_paragraphs():
            if not text:
                continue
            words = text.split()
            found = False
            highlighted: List[str] = []
            for word in words:
                token = word.encode().lower()
                token = self.re_token.sub(b"", token)
                stemmed = b'Z' + self.stemmer(token)
                if token in search_terms or stemmed in search_terms:
                    highlighted.append("<b>" + html.escape(word) + "</b>")
                    found = True
                else:
                    highlighted.append(html.escape(word))
            if found:
                yield " ".join(highlighted)

Enrico
--
GPG key: 4096R/634F4BD1E7AD5568 2009-05-08 Enrico Zini <enrico at enricozini.org>

On Thu, Sep 08, 2022 at 05:15:39PM +0200, Enrico Zini wrote:
> I built an interface where I search for text with Xapian, and I'd like
> to implement highlighting of matching terms on query results.
>
> I could not find something to help with that in Xapian itself, so I
> tried implementing my own one. It works reasonably well, except in case
> of phrase searches, where it will highlight individual words even if
> they are not consecutive (due to turning the query into a set of words).
>
> Can one do better, without ending up knee-deep in yak fur?

Xapian::MSet::snippet() can generate a highlighted snippet (or highlight
the full text if you set the required length to a value longer than the
text length). It knows how to correctly highlight phrases (and also
handles stemming and wildcards):

https://xapian.org/docs/apidoc/html/classXapian_1_1MSet.html#ab3af7b20654dcc6e3335cc21be74efda

The highlighting itself is a bit specific to HTML (or XML) currently,
which might be a limitation for some uses.

Cheers,
    Olly
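
For anyone finding this thread later, here is a rough sketch of how the
snippet() suggestion might look from Python. The helper name
highlight_full_text and its parameters are made up for illustration and are
not part of Xapian. It assumes `mset` is the MSet returned by
Enquire.get_mset() for the query being displayed, and `stemmer` is a
xapian.Stem for the same language used at indexing time. The arguments are
passed positionally (text, length, stemmer), following the order in the C++
signature, and the defaults are relied on for the <b>...</b> markers.

```python
from typing import Any


def highlight_full_text(mset: Any, text: str, stemmer: Any) -> str:
    """Highlight query matches across the whole of `text` via MSet.snippet().

    `mset` is assumed to be a xapian.MSet and `stemmer` a xapian.Stem;
    they are typed as Any so the sketch does not depend on the bindings
    being importable.
    """
    # snippet() measures its target length in bytes, so request more than
    # the UTF-8 size of the text to get the full text back, not an excerpt.
    length = len(text.encode("utf-8")) + 1

    # With the default arguments, matches are wrapped in <b>...</b>.
    marked = mset.snippet(text, length, stemmer)

    # The Python 3 bindings may hand back bytes rather than str
    # (assumption: the result is UTF-8), so normalise to str here.
    if isinstance(marked, bytes):
        marked = marked.decode("utf-8")
    return marked
```

In get_highlights() above, the inner word loop could then presumably be
replaced by one call per paragraph, e.g.
highlight_full_text(self.last_mset, text, xapian.Stem("english")), where
last_mset is a hypothetical attribute holding the most recent MSet. Since the
output is HTML-oriented, it is worth checking whether snippet() already
escapes the text before also running it through html.escape().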