On Wed, May 15, 2013 at 04:11:07AM +0200, Cs?ri Tam?s wrote:
> I've indexed many text files (using a TermGenerator from std::string), each
> document in my database is a single file on the disk.
> The search works pretty well and finds the files that match the query
> string, but I can't figure out how I can determine the location of the
> actual matched terms. I want to show the user the row and column number of
> the match (to somehow highlight the match).
>
> So far I haven't found a solution. The closest I've got is the
> Enquire::get_matching_terms_* functions, but they don't really work for
> phrases and I'm still far from character positions.
Xapian doesn't store character positions (only word positions), so it
isn't able to tell you character positions for matches.
We could potentially record the word position at which we found a phrase
match, but we stop once we find the first instance of a phrase match, so
you'd only get one matching instance of the phrase per document.
Otherwise the matcher would have to keep looking for matching phrases,
even in documents which don't ultimately make the top N, which would
mean slower searches.
> I hope someone can give me some hints where to look to begin with.
Generally people just reparse the document to highlight matches - this
has the benefit that you don't need to worry about the offsets being
wrong if the document has been updated on disk since it was last
indexed.
You might find the resources linked to from here useful if you're
wanting to highlight matches:
http://trac.xapian.org/wiki/FAQ/Snippets
Cheers,
Olly