john.alveris at Safe-mail.net
2015-Jul-23 12:30 UTC
[Xapian-discuss] Get term from document by position
Hello.

Is there any FAST way to get a term from a Xapian document by its position, something like

    std::string term = Xapian::Document::GetTermByPosition(int position)

?

Below I have described the task I am trying to solve, in case anybody is interested.

===========================================================================

When displaying search results, I would like to display a piece of the document that is related to the search string (a snippet). I have the following idea:

1) First, I find the positions of the search terms using the Xapian::TermIterator::positionlist_begin() iterator.
2) Second, I display the terms whose positions are close to the search term's position (several terms to the left and several terms to the right of the search term).

So I know the positions of the terms that I want to display. But how do I get the terms themselves? Xapian::Document does not have anything like GetTermByPosition(int position). Currently I iterate through all of the terms, then through all of the positions of each term, something like this:

    // Scan every term of the document, and every position of each term,
    // until the term stored at the wanted position is found.
    for (Xapian::TermIterator term = xapian_database.termlist_begin(docid);
         term != xapian_database.termlist_end(docid); ++term) {
        for (Xapian::PositionIterator pos = xapian_database.positionlist_begin(docid, *term);
             pos != xapian_database.positionlist_end(docid, *term); ++pos) {
            if (*pos == position_of_my_term)
                my_term = *term;
        }
    }

This does what I want, but the loop takes too long to run. So, maybe there is a better approach?
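One way to avoid repeating the full scan for every position is to walk the document's terms and positions once, and keep the result as a position-to-term table. The sketch below shows only that inversion step, using the standard library. The `Postings` type and the functions `build_position_index` and `window` are hypothetical names chosen for illustration, not part of the Xapian API. It assumes the postings have already been copied out of the database with the termlist and positionlist iterators shown above.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for one document's postings: term -> positions.
// With Xapian this would be filled in a single pass over
// termlist_begin(docid) and positionlist_begin(docid, term).
using Postings = std::map<std::string, std::vector<unsigned>>;

// Invert the postings once: index = position, value = term at that position.
// Positions that hold no term stay as empty strings. If several terms share
// a position, the one visited last (the greatest in map order) is kept.
std::vector<std::string> build_position_index(const Postings& postings) {
    std::vector<std::string> by_position;
    for (const auto& entry : postings) {
        for (unsigned pos : entry.second) {
            if (pos >= by_position.size()) by_position.resize(pos + 1);
            by_position[pos] = entry.first;
        }
    }
    return by_position;
}

// Collect the terms within `radius` positions of `centre`, in position order.
std::vector<std::string> window(const std::vector<std::string>& by_position,
                                unsigned centre, unsigned radius) {
    std::vector<std::string> out;
    std::size_t first = centre > radius ? centre - radius : 0;
    std::size_t last = static_cast<std::size_t>(centre) + radius;
    for (std::size_t p = first; p <= last && p < by_position.size(); ++p) {
        if (!by_position[p].empty()) out.push_back(by_position[p]);
    }
    return out;
}
```

Building the table costs one pass over the document's postings. After that, each lookup by position is a plain vector access, so displaying several terms on either side of a match no longer needs a rescan. The table holds indexed terms, so the words it returns are whatever form was stored at indexing time.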
On 23 Jul 2015, at 13:30, john.alveris at Safe-mail.net wrote:

> Hello. Is there any FAST way to get a term from the xapian document by its position, something like
> std::string term = Xapian::Document::GetTermByPosition(int position) ?

Not that I'm aware of.

Snippet highlighting is something that was worked on for a GSoC project a few years ago, and is mentioned in our FAQ: <http://trac.xapian.org/wiki/FAQ/Snippets>. It's not available in the 1.2 series, but as I understand it, it should work out of the box in 1.3.3.

Note that your suggested approach of going from terms to snippet doesn't work in the general case, because of issues like stemming. Instead, Mihai's approach was to use the matcher information to generate a snippet from the original, unstemmed and untermed, text.

J

-- 
James Aylett, occasional trouble-maker
xapian.org