I'm sorry I could not figure out how the document answers my question. My
question is about the 'term' search in the dictionary of terms (term
list) maintained by Xapian. I assume that Xapian maintains an alphabetical list
of terms. When a user types a search term like "integration", the
system first looks up this dictionary of terms for this word
"integration". Is this search done using binary search or some other
method ?
Once the term is located in the dictionary, then the posting list can be
accessed. But my question is not related to the posting list. It is only about
the initial search in the dictionary of terms for the word entered by the user.
The document mentions "....Some systems represent each term by a number
internally, so the term list is then also a list of numbers. Xapian doesn't
- it uses the terms themselves, and uses prefix compression to store them
compactly". If this is the case, my question is : How is the search for the
user-entered word done on this prefix compressed store?
Thank you,
Dhiraj R
--------------------------------------------
On Sat, 8/4/17, James Aylett <james at tartarus.org> wrote:
Subject: Re: Search Algorithm Used for Keyword Search
To: "Dhiraj R" <dhirajr57 at yahoo.com>
Cc: "Xapian Development" <xapian-devel at lists.xapian.org>
Date: Saturday, 8 April, 2017, 5:34 PM
On 8 Apr 2017, at 18:03, Dhiraj R
<dhirajr57 at yahoo.com>
wrote:
> Dear Mr. Aylett,
Please keep replies on the mailing list so others can answer and everyone can
benefit.
> Thank you for your immediate response to my query. Actually, my question
is not about the document (text) part. My question is confined to method of
searching the list of keywords for a search term entered by the user. Assuming
we have already created a huge list of keywords by looking up large number
of documents. If this list of keywords is a sorted list, then binary search for
a search term entered by the user is generally ideal.
I don't think you've understood that document, which does answer your
question. Xapian isn't doing binary search; it implements a probabilistic
information retrieval system, so it's generally working off inverted indexes
(ie terms index documents when we're running a search). By default Xapian
uses BM25 as the weighting scheme, which is used to determine the ranking of
documents in search results.
J
--
James Aylett
devfort.com — spacelog.org —
tartarus.org/james/