Dear Sir,

I'm doing a literature survey on search engines. As Xapian is open source, I think I can get the information I need. I assume that your system builds a list of keywords and attaches to every keyword the documents in which it can be found. My questions are as follows:

1. What search algorithm is used for searching the list of keywords that your search engine has? Is it binary search, or some enhancement of it, perhaps using additional data structures?
2. Are the keywords listed in alphabetical order or in some other order?
3. Does a search engine like 'google' use only binary search, or an augmented version of binary search, for searching the list of keywords that it maintains?

As I could not get this information from anyone, I request you to kindly provide it, as I need it for my thesis work.

Thank you.

Sincerely,
Dhiraj R
On 8 Apr 2017, at 09:57, Dhiraj R <dhirajr57 at yahoo.com> wrote:

> I assume that your system builds a list of keywords and tags to every keyword the documents where it can be found.

We have a document on the theoretical background to Xapian, which you may find helpful:

https://xapian.org/docs/intro_ir.html

J

-- 
James Aylett
devfort.com — spacelog.org — tartarus.org/james/
On 8 Apr 2017, at 18:03, Dhiraj R <dhirajr57 at yahoo.com> wrote:

> Dear Mr. Aylett,

Please keep replies on the mailing list so others can answer and everyone can benefit.

> Thank you for your immediate response to my query. Actually, my question is not about the document (text) part. My question is confined to method of searching the list of keywords for a search term entered by the user. Assuming we have already created a huge list of keywords by looking up large number of documents. If this list of keywords is a sorted list, then binary search for a search term entered by the user is generally ideal.

I don't think you've understood that document, which does answer your question. Xapian isn't doing binary search; it implements a probabilistic information retrieval system, so it's generally working off inverted indexes (i.e. terms index documents when we're running a search).

By default Xapian uses BM25 as the weighting scheme, which is used to determine the ranking of documents in search results.

J

-- 
James Aylett
devfort.com — spacelog.org — tartarus.org/james/
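To make the inverted-index and BM25 idea above a bit more concrete, here is a minimal toy sketch in Python. It is not Xapian's code and does not use Xapian's API: the names `build_index` and `bm25_search` are hypothetical, the tokeniser is a plain whitespace split, the postings live in an in-memory dict rather than an on-disk structure, and `k1=1.2`, `b=0.75` are common textbook values, not necessarily Xapian's defaults. The idf formula is one widely used BM25 variant.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Build a toy inverted index: term -> {doc_id: term frequency}.

    Also returns each document's length in terms, which BM25 needs.
    """
    postings = defaultdict(dict)
    lengths = {}
    for doc_id, text in docs.items():
        terms = text.lower().split()  # assumption: naive whitespace tokeniser
        lengths[doc_id] = len(terms)
        for term, tf in Counter(terms).items():
            postings[term][doc_id] = tf
    return postings, lengths


def bm25_search(query, postings, lengths, k1=1.2, b=0.75):
    """Rank documents for a query with a textbook BM25 formula."""
    n_docs = len(lengths)
    avg_len = sum(lengths.values()) / n_docs
    scores = defaultdict(float)
    for term in query.lower().split():
        plist = postings.get(term, {})
        if not plist:
            continue  # term does not occur in any document
        df = len(plist)  # number of documents containing the term
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        # Only documents in this term's posting list are touched.
        for doc_id, tf in plist.items():
            norm = k1 * (1 - b + b * lengths[doc_id] / avg_len)
            scores[doc_id] += idf * tf * (k1 + 1) / (tf + norm)
    # Highest score first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


docs = {
    1: "xapian is a search engine library",
    2: "binary search on a sorted list",
    3: "cooking pasta at home",
}
postings, lengths = build_index(docs)
results = bm25_search("search engine", postings, lengths)
```

In this sketch each query term is looked up once in the index, and the work then goes into walking that term's posting list and accumulating a weight per document; documents that contain none of the query terms are never scored.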