Hello James,
Thanks for the suggestions! I've tried to answer some of your questions.
This is a rough idea of how autocomplete could be implemented. I know that
this would be quite slow, and I'm trying to figure out how it can be
improved. This is what I think can be done:
- Construct a weighted undirected graph with words as nodes, where the
  weight of an edge is the number of times those two words have been
  searched together or found together in the documents.
- Each node also keeps track of the 5 words most commonly used with it
  (redundant data that saves sorting its neighbours at query time).
- Whenever a user starts typing a query, the graph is queried and the
  first word is predicted by prefix matching. The matches are shown in
  order of their search frequency.
- When the user starts typing the next word, the graph is visited, and the
  neighbours of the word(s) already typed are retrieved and shown to the
  user in order of frequency (the weights of the edges).
This is a rough overview of what I think can be used as an algorithm for
autocomplete. I am going to read some research papers and improve on this.
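To make the steps above more concrete, here is a minimal Python sketch of
the graph. It is only an illustration under my own assumptions: the names
WordGraph, add_cooccurrence, add_search, complete_first_word, suggest_next
and TOP_K are hypothetical (not Xapian API), the first-word prefix match is
a plain linear scan, and suggestions only use the last complete word.
Because edge weights only ever increase, the cached top-5 list can be kept
exact by re-sorting at most six entries per update instead of sorting all
of a node's neighbours.

```python
from collections import defaultdict

TOP_K = 5  # the proposal caches the 5 most common neighbours of each word


class WordGraph:
    """Hypothetical sketch: nodes are words, edge weights count co-occurrences."""

    def __init__(self):
        self.edges = defaultdict(dict)  # word -> {neighbour: weight}
        self.freq = defaultdict(int)    # word -> times the word was searched
        self.top = defaultdict(list)    # word -> cached TOP_K heaviest neighbours

    def _rank(self, word, nbr):
        # Heavier edges first; ties broken alphabetically so results are stable.
        return (-self.edges[word][nbr], nbr)

    def _update_top(self, word, nbr):
        # Weights only grow, so only `nbr` can enter the cached list; sorting
        # at most TOP_K + 1 entries keeps it exact.
        top = self.top[word]
        if nbr not in top:
            top.append(nbr)
        top.sort(key=lambda w: self._rank(word, w))
        del top[TOP_K:]

    def add_cooccurrence(self, a, b, count=1):
        """Increase the weight of edge a-b (searched or found together)."""
        if a == b:
            return
        for x, y in ((a, b), (b, a)):  # undirected: update both directions
            self.edges[x][y] = self.edges[x].get(y, 0) + count
            self._update_top(x, y)

    def add_search(self, word, count=1):
        self.freq[word] += count

    def complete_first_word(self, prefix, limit=TOP_K):
        # Linear scan for clarity; a trie or sorted word list would avoid it.
        matches = [w for w in self.freq if w.startswith(prefix)]
        matches.sort(key=lambda w: (-self.freq[w], w))
        return matches[:limit]

    def suggest_next(self, previous, prefix="", limit=TOP_K):
        if not prefix:
            # Nothing typed yet: the cached list needs no sorting.
            return list(self.top.get(previous, [])[:limit])
        nbrs = self.edges.get(previous, {})
        matches = [w for w in nbrs if w.startswith(prefix)]
        matches.sort(key=lambda w: (-nbrs[w], w))
        return matches[:limit]


g = WordGraph()
for w, n in [("python", 5), ("pyramid", 2), ("xapian", 7)]:
    g.add_search(w, n)
g.add_cooccurrence("xapian", "bindings", 3)
g.add_cooccurrence("xapian", "python", 4)
g.add_cooccurrence("xapian", "search", 4)
print(g.complete_first_word("py"))    # ['python', 'pyramid']
print(g.suggest_next("xapian"))       # ['python', 'search', 'bindings']
print(g.suggest_next("xapian", "s"))  # ['search']
```

When the user has already started the next word, the cached list may not
contain any match, so the sketch falls back to scanning that word's
neighbours.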
Learning from user queries: the method I have suggested is pretty basic.
Whenever two or more words are searched together, the weight of the edge
between each pair of them is incremented.
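A minimal sketch of that update, assuming whitespace splitting and
lowercasing stand in for proper tokenization, and that a word repeated
within one query is only counted once. The helper learn_from_query and the
pair-keyed Counter layout are hypothetical; an undirected edge is keyed by
a frozenset of its two words.

```python
from collections import Counter
from itertools import combinations


def learn_from_query(query, word_counts, pair_counts):
    """Hypothetical helper: update counts from one user query.

    word_counts: word -> number of queries containing it
    pair_counts: frozenset({a, b}) -> number of queries in which
                 a and b were searched together (undirected edge)
    """
    words = sorted(set(query.lower().split()))  # each word once per query
    word_counts.update(words)
    # Every pair of distinct words in the query gets +1 on its edge;
    # single-word queries produce no pairs.
    pair_counts.update(frozenset(p) for p in combinations(words, 2))


word_counts, pair_counts = Counter(), Counter()
for q in ["xapian python bindings", "xapian python", "python tutorial"]:
    learn_from_query(q, word_counts, pair_counts)

print(word_counts["python"])                           # 3
print(pair_counts[frozenset({"xapian", "python"})])    # 2
print(pair_counts[frozenset({"python", "bindings"})])  # 1
```

A query of n distinct words touches n(n-1)/2 edges, which should be cheap
for typical short search queries.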
Stop words: since the prediction is quite basic right now, I don't think
stop words can be taken into account when predicting queries within the
scope of this project. What are your thoughts?
Another part of this project would be to implement the bindings for the
currently supported languages.
I have thought about your suggestion of making autocomplete part of Xapian
and using a separate database for it. What are your views on this?
Ayush
On Thu, Mar 10, 2016 at 10:51 PM, James Aylett <james-xapian at
tartarus.org> wrote:
> On Thu, Mar 10, 2016 at 05:59:38PM +0530, Ayush Gupta wrote:
>
> > Could you please expand on the project idea of integration of xapian in a
> > framework with an example. I did not fully understand the requirements of
> > this project.
>
> It would be about adding or improving support for Xapian when used
> with some sort of programming framework, probably a web development
> framework like Rails, Django, Play, &c. You'd need experience with the
> framework you wanted to work with, and to think about the use cases
> that need to be supported, and design an API to allow them to be
> solved.
>
> Trac (which is mentioned in the project description) effectively
> contains its own framework, which is why that's under the same
> project.
>
> It's a little difficult to provide concrete requirements, because
> that's very dependent on the framework you choose and what you want to
> make possible. For instance, Django has a search abstraction system
> called Haystack, so for that it might make sense to improve the Xapian
> support there. For others, there may be no support at all and you'd
> have to start from scratch.
>
> > Also I want to discuss an idea of my own. Xapian doesn't have an
> > autocomplete feature. It is quite common for a search engine to have an
> > autocomplete feature. What I propose is an API that is totally separate
> > from the xapian core, has its own indexing and learns from user queries as
> > well as documents. I know this is a very rough idea, please help me refine
> > it. What are your views on this?
>
> It sounds like an interesting project, but it's difficult to evaluate
> further until you've put some more detail around it. That's the big
> difference between a project we've listed and one you come up with
> yourself. If you can get it to a draft proposal in some form, then we
> can provide feedback on that. At this point, beyond saying that it
> sounds like a good idea I can't offer any concrete suggestions, as I
> mostly have questions :-)
>
> Some things that I think you should consider in the proposal include
> what API you're suggesting for it (try starting by writing the code
> you'd want to write to use it, and see what feels natural). Another
> key aspect would be to explain what it is about autocomplete that
> couldn't be achieved by directly using Xapian (perhaps with a separate
> database for autocomplete). Other things to consider include how
> synonyms and spelling correction would apply, if at all? What sort of
> ranking model makes sense for autocompletion? What sort of data are
> you dealing with? You suggest it could learn from user queries; your
> proposal should explain how, and what benefit that would provide (over
> just the initially-indexed data).
>
> J
>
> --
> James Aylett, occasional trouble-maker
> xapian.org
>