Displaying 20 results from an estimated 20000 matches similar to: "Base class for query expansion"
2014 May 14
2
Starting work on Perf Test Module
Hello,
I am beginning work on the perf test module. The initial steps that I aim
to accomplish are:
-> Download the Wikipedia dumps for multiple languages.
-> Write Python scripts to tokenize the dump (will probably use something
like NLTK, which has powerful built-in tokenizers)
-> Discuss and finalize the design of the search and query expansion perf
tests as I want to complete them
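The tokenizing step could be sketched as follows. This is a minimal sketch, assuming the dump has already been reduced to plain text lines; it uses a simple standard-library regular expression in place of NLTK so that it stays self-contained, and the names `tokenize` and `term_frequencies` are hypothetical.

```python
import re
from collections import Counter

# Matches runs of word characters. This is an assumption of the sketch:
# it is adequate for space-delimited languages, not for scripts that are
# written without spaces between words.
TOKEN_RE = re.compile(r"\w+", re.UNICODE)


def tokenize(text):
    """Lowercase the text and return its word tokens in order."""
    return TOKEN_RE.findall(text.lower())


def term_frequencies(lines):
    """Count how often each token occurs across an iterable of lines."""
    counts = Counter()
    for line in lines:
        counts.update(tokenize(line))
    return counts
```

Because `term_frequencies` accepts any iterable of lines, it can be fed an open file object so a large dump is not read into memory at once.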
2014 Mar 04
2
Test Dataset for performance and accuracy analysis
Hi Parth,
I implemented DFR algorithms in Xapian as
a part of GSoC last year under the mentorship of Olly. This year, I want to
work on analyzing and optimizing the performance of the DFR algorithms and
comparing them with BM25. I also want to work on profiling the query
expansion schemes and testing the relevance (precision and recall) / speed (time
taken) of the
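The two measurements mentioned above, relevance and speed, could be sketched like this. It is a minimal illustration, assuming relevance judgments are available as a collection of document ids; the function names `precision_recall` and `timed` are hypothetical and not part of Xapian.

```python
import time


def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: ids of the documents the search returned
    relevant:  ids of the documents judged relevant for the query
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    # Guard against empty sets so a query with no results scores 0.0.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start
```

For example, `timed(precision_recall, [1, 2, 3, 4], [2, 4, 6])` returns the pair of scores together with the time the call took.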
2013 Jan 27
1
Added a python example to the community page
Hey guys, I have added a Python indexer example to the SampleCode page of
our wiki. Please do have a look. The code can also be found here:
https://github.com/aarshkshah1992/xapian/blob/efcf443527b74326119bbc0935fc41a002ce60db/xapian-bindings/python/docs/examples/simpleindexgrep.py/
Thanks :)
-Regards
-Aarsh
2012 Dec 08
2
Want to contribute code to the Xapian project
Hey guys, I am a 3rd year Computer Science undergrad student. I am extremely
interested in contributing code to the Xapian project. The work you people
do sounds extremely fascinating and interesting. Can someone just give me a
brief overview of how to proceed? I can code in C, C++ and Python and
have experience in Natural Language Processing. Am also quite comfortable
with NLTK and using WordNet. Am
2013 Feb 19
2
Implementing tf-idf weighting scheme in Xapian
Hello guys. I just read up about tf-idf schemes and want to implement them in
Xapian (with some frequently used normalizations), as it will also give me a
good grasp of implementing a weighting scheme before I start working on
implementing DFR schemes.
I read the following as references and I think I've understood them well and
can write the hack:
1.)
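For orientation, the textbook form of tf-idf can be sketched as below. This is a generic illustration, not the code that was written for Xapian; the function names and the choice of natural logarithm are assumptions, and the second function shows one commonly used normalization (logarithmic term frequency).

```python
import math


def tfidf(tf, df, num_docs):
    """Plain tf-idf: term frequency times log(N / df).

    tf:       occurrences of the term in the document
    df:       number of documents containing the term
    num_docs: total number of documents in the collection
    """
    if tf <= 0 or df <= 0:
        return 0.0
    return tf * math.log(num_docs / df)


def tfidf_log_tf(tf, df, num_docs):
    """Variant with log-normalized term frequency: (1 + log tf) * idf."""
    if tf <= 0 or df <= 0:
        return 0.0
    return (1.0 + math.log(tf)) * math.log(num_docs / df)
```

A term that occurs in every document gets an idf of log(1) = 0, so it contributes nothing to the score under either variant.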
2013 Mar 04
2
Need Beginner Guide for Matcher Optimisations Project
Hi,
While searching for a project which matches my interest and skill level, I
found this project named Matcher Optimization. This project is really
challenging and exciting from my viewpoint and I would like to be a part of
this project.
Optimization techniques mentioned in the reference links provided will take
some time for me to have a good understanding about them. But I am trying
to get my
2013 Mar 26
1
Merging of the TfIdf patch
Hello guys. I have updated the code, tests, documentation, makefile entries
and the registry entry of the TfIdf patch as per the feedback. Please do
let me know if any additional changes are required before the patch can be
merged.
-Regards
-Aarsh
On Sun, Mar 3, 2013 at 2:50 PM, aarsh shah <aarshkshah1992 at gmail.com> wrote:
> Hello guys.I have sent a pull request for the code and
2013 Jun 22
2
Dealing with negative weights
I was adding the calculations for a lower bound to get_sumpart() (DLH has
no term-independent component) when I realized that the same lower bound
will be calculated for each term-document pair that get_sumpart() is called
for, which reduces efficiency. How do I calculate the lower bound
for a term only once and then use it?
-Regards
-Aarsh
On Fri, Jun 21, 2013 at 4:41 PM, Olly Betts
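The general pattern being asked about (compute a per-term quantity once, then reuse it for every document) can be sketched as follows. This is an illustration of the caching idea only: the class, its methods, and the weight formula are hypothetical toys, not Xapian's API and not the DLH formula, and shifting the weight by the bound is just one assumed way of using it.

```python
import math


class CachedBoundWeight:
    """Toy per-term weight whose lower bound is computed only once."""

    def __init__(self, collection_freq, num_docs, max_wdf):
        # Everything here depends only on the term, so it is computed
        # once when the object is set up for that term.
        self.log_ratio = math.log(num_docs / collection_freq)
        self.max_wdf = max_wdf
        self.bound_computations = 0
        self.lower_bound = self._compute_lower_bound()

    def _compute_lower_bound(self):
        self.bound_computations += 1
        # The toy weight wdf * log_ratio is smallest at max_wdf when
        # log_ratio is negative, and is never below 0 otherwise.
        return min(0.0, self.max_wdf * self.log_ratio)

    def get_sumpart(self, wdf):
        # Called once per document: reuses the cached bound instead of
        # recalculating it, and shifts the weight so it is non-negative.
        return wdf * self.log_ratio - self.lower_bound
```

With this layout the cost of the bound is paid once per term, however many documents are weighted.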
2013 Apr 11
1
Added support for TfIdf to Omega
Hello guys, I have added code for tfidf to the weight.cc file in omega/.
Here is the patch:
https://github.com/aarshkshah1992/xapian/commit/5ff41a15f574e6780cc61e67e7f3da3d97ff4ec8
It compiles well and I think it'll work well.
Here's the link to the documentation file omegascript.rst where I've added
tfidf.
2013 Mar 27
1
Need help as Pl2 tests not performing as expected
Hello guys. I just ran the updated tests for PL2 and they are not giving
the mset order I expect. Now, the thing is, DFR's behavior is a bit hard to
predict, so even if I expect a particular order, it may give another
order and still be correct. So, the only way to write correct tests for PL2
is to manually calculate the weight of the documents to decide the expected
order. For that, I need to
2013 Jan 09
2
Explanation of how Eset works
Hey guys, hi. I am trying to understand how Xapian works. I read the
Theoretical Background to Xapian doc
and the report by Salton and Jones. I still can't seem to understand how Eset
works. How exactly does Xapian add terms to expand a query? Assuming we
have a list of the k most important terms, how do we decide which term to
add to the query so that it will be in context with the query?
And to decide r
2014 Mar 01
2
Complete GSOC idea
Hi everyone,
I am thinking of working on the
following ideas for my GSOC proposal based on my discussions with Olly and
my own understanding. Rather than focusing on an entire perftest module, I
have decided to focus on implementing performance tests for weighting
schemes based on a wikipedia dump and in addition to that, build a
framework to measure the
2013 Jul 22
0
Query Expansion trial version ready
Hello guys, I have some good news. After a lot of hard work, here is the
trial version of our new query expansion mechanism. The code compiles well.
But, I have yet to test it extensively. Dan's advice of making ExpandStats
a member of ExpandWeight really proved to be useful. Thanks Dan ! :) While
I work on testing the mechanism, I would really appreciate if I got
feedback and reviews about the
2013 Jun 20
2
Dealing with negative weights
Hello guys. I am currently working on the DLH weighting scheme. The formula
for DLH is very complex and it ends up giving negative weights to some
documents. Due to this, in spite of containing one or more
occurrences of the keyword, the documents with negative weights
don't show up in the results at all. Please can I get some help on how to
deal with this ? Or should I just leave
2013 Jan 10
1
Add an example to the community page and contribute more code
Hi guys. I've finished an example indexer which acts like a grep replacement
for a file. It indexes each line containing a proper noun in a given text
file. The line containing the proper noun will be displayed upon searching
for that noun. I would like to add it to the community code examples. I'm
planning to write more examples which demonstrate some advanced features of
Xapian along similar
2013 Mar 15
1
DFR framework as a GSOC project
Hey guys, hi. :) I've finished implementing the PL2 scheme. The bounds I
have implemented for it are as tight as I could make them, given the nature of the
scheme and my mathematical skills. However, tight bounds for other named DFR
schemes will be easier to implement because their formulas are much
simpler compared to PL2. Will send in a pull request in a couple of days
once I'm done with the tests
2013 Jan 24
1
Integrating a PaiceHusk stemmer into the library
Hey guys, hi :) I've implemented a PaiceHusk stemmer externally. So what I
am doing right now is passing a pointer to my StemPaiceHusk class (which in
turn has been subclassed from StemImplementation) to the
Stem::Stem(StemImplementation *p) constructor. So basically, I have to
include "paicehusk.h" in my indexer. However, I now want to make it a part
of the Xapian library so that I
2013 Mar 03
0
Sent a pull request for testing TradWeight using an Rset.
Hello guys. As discussed on IRC, I have sent a pull request for a test for
testing TradWeight with an Rset.
2013 Mar 03
0
Added code and tests for the tf-idf weighting scheme.
Hello guys. I have sent a pull request for the code and tests of the Tf-Idf
weighting scheme.
Please do let me know if any changes are required. Meanwhile, I'll begin
working on implementing normalizations which require additional statistics
and on the DFR schemes.
https://github.com/xapian/xapian/pull/6
2013 Mar 02
2
Getting Started
Hello all,
I am Mohd Azeem. I want to contribute to Xapian but I am a newbie here. I wonder if anyone could help me get started with Xapian. I have some basic knowledge of IR, and I have implemented TF*IDF and PageRank schemes as well as an inverted index and a web crawler.
regards,
Azeem