similar to: Need Beginner Guide for Matcher Optimisations Project

Displaying 20 results from an estimated 3000 matches similar to: "Need Beginner Guide for Matcher Optimisations Project"

2013 Feb 27
2
Reading a password-protected PDF
Hello respected developers, I was wondering if it is possible for Xapian to read a password-protected PDF. Searches of the archives and Google yielded 0 results. I also tried looking at the source code but could not find the part related to this issue. The characteristics of the set of PDFs are: 1. a set of password-protected PDF documents 2. all PDFs are set with the same password. 3.
2013 Mar 02
2
Getting Started
Hello all, I am Mohd Azeem. I want to contribute to Xapian but I am a newbie here. I wonder if anyone could help me get started with Xapian. I have some basic knowledge of IR and have implemented TF*IDF and PageRank schemes, as well as an inverted index and a web crawler. regards, Azeem
2013 Mar 05
1
Remote database & local database, and adding new weight found vtable error
Hello, guys. Q1. I now load all the docids and their document data into a dictionary for faster loading, instead of calling Xapian::MSetIterator i; i.get_document().get_data(); but I happened to discover that the dictionaries produced by the two methods were different: both methods use DB1, DB2 method 1: Xapian::Database db = Xapian::Database(the path of DB1); Xapian::Database db2 =
2013 Mar 02
3
How to add a custom weight to the relevancy value and sort by it.
Hello guys, I have a weight value which is calculated from some factors, and I need to add this weight to the relevancy value of a result and sort by the combined value. Is that possible in Xapian? Thanks, VishnuKumar
2013 Jan 09
2
Explanation of how Eset works
Hey guys, hi. I am trying to understand how Xapian works. I read the Theoretical Background to Xapian doc and the report by Salton and Jones. I still can't seem to understand how Eset works. How exactly does Xapian add terms to expand a query? Assuming we have a list of the k most important terms, how do we decide which term to add to the query so that it is in context with the query? And to decide r
2013 Feb 19
2
Implementing tf-idf weighting scheme in Xapian
Hello guys. I just read up about tf-idf schemes and want to implement them in Xapian (with some frequently used normalizations), as it will also give me a good grasp of implementing a weighting scheme before I start working on implementing DFR schemes. I read the following as references and I think I've understood them well and can write the hack :- 1.)
2013 Jun 16
3
Backend for Lucene format indexes-How to get doclength
Hi, all: I have written a demo patch for Backend for Lucene format indexes; the Lucene version is 3.6.2. http://lucene.apache.org/core/3_6_2/fileformats.html For now, this demo patch supports only the basic features of Lucene. Compound File (.cfs/.cfe), term vector (.tvx/.tvd/.tvf), and delete document (.del) are not supported, and the skip list in .fdx is not supported either. example/quest.cc is used to test this demo.
2014 Mar 04
2
Test Dataset for performance and accuracy analysis
Hi Parth, I implemented DFR algorithms in Xapian as part of GSoC last year under the mentorship of Olly. This year, I want to work on analyzing and optimizing the performance of the DFR algorithms and comparing them with BM25. I also want to work on profiling the query expansion schemes and testing the relevance (precision and recall) / speed (time taken) of the
2012 Dec 08
2
Want to contribute code to the Xapian project
Hey guys, I am a 3rd year Computer Science undergrad student. I am extremely interested in contributing code to the Xapian project. The work you people do sounds extremely fascinating and interesting. Can someone just give me a brief overview of how to proceed? I can code in C, C++ and Python and have experience in Natural Language Processing. Am also quite comfortable with NLTK and using WordNet. Am
2007 Jun 15
2
model.frame: how does one use it?
Philipp Benner reported a Debian bug report against r-cran-rpart aka rpart. In short, the issue has to do with how rpart evaluates a formula and supporting arguments, in particular 'weights'. A simple contrived example is ----------------------------------------------------------------------------- library(rpart) ## using data from help(rpart), set up simple example myformula <-
2013 Jul 17
1
Base class for query expansion
Hello Dan and Olly, this is the code for the base class for query expansion that I have written. The code will not compile, as I have written only the base class so far and have yet to use it. Please do tell me what you think of the base class and what changes you suggest I should make before I move forward with the project. https://github.com/xapian/xapian/pull/23 -Regards -Aarsh
2013 Jun 17
2
Backend for Lucene format indexes-How to get doclength
*Or do you mean that it's one number per document whereas the other stats are per database, so it's harder to store it?* Yes, I mean this. It's a huge amount of data. If I add a new doclength list myself (containing all the doclengths in a list, like chert), I am concerned about: 1. This doclength list may be the bottleneck in this backend, http://trac.xapian.org/ticket/326 2. Change too much
2014 May 14
2
Starting work on Perf Test Module
Hello, I am beginning work on the perf test module. The initial steps that I aim to accomplish are :- -> Download the Wikipedia dumps for multiple languages. -> Write Python scripts to tokenize the dump (will probably use something like NLTK, which has powerful built-in tokenizers) -> Discuss and finalize the design of the search and query expansion perf tests as I want to complete them
2013 Jan 27
1
Added a python example to the community page
Hey guys, I have added a Python indexer example to the SampleCode page of our wiki. Please do have a look. The code can also be found here :- https://github.com/aarshkshah1992/xapian/blob/efcf443527b74326119bbc0935fc41a002ce60db/xapian-bindings/python/docs/examples/simpleindexgrep.py/ Thanks :) -Regards -Aarsh
2013 Jun 22
2
Dealing with negative weights
I was adding the calculations for a lower bound to get_sumpart() (DLH has no term-independent component) when I realized that the same lower bound will be calculated for each term-document pair that get_sumpart() is called for, which reduces efficiency. How do I calculate the lower bound for a term only once and then reuse it? -Regards -Aarsh On Fri, Jun 21, 2013 at 4:41 PM, Olly Betts
2013 Jun 20
2
Dealing with negative weights
Hello guys. I am currently working on the DLH weighting scheme. The formula for DLH is very complex and it ends up giving negative weights to some documents. Because of this, in spite of containing one or more occurrences of the keyword, the documents with negative weights don't show up in the results at all. Can I please get some help on how to deal with this? Or should I just leave
2013 Mar 26
1
Merging of the TfIdf patch
Hello guys. I have updated the code, tests, documentation, makefile entries and the registry entry of the TfIdf patch as per the feedback. Please do let me know if any additional changes are required before the patch can be merged. -Regards -Aarsh On Sun, Mar 3, 2013 at 2:50 PM, aarsh shah <aarshkshah1992 at gmail.com> wrote: > Hello guys.I have sent a pull request for the code and
2013 Apr 11
1
Added support for TfIdf to Omega
Hello guys, I have added code for tfidf to the weight.cc file in omega/. Here is the patch: https://github.com/aarshkshah1992/xapian/commit/5ff41a15f574e6780cc61e67e7f3da3d97ff4ec8 It compiles cleanly and I think it will work well. Here's the link to the documentation file omegascript.rst where I've added tfidf.
2008 Jul 30
3
Dealing with image PDF's
Guys, I was just playing around and added a bit of code to omindex.cc so I could ocr tiff and tif with gocr which seems to work. Here's what it looks like: // Tiff: } else if (startswith(mimetype, "image/tif")) { // Inspired by http://mjr.towers.org.uk/comp/sxw2text string safefile = shell_protect(file); string cmd = "tifftopnm " + safefile + "