Hi everyone,

I am thinking of working on the following ideas for my GSoC proposal, based on my discussions with Olly and my own understanding. Rather than focusing on an entire perftest module, I have decided to focus on implementing performance tests for weighting schemes based on a Wikipedia dump and, in addition to that, to build a framework to measure the accuracy and relevance of new and old weighting schemes.

*Measuring the relevance and accuracy of weighting schemes*

- The accuracy of a weighting scheme can be measured using the concepts of precision and recall: http://en.wikipedia.org/wiki/Precision_and_recall
- Once we have the static Wikipedia dump in place, we can hardcode expected results for each query we plan to run on the data set. By comparing the expected results to the retrieved results for a number of queries for each weighting scheme, we can get a general idea of its accuracy (a rough sketch of this follows at the end of this message).
- This implementation will also help determine the accuracy of new weighting schemes as and when they are implemented in Xapian.

*Profiling and Optimizing Weighting/Query Expansion Schemes*

- Profile DFR schemes and identify/optimize bottlenecks.
- Profile stemming algorithms and indexing.
- For profiling most searches, which are fast, valgrind-based profilers can be used. However, perf can be brought in for slower searches, as we had discussed that valgrind-based profilers may not be efficient for IO-bound tasks.
- The speed will first be tested using the RealTime::now() function, and the profiler will be brought in if the speed appears to be too slow.
- As mentioned on the ideas page, a lot of the optimization can/will happen by mapping the formulas used to a smaller set of formulas and reducing the number of times computationally heavy operations such as log() are used (a sketch of this kind of rewrite also follows at the end of this message).
- Create a huge static data set, preferably a Wikipedia dump.
- Test the speed of the DFR schemes against the speed of BM25 and decide on a default weighting scheme. Our best bet would be the parameter-free DPH scheme, as the performance of the schemes with parameters depends on the input data too.
- Similarly, a speed analysis of query expansion schemes will also be done to decide on a default query expansion scheme. These can be optimized too.

I am not quite able to decide on an ideal patch for this idea. Could you please suggest some ideas for an ideal patch as an initial first step to include with my proposal?

-Regards
-Aarsh
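To make the precision/recall bullet concrete, here is a minimal sketch (not part of any existing Xapian module) of scoring a single query against a hand-picked set of expected documents using the C++ API. The database path, the query text and the expected docids are placeholder values; a real harness would read a whole query set from a file and average the figures over all queries.

#include <xapian.h>

#include <iostream>
#include <set>
#include <string>

// Compute precision and recall for one query, given the docids we
// expect a "good" weighting scheme to return for it.
int main() {
    // Placeholder values -- a real harness would read these from a
    // query/judgement file rather than hardcoding them.
    Xapian::Database db("/path/to/wikipedia/db");
    const std::string query_text = "battle of hastings";
    const std::set<Xapian::docid> expected = {12, 345, 6789};

    Xapian::QueryParser qp;
    qp.set_database(db);
    qp.set_stemmer(Xapian::Stem("en"));
    qp.set_stemming_strategy(Xapian::QueryParser::STEM_SOME);

    Xapian::Enquire enquire(db);
    enquire.set_query(qp.parse_query(query_text));
    enquire.set_weighting_scheme(Xapian::BM25Weight());

    // Only look at the top 10 results, i.e. "precision at 10".
    Xapian::MSet mset = enquire.get_mset(0, 10);

    unsigned relevant_retrieved = 0;
    for (Xapian::MSetIterator i = mset.begin(); i != mset.end(); ++i) {
        if (expected.count(*i)) ++relevant_retrieved;
    }

    double precision = mset.empty() ? 0.0
        : double(relevant_retrieved) / mset.size();
    double recall = expected.empty() ? 0.0
        : double(relevant_retrieved) / expected.size();

    std::cout << "P@10 = " << precision << ", recall = " << recall << '\n';
}

Something like "g++ -std=c++11 prcheck.cc $(xapian-config --cxxflags --libs)" should build it; swapping the weighting scheme passed to set_weighting_scheme() is then all that is needed to compare schemes on the same judgements.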
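On the point about reducing log() calls: the gain mostly comes from hoisting anything that depends only on the term and collection statistics out of the per-document scoring loop. The example below is deliberately schematic and does not use the real Xapian::Weight interface; it only shows the shape of the rewrite.

#include <cmath>
#include <iostream>

// Schematic only: not the real Xapian::Weight API.  The idea is that
// anything which depends only on term/collection statistics can be
// computed once per term instead of once per matching document.

struct TermStats {
    double collection_size;   // N: number of documents in the collection
    double term_freq;         // n_t: number of documents containing the term
};

// Naive version: recomputes log(N / n_t) for every document scored.
double idf_weight_naive(const TermStats& s, double wdf) {
    return wdf * std::log(s.collection_size / s.term_freq);
}

// Rewritten version: the log is evaluated once, then reused.
class IdfWeight {
  public:
    explicit IdfWeight(const TermStats& s)
        : idf_(std::log(s.collection_size / s.term_freq)) {}

    // Called once per matching document; no log() call here any more.
    double score(double wdf) const { return wdf * idf_; }

  private:
    double idf_;
};

int main() {
    TermStats stats{1000000.0, 42.0};
    IdfWeight w(stats);
    std::cout << idf_weight_naive(stats, 3.0) << " == " << w.score(3.0) << '\n';
}

In Xapian itself, the natural home for this kind of precomputation would be the weighting scheme's initialisation step rather than its per-document scoring method, but the exact interface details are beyond this sketch.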
On Sat, Mar 01, 2014 at 10:12:36AM +0530, Aarsh Shah wrote:
> I am thinking of working on the following ideas for my GSOC proposal
> based on my discussions with Olly and my own understanding. Rather
> than focusing on an entire perftest module, I have decided to focus on
> implementing performance tests for weighting schemes based on a
> wikipedia dump and in addition to that, build a framework to measure
> the accuracy and relevance of new and old weighting schemes.

I mentioned this on IRC (not sure if it was before or after you sent
this mail), but for the benefit of anyone reading who wasn't on IRC
then, we do already have an evaluation module which was originally
written by Andy MacFarlane, and further worked on by Gaurav Arora:

https://github.com/samuelharden/xapian-evaluation

> *Measuring the relevance and accuracy of weighting schemes*
>
> - The accuracy of a weighting scheme can be measured by using the
>   concepts of precision and recall:
>   http://en.wikipedia.org/wiki/Precision_and_recall
> - Once we have the static wikipedia dump in place, we can hardcode
>   expected results for each query we plan to run on the data set.

How would you get a list of suitable queries to run against a wikipedia
dump? I've not seen public query logs for wikipedia.

How would you get the "expected results for each query"? Producing a
set of relevance judgements is rather time consuming. If the relevance
judgements are poor quality, the conclusions of the evaluation become
untrustworthy.

I suspect it would be better to use an existing dataset which included
queries and relevance judgements - Parth might know if there's one we
could use.

> *Profiling and Optimizing Weighting/Query Expansion Schemes*
>
> - Profile DFR schemes and identify/optimize bottlenecks.
> - Profile stemming algorithms and indexing.
> - For profiling most searches which are fast, valgrind based profilers
>   can be used. However, perf can be brought in for slower searches as
>   we had discussed that valgrind based profilers may not be efficient
>   for IO bound tasks.
> - The speed will first be tested using the RealTime::now() function and
>   then the profiler will be brought in if the speed appears to be too
>   slow.
> - As mentioned on the ideas page too, a lot of the optimization can/will
>   happen by mapping the formulas used to a smaller set of formulas and
>   reduce the number of times computationally heavy operations such as
>   log() are used.
> - Create a huge static data-set, preferably a Wikipedia dump.
> - Test the speed of the DFR schemes against the speed of BM25 and decide
>   on a default weighting scheme. Our best bet would be the parameter-free
>   DPH scheme as the performance of the ones with parameters depends on
>   the input data too.
> - Similarly, a speed analysis of query expansion schemes will also be
>   done to decide on a default query expansion scheme. These can be
>   optimized too.
>
> I am not quite able to decide on an ideal patch for this idea. Could you
> please suggest some ideas for an ideal patch as an initial first step to
> include with my proposal?

I'd suggest trying out profiling something, to get a feel for how the
profiling tools work, and for how long the process of finding a
bottleneck and fixing it takes.

Cheers,
Olly
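Following the suggestion to try profiling something, a small self-contained driver like the sketch below is a convenient starting point: it repeats a fixed query set so the match loop dominates the runtime, times the runs with std::chrono (a portable stand-in for the library-internal RealTime::now()), and can be switched between weighting schemes. Running the same binary under "valgrind --tool=callgrind" or "perf record" then shows where the time goes. The database path and query list are placeholders, and Xapian::DPHWeight assumes a Xapian version with the DFR schemes already merged.

#include <xapian.h>

#include <chrono>
#include <iostream>
#include <string>
#include <vector>

// Run a fixed query set repeatedly with the given weighting scheme and
// report the total wall-clock time in seconds.
static double time_queries(Xapian::Database& db,
                           const Xapian::Weight& wt,
                           const std::vector<std::string>& queries,
                           unsigned repeats) {
    Xapian::QueryParser qp;
    qp.set_database(db);
    qp.set_stemmer(Xapian::Stem("en"));

    auto start = std::chrono::steady_clock::now();
    for (unsigned r = 0; r < repeats; ++r) {
        for (const auto& q : queries) {
            Xapian::Enquire enquire(db);
            enquire.set_weighting_scheme(wt);
            enquire.set_query(qp.parse_query(q));
            // Fetching the MSet is what actually runs the match.
            Xapian::MSet mset = enquire.get_mset(0, 10);
            (void)mset;
        }
    }
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

int main() {
    // Placeholder database and queries -- a real run would load a much
    // larger query set from a file.
    Xapian::Database db("/path/to/wikipedia/db");
    std::vector<std::string> queries = {
        "information retrieval", "battle of hastings", "xapian"
    };
    const unsigned repeats = 100;

    std::cout << "BM25: "
              << time_queries(db, Xapian::BM25Weight(), queries, repeats)
              << "s\n";
    std::cout << "DPH:  "
              << time_queries(db, Xapian::DPHWeight(), queries, repeats)
              << "s\n";
}

Repeating the queries is deliberate: it makes the search code dominate the profile rather than process start-up and database opening, which is the sort of thing the "try profiling something first" advice is meant to flush out.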
Hi Olly,

I am asking Parth if he can help me with a dataset containing query logs and expected results. Also, is the evaluation module fully functional? I saw that some issues are still open on it.

I initially thought I would write the query log and expected results set by hand for some Wikipedia articles, but I realize now that you have a point, as we would need to test on a large number of articles.

-Regards
-Aarsh

On Tue, Mar 4, 2014 at 5:26 PM, Olly Betts <olly at survex.com> wrote:
> On Sat, Mar 01, 2014 at 10:12:36AM +0530, Aarsh Shah wrote:
> [...]
>
> I mentioned this on IRC (not sure if it was before or after you sent
> this mail), but for the benefit of anyone reading who wasn't on IRC
> then, we do already have an evaluation module which was originally
> written by Andy MacFarlane, and further worked on by Gaurav Arora:
>
> https://github.com/samuelharden/xapian-evaluation
>
> [...]
>
> How would you get a list of suitable queries to run against a wikipedia
> dump? I've not seen public query logs for wikipedia.
>
> How would you get the "expected results for each query"? Producing a
> set of relevance judgements is rather time consuming. If the relevance
> judgements are poor quality, the conclusions of the evaluation become
> untrustworthy.
>
> I suspect it would be better to use an existing dataset which included
> queries and relevance judgements - Parth might know if there's one we
> could use.
> > I am not quite able to decide on an ideal patch for this idea. Could
> > you please suggest some ideas for an ideal patch as an initial first
> > step to include with my proposal?
>
> I'd suggest trying out profiling something, to get a feel for how the
> profiling tools work, and for how long the process of finding a
> bottleneck and fixing it takes.
>
> Cheers,
> Olly
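On the dataset question: collections that ship with queries and relevance judgements (the kind Parth may be able to point to) usually provide the judgements in the TREC qrels format, one judgement per line as "topic iteration docno relevance". A loader along the lines below, combined with the precision/recall sketch earlier in the thread, is roughly all the evaluation framework needs; the file name and the use of string document identifiers are assumptions.

#include <fstream>
#include <iostream>
#include <map>
#include <set>
#include <sstream>
#include <string>

// Load TREC-style relevance judgements: "topic iteration docno relevance".
// Returns, for each topic, the set of document identifiers judged relevant.
std::map<std::string, std::set<std::string>>
load_qrels(const std::string& path) {
    std::map<std::string, std::set<std::string>> judged_relevant;
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string topic, iteration, docno;
        int relevance;
        if (fields >> topic >> iteration >> docno >> relevance) {
            if (relevance > 0) judged_relevant[topic].insert(docno);
        }
    }
    return judged_relevant;
}

int main() {
    // "wikipedia.qrels" is a placeholder file name.
    auto qrels = load_qrels("wikipedia.qrels");
    for (const auto& topic : qrels) {
        std::cout << topic.first << ": " << topic.second.size()
                  << " relevant documents\n";
    }
}

Mapping the judged document identifiers back onto Xapian docids (for example via a unique term or a value slot set at index time) is left out here, since it depends on how the Wikipedia dump ends up being indexed.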