On Tuesday 31 May 2005 03:34, Zed A. Shaw wrote:
> Just wanted to send out an update on my recent Ruby/Odeum vs. Lucene
> performance analysis.

I take it that you've spent a lot of time with Lucene as well as
Ruby/Odeum. Apart from performance, did you have a look at the quality
of their search results?

Michael

--
Michael Schuerig
mailto:michael-q5aiKMLteq4b1SvskN2V4Q@public.gmane.org
http://www.schuerig.de/michael/

Face reality and stare it down
--Jethro Tull, Silver River Turning

Zed,

Hi - your Ruby/Odeum vs. Lucene work is of great interest to me (being
the author of "Lucene in Action" and an active Lucene committer and all,
and now delving more and more into Ruby :)...

I had read your first entry a week or so ago and I've just quickly read
this latest one. I looked at some of your source code, but I did not see
the code that built the Lucene index - I'm curious how you built it so
that I can see why it is so much larger than the Odeum one.

I can only comment on the Java code. You're creating 126 IndexSearcher
instances, which is most definitely a red flag in terms of memory usage.
While you are closing each searcher, I wonder if the JVM garbage
collector does not get a chance to free the memory because of the tight
loops you've got going on. In production Java applications using Lucene,
only a single IndexSearcher instance is needed to serve any number of
threads. What happens to your memory usage if you construct the
IndexSearcher once and reuse the same instance within the 126
iterations?

I'm completely new to Odeum, so I cannot say how its query parsing
compares to Lucene's. But your test is to search for "sprintf". If the
search is that trivial, rather than use QueryParser to create a Query,
use this: new TermQuery(new Term("contents", "sprintf")) - and also do
this one time rather than reconstructing objects within the loop. I
don't think QueryParser is slow, but it is a pretty sophisticated
JavaCC-based parser under the covers and certainly does a fair bit more
work than is needed just to make a "sprintf" TermQuery. Lucene best
practice is to use QueryParser for human-entered queries and to
construct machine-generated queries directly with the API.

While you've obviously worked hard on smoothing out the JVM startup
time, it is notoriously heavy.
There are alternatives that could be considered, such as wrapping a
server process (some type of web service perhaps) around the index
searching. Ruby could then be used from the command line to query the
search server. This is just one idea to have a long-running Java process
where things like HotSpot and garbage collection really get a chance to
play out as they were designed to do.

One final comment on your Java code: you've got a doc.get("path") in a
tight loop - this is notably a place in Lucene where I/O is done. You do
not do anything with this "path" field. In most search systems I've
worked with, iterating over all hits for a single search is not
desirable (and certainly not advisable). What use is the 10 millionth
document to a user? Presenting the best scoring documents first and
letting a user navigate further if needed is how my applications work.

You mentioned using Lucene 1.4.3, yet your script uses a JAR with a
1.5-dev suffix - the current Lucene code in Subversion builds with a
1.9-dev suffix, so I'm curious about the version of Lucene you're really
using. My hunch is that you've built it from the source distribution of
Lucene 1.4.3, which does build with a 1.5-dev suffix by default, I
believe.

Please take my comments in a well-meaning, friendly way - I would love
to have a RubyLucene, but I'm also pragmatic, and if Ruby/Odeum is as
good as or better than Java Lucene then I'm most definitely interested
in trying it out. I would be interested in seeing how Ruby/Odeum
compares in terms of document ranking, which is where Lucene really
shines, and also how it compares with more sophisticated boolean
queries.

Thanks,
Erik
http://www.lucenebook.com

On May 30, 2005, at 9:34 PM, Zed A. Shaw wrote:
> Hi Everyone,
>
> Just wanted to send out an update on my recent Ruby/Odeum vs. Lucene
> performance analysis.
>
> http://www.zedshaw.com/projects/ruby_odeum/odeum_lucene_part2.html
>
> This analysis is much more detailed, covers a consistent analysis
> method, and is probably really really boring for most people. Lots of
> pretty graphs and tables of data. There is an Executive Summary for
> people who are not interested in the full details. The gist of the
> analysis is that Ruby/Odeum is slower than Lucene, but uses a lot less
> memory.
>
> I'm interested in grammar corrections, questions about terminology, and
> other comments. Please send me your comments rather than the list as I
> think most people would consider grammar corrections off-topic. I'll
> give everyone who comments credit on the essay (with links to your blog
> or site).
>
> Thanks for your time.
>
> Zed A. Shaw
> _______________________________________________
> Rails mailing list
> Rails-1W37MKcQCpIf0INCOvqR/iCwEArCW2h5@public.gmane.org
> http://lists.rubyonrails.org/mailman/listinfo/rails

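Erik's suggestion to construct the searcher once and reuse it can be sketched without Lucene on the classpath. In the sketch below, StubSearcher is a hypothetical stand-in for an expensive-to-open searcher and is not Lucene's API; the scores are made up, and the count of 126 iterations is taken from the message. In a real benchmark the shared object would be the IndexSearcher, with the TermQuery also built once outside the loop.

```java
import java.util.Arrays;

class SearcherReuseSketch {
    /** Hypothetical stand-in for an expensive-to-open searcher; this is NOT Lucene's API. */
    static class StubSearcher {
        static int opened = 0; // how many instances have been constructed
        private final double[] scores = {0.91, 0.64, 0.40, 0.22, 0.05}; // made-up scores, best first

        StubSearcher() {
            opened++;
        }

        /** Returns at most n of the best-scoring hits (the term is ignored in this stub). */
        double[] topHits(String term, int n) {
            return Arrays.copyOf(scores, Math.min(n, scores.length));
        }
    }

    /** Pattern questioned in the message: a fresh searcher per iteration. Returns instances opened. */
    static int searcherPerIteration(int iterations) {
        StubSearcher.opened = 0;
        for (int i = 0; i < iterations; i++) {
            new StubSearcher().topHits("sprintf", 3);
        }
        return StubSearcher.opened;
    }

    /** Suggested pattern: open once, reuse for every iteration. Returns instances opened. */
    static int sharedSearcher(int iterations) {
        StubSearcher.opened = 0;
        StubSearcher searcher = new StubSearcher();
        for (int i = 0; i < iterations; i++) {
            searcher.topHits("sprintf", 3);
        }
        return StubSearcher.opened;
    }

    public static void main(String[] args) {
        System.out.println("per-iteration: " + searcherPerIteration(126) + " searchers opened");
        System.out.println("shared:        " + sharedSearcher(126) + " searcher opened");
    }
}
```

The stub also returns only the top few hits rather than every match, which mirrors the point about presenting the best-scoring documents first instead of iterating over all hits.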
Hi Everyone,

Just wanted to send out an update on my recent Ruby/Odeum vs. Lucene
performance analysis.

http://www.zedshaw.com/projects/ruby_odeum/odeum_lucene_part2.html

This analysis is much more detailed, covers a consistent analysis
method, and is probably really really boring for most people. Lots of
pretty graphs and tables of data. There is an Executive Summary for
people who are not interested in the full details. The gist of the
analysis is that Ruby/Odeum is slower than Lucene, but uses a lot less
memory.

I'm interested in grammar corrections, questions about terminology, and
other comments. Please send me your comments rather than the list as I
think most people would consider grammar corrections off-topic. I'll
give everyone who comments credit on the essay (with links to your blog
or site).

Thanks for your time.

Zed A. Shaw

On Tue, 2005-05-31 at 03:10 +0200, Michael Schuerig wrote:
> On Tuesday 31 May 2005 03:34, Zed A. Shaw wrote:
> > Just wanted to send out an update on my recent Ruby/Odeum vs. Lucene
> > performance analysis.
>
> I take it that you've spent a lot of time with Lucene as well as
> Ruby/Odeum. Apart from performance, did you have a look at the quality
> of their search results?

Nope, that would violate my rule about confounding, mentioned here:

http://www.zedshaw.com/blog/programming/programmer_stats.html

"If you want to measure something, then don’t measure other shit."

:-)

On a more serious note, someone familiar with Lucene has expressed an
interest in comparing ranking algorithms and results.

> Michael

On May 30, 2005, at 11:26 PM, Zed A. Shaw wrote:
> On Tue, 2005-05-31 at 03:10 +0200, Michael Schuerig wrote:
>> I take it that you've spent a lot of time with Lucene as well as
>> Ruby/Odeum. Apart from performance, did you have a look at the
>> quality of their search results?
>
> Nope, that would violate my rule of confounding mentioned:
>
> http://www.zedshaw.com/blog/programming/programmer_stats.html
>
> "If you want to measure something, then don't measure other shit."
>
> :-)
>
> On a more serious note, someone familiar with Lucene has expressed an
> interest in comparing ranking algorithms and results.

I'm looking forward to those comparisons. I would encourage trying
various types of queries as well - simply searching for "sprintf" is one
basis for comparison, but Lucene supports sophisticated boolean queries
(as well as fuzzy, wildcard, range, phrase, and span queries).

I'm unfamiliar with Odeum - I saw that it supports AND/OR/NOT - does it
also support phrase queries? If so, could you point me to some
documentation about that support?

Erik

I just thought I would throw this out there, for intellectual
stimulation rather than fanning or triggering a flamewar.

Wikipedia has migrated from Java/Lucene to Mono/dotLucene.
http://www.redmonk.com/sogrady/archives/000720.html

I'm curious how the performance of this combination compares to
Ruby/Odeum and Java/Lucene.

Cheers,

Kevin

Wow, that's kind of bizarre for some reason. First, I didn't know they
were Java. Second, I wonder how they decided to drop Java. Neat.

Zed

On Thu, 2005-06-02 at 09:20 -0600, Kevin Williams wrote:
> I just thought I would throw this out there, for intellectual
> stimulation rather than fanning or triggering a flamewar.
>
> Wikipedia has migrated from Java/Lucene to Mono/dotLucene.
> http://www.redmonk.com/sogrady/archives/000720.html
>
> I'm curious how the performance of this combination compares to
> Ruby/Odeum and Java/Lucene.
>
> Cheers,
>
> Kevin

On Jun 3, 2005, at 5:00 AM, Zed A. Shaw wrote:
> Wow, that's kind of bizarre for some reason. First I didn't know they
> were Java. Second, I wonder how they decided to drop Java. Neat.

I knew they had switched to Lucene and Java a few months ago. I don't
know why they went to Mono. Maybe it is because Java is a big fat
pig? :P

For some Java Lucene Wikipedia goodness, check this out:
http://searchmorph.com/kat/wikipedia.jsp
Do a search and then navigate around with the "similar", "more like
this", etc. links to see some interesting connections between pages.

Erik

In terms of JVM startup time, I have seen some people write custom
classloaders to reduce the number of classes initially loaded - that
said, a slow startup time really shouldn't be a problem for production
applications.

As for the memory usage, remember that Java will pre-allocate a large
chunk of what it's allowed to use to speed up later use, so it's not
always easy to see what is really being used and what is reserved for
later use. (If the memory isn't really being used, you could reduce the
available memory, thereby freeing it up for other processes.)

I'd be interested to see the performance of the Perl Lucene port -
Plucene. Calling Perl from Ruby should be easy.

sam

On 6/3/05, Erik Hatcher <erik-LIifS8st6VgJvtFkdXX2HpqQE7yCjDx5@public.gmane.org> wrote:
> I knew they had switched to Lucene and Java a few months ago. I
> don't have any knowledge of why they went to Mono. Maybe it is
> because Java is a big fat pig? :P
>
> For some Java Lucene Wikipedia goodness, check this out:
> http://searchmorph.com/kat/wikipedia.jsp Do a search and then navigate
> around with the "similar", "more like this", etc links to see some
> interesting connections between pages.
>
> Erik

--
sam
http://www.magpiebrain.com/

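The distinction drawn above between heap that is really in use and heap that is merely reserved can be observed from inside the JVM. The following is a minimal sketch using the standard java.lang.Runtime methods; the class name HeapReport and its two helper methods are introduced here for illustration and are not part of the benchmark under discussion.

```java
class HeapReport {
    /** Bytes occupied by objects: the committed heap minus the free part of it. */
    static long usedBytes(long committed, long free) {
        return committed - free;
    }

    /** Formats a byte count as whole mebibytes for display. */
    static String mib(long bytes) {
        return (bytes / (1024 * 1024)) + " MiB";
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long committed = rt.totalMemory(); // heap the JVM currently holds
        long free = rt.freeMemory();       // part of that heap not occupied by objects
        long max = rt.maxMemory();         // ceiling the JVM will try to stay under (see -Xmx)

        System.out.println("used:      " + mib(usedBytes(committed, free)));
        System.out.println("committed: " + mib(committed));
        System.out.println("max:       " + mib(max));
    }
}
```

These figures cover the Java heap only; the process size reported by the operating system also includes non-heap memory, so the two will not match exactly. Lowering the ceiling with the standard -Xmx option is one way to act on the suggestion of reducing the available memory.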
On Jun 5, 2005, at 8:38 AM, Sam Newman wrote:
> I'd be interested to see the performance of the Perl Lucene port -
> Plucene. Calling Perl from Ruby should be easy.

Plucene does not have great performance, from what I've heard. The best
port to compare against Ruby/Odeum is PyLucene - its performance is very
good. Its creator has done some comparisons and has shown it to be
faster than Java Lucene.

The same application is already available in both Java and Python: the
first example we show in "Lucene in Action". You can download the
chapter that details it, along with the code, from
http://www.lucenebook.com for Java. For PyLucene you can get these same
examples here:
http://svn.osafoundation.org/pylucene/trunk/samples/LuceneInAction/
(main site: http://pylucene.osafoundation.org/) - Indexer.py and
Searcher.py.

I would be happy to see folks with time and interest port these same
applications to Ruby/Odeum and see how they compare. Better yet, I'd
love for someone with great know-how in the GCJ/SWIG arena to build
RubyLucene in a manner similar to PyLucene. There is a
ruby-dev-PPu3vs9EauNd/SJB6HiN2Ni2O/JbrIOy@public.gmane.org list where
some folks are discussing and starting to implement this very thing.
Please subscribe using
ruby-dev-subscribe-PPu3vs9EauNd/SJB6HiN2Ni2O/JbrIOy@public.gmane.org and
help out.

Erik

Hi Zed,

Just wanted to quickly point out that your power calculation is
incorrect - you're analysing a random effects model, with effects for
burst and for trial nested within burst, so you need to use a power
calculation appropriate for that. A straight multiplication of the two
numbers is not correct.

Hadley

On 5/30/05, Zed A. Shaw <zedshaw-dd7LMGGEL7NBDgjK7y7TUQ@public.gmane.org> wrote:
> Just wanted to send out an update on my recent Ruby/Odeum vs. Lucene
> performance analysis.
>
> http://www.zedshaw.com/projects/ruby_odeum/odeum_lucene_part2.html

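Why multiplying the two counts overstates the sample size can be seen in the simplest nested case. The following is a sketch under assumptions that are not taken from the essay: a balanced one-way random effects model with B bursts and n trials per burst. All symbols are introduced here for illustration only.

```latex
% Assumed model: trial j within burst i
y_{ij} = \mu + b_i + e_{ij}, \qquad
b_i \sim N(0, \sigma_b^2), \quad e_{ij} \sim N(0, \sigma_e^2), \qquad
i = 1, \dots, B, \; j = 1, \dots, n

% Variance of the grand mean, rewritten with the intraclass correlation \rho
\operatorname{Var}(\bar{y}_{\cdot\cdot})
  = \frac{\sigma_b^2}{B} + \frac{\sigma_e^2}{Bn}
  = \frac{\sigma_b^2 + \sigma_e^2}{Bn}\,\bigl[\,1 + (n-1)\rho\,\bigr],
\qquad \rho = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_e^2}

% Effective number of independent observations
N_{\text{eff}} = \frac{Bn}{1 + (n-1)\rho}
```

Only when rho = 0 (no burst-to-burst variation) does N_eff equal the straight product Bn. With purely illustrative values B = 10, n = 10 and rho = 0.5, N_eff = 100 / 5.5, or about 18, far fewer than 100.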