Hi,
With the current state of Ferret, can anyone compare the speed of MySQL fulltext search vs. Ferret's indexed search? And do I have to query the DB after getting results from Ferret?
Thanks in advance
_______________________________________________
Rails mailing list
Rails-1W37MKcQCpIf0INCOvqR/iCwEArCW2h5@public.gmane.org
http://lists.rubyonrails.org/mailman/listinfo/rails
Hi Onur,
I can't offer any input on speed comparisons between Ferret and MySQL fulltext search. I will say this though. If the results that MySQL fulltext search returns are good enough then use it. But if you care about the relevancy of your results and you want to be able to run advanced queries like boolean queries or phrase queries, you'll want to go with Ferret, and it should be fast enough.
As for having to query the database, that will depend how you want to use Ferret. You can store the data in the Ferret index if you like, in which case you won't have to query the database. I think it's better just to keep the data in one spot though.
HTH,
Dave
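For readers unfamiliar with the boolean and phrase queries mentioned above, here is a minimal sketch of how both can be answered from a positional inverted index. It is a toy written for illustration: the class name `ToyIndex` and its methods are made up for this example and are not Ferret's API or its index format.

```ruby
# Toy positional inverted index: term => { doc_id => [positions] }.
# Hypothetical illustration only; this is not how Ferret is implemented.
class ToyIndex
  def initialize
    @postings = Hash.new { |h, term| h[term] = Hash.new { |d, id| d[id] = [] } }
  end

  def add(doc_id, text)
    tokenize(text).each_with_index do |term, pos|
      @postings[term][doc_id] << pos
    end
  end

  # Boolean AND: ids of documents that contain every term.
  def all_of(*terms)
    return [] if terms.empty?
    terms.map { |t| docs_for(t) }.reduce { |a, b| a & b }.sort
  end

  # Boolean OR: ids of documents that contain at least one term.
  def any_of(*terms)
    terms.map { |t| docs_for(t) }.reduce([]) { |a, b| a | b }.sort
  end

  # Phrase query: the terms must occur at consecutive positions.
  def phrase(text)
    terms = tokenize(text)
    all_of(*terms).select do |doc_id|
      @postings[terms.first][doc_id].any? do |start|
        terms.each_with_index.all? do |term, offset|
          @postings[term][doc_id].include?(start + offset)
        end
      end
    end
  end

  private

  def tokenize(text)
    text.downcase.scan(/[a-z0-9]+/)
  end

  def docs_for(term)
    key = term.downcase
    @postings.key?(key) ? @postings[key].keys : []
  end
end

index = ToyIndex.new
index.add(1, "Ferret is a full-text search engine")
index.add(2, "MySQL has fulltext search built in")
index.add(3, "A search engine written in Ruby")

index.all_of("search", "engine")  # => [1, 3]
index.phrase("search engine")     # => [1, 3]
index.phrase("engine search")     # => []
```

The phrase query is what needs the stored positions: a plain term-to-documents mapping can answer AND/OR but cannot tell "search engine" from "engine search".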
Hi,
Ferret is not only faster as the data gets larger (as I have benchmarked a few times), it's also more accurate because of its query analyser (you can use Google-like search queries).
There are two options: you can store everything in Ferret (and not need a database anymore), or store only the index (the fields you need to index) and retrieve the other values from MySQL.
At the moment I am trying to write a better plugin for Ferret so you can specify what needs to be indexed and use find (instead of a special method) with additional options, and have it automatically query the database for additional fields.
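The two options described above can be sketched in plain Ruby. This is only an illustration under stated assumptions: a Hash stands in for the MySQL table, another Hash stands in for the search index, and the helper names (`terms_of`, `build_id_index`, `build_stored_index`) are hypothetical, not part of Ferret or Rails.

```ruby
# Stand-in for a database table: primary key => row (made-up sample data).
articles = {
  1 => { title: "Ferret on Rails", body: "Indexing models with Ferret" },
  2 => { title: "MySQL fulltext",  body: "Using MATCH AGAINST in MySQL" }
}

# Split the searchable fields of a row into unique lowercase terms.
def terms_of(row)
  "#{row[:title]} #{row[:body]}".downcase.scan(/[a-z0-9]+/).uniq
end

# Option 1: the index holds only primary keys; rows stay in the database.
def build_id_index(rows)
  index = Hash.new { |h, term| h[term] = [] }
  rows.each { |id, row| terms_of(row).each { |term| index[term] << id } }
  index
end

# Option 2: the index also stores a copy of the fields needed for display.
def build_stored_index(rows)
  index = Hash.new { |h, term| h[term] = [] }
  rows.each do |id, row|
    stored = { id: id, title: row[:title] }
    terms_of(row).each { |term| index[term] << stored }
  end
  index
end

# Option 1 needs a second lookup against the "database" after the search.
id_index = build_id_index(articles)
ids  = id_index.fetch("ferret", [])          # => [1]
rows = ids.map { |id| articles.fetch(id) }

# Option 2 answers from the index alone, at the cost of duplicating data.
stored_index = build_stored_index(articles)
hits = stored_index.fetch("mysql", [])       # => [{ id: 2, title: "MySQL fulltext" }]
```

In a real application the second lookup in option 1 would typically be one database query for all returned ids rather than one fetch per id.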
On 12/13/05, Abdur-Rahman Advany <rails-U5wbzIpkoVrQT0dZR+AlfA@public.gmane.org> wrote:
> Ferret is not only faster (as I have benchmarked a few times) as data gets larger but it's also more accurate because of its query analyser (you can use Google-like search queries).
This is great to know. I'm surprised. Ferret is going to be much, much faster soon. I'm rewriting it all in C.
> At this moment I am trying to write a better plugin for Ferret so you can specify what needs to be indexed, use find (instead of a special method) with additional options, and automatically query the database for additional fields.
Please keep us updated as to how this is going. I'd like to add more stuff like this to the Ferret Wiki. You might like to look at this page if you haven't already:
http://ferret.davebalmain.com/trac/wiki/FerretOnRails
It's far from a perfect solution, so please feel free to add to it. :-)
Cheers,
Dave
thanks all for the great work.
Hi David,
I think you should be careful about using Ferret in place of a database until it's really mature. (Indexes can be recreated at any time from the original data.) MySQL has proven itself as a mature database solution and has many tools for maintenance and management. Ferret, in my opinion, can't replace that (I don't even think Lucene can). It lacks certain management tools that a database needs; on the other hand, current databases lack advanced query parsers (and that's good, because they would only make the database more complex). I know of cases where Lucene was linked to existing databases with very good results. That should be possible with Ferret too, shouldn't it?
I think storing data only in ferret is a bad idea as tables have relations with other tables etc.
Agreed. I meant it's probably not worth storing the data in Ferret. Just use it for the indexing and keep your data in the database.
((On a side note, it is possible for some applications to do away with the database and use Ferret as the only data store. I think that's how Erik Hatcher's blog software Blogscene works.))
On 12/13/05, Abdur-Rahman Advany <rails-U5wbzIpkoVrQT0dZR+AlfA@public.gmane.org> wrote:
> I think you should be careful replacing a database with Ferret till it's really mature. (Indexes can be recreated anytime with the original data.)
> I know about linking Lucene to existing databases with very good results; this should be possible with Ferret, or not?
Sure. I wouldn't replace a database with Ferret in most instances, and probably not in a Rails app, since Rails makes it so easy to use a database. I was just trying to say it was possible to use Ferret or Lucene as a data store. :-)
David,
Are you trying to make a Lucene-compatible project, or a similar project? Because I think that with the possibilities of Ruby, in time it would be possible to go beyond what's possible in Java.
Really great project. I hope to be able to contribute. My C skills are a little old (10 years or so), but maybe I can help you out on the Ruby end with improvements.
On 12/13/05, Abdur-Rahman Advany <rails-U5wbzIpkoVrQT0dZR+AlfA@public.gmane.org> wrote:
> Are you trying to make a Lucene-compatible project, or a similar project? Because I think with the possibilities of Ruby, in time it would be possible to go beyond what's possible in Java.
Very good question. At the moment I'm trying to stay compatible. But if I get enough contributors I'll consider forking off. Lucene is quite a large project with a lot of contributors, so it might be hard to push ahead of them.
> Really great project, I hope to be able to contribute. My C skills are a little old (10 years or so); maybe I can help you out on the Ruby end with improvements.
Any help is appreciated. Just recommending Ferret is going to help the project in the long run, so I thank you for that. Also, contributing to the wiki is very important.
Thanks,
Dave
On Dec 13, 2005, at 8:04 AM, David Balmain wrote:
> ((On a side note, it is possible for some applications to do away with the database and use Ferret as the only data store. I think that's how Erik Hatcher's blog software Blogscene works.))
If only I had that e-mail-to-blog gateway, I'd be blogging all the time!
Yes, http://www.blogscene.org/erik is powered entirely by a Lucene index, a servlet, and some Velocity templates. The original blog entries reside in blosxom-style text files, but at runtime only Lucene is used.
It really depends on the scenario, but in general I don't recommend using Lucene (or Ferret) as the definitive data source. The primary reason is that an index is optimized for how it is going to be searched, and you may later want to change how text is tokenized and thus what terms are indexed. Having the original data around to be able to re-index with different settings is a good thing. It's also possible to store the original data in Lucene and pull it out for reindexing purposes - but that is trickier.
Erik
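The re-indexing argument above can be made concrete with a small sketch. It assumes the original documents are kept somewhere outside the index (here a plain Hash) and uses two made-up tokenizers; `build_index` is a hypothetical helper, not a Lucene or Ferret call.

```ruby
# Original data kept outside the index (stand-in for database rows or text files).
docs = {
  1 => "Full-text search",
  2 => "Fulltext indexes in MySQL"
}

# Two tokenization strategies (both invented for this example).
whitespace = ->(text) { text.downcase.split }
alnum_runs = ->(text) { text.downcase.scan(/[a-z0-9]+/) }

# Build a simple term => [doc ids] index using the given tokenizer.
def build_index(docs, tokenizer)
  index = Hash.new { |h, term| h[term] = [] }
  docs.each do |id, text|
    tokenizer.call(text).uniq.each { |term| index[term] << id }
  end
  index
end

# First attempt: "full-text" ends up as one term, so "text" finds nothing.
v1 = build_index(docs, whitespace)
v1.key?("text")        # => false
v1.fetch("full-text")  # => [1]

# Changing the tokenizer means rebuilding from the original documents.
v2 = build_index(docs, alnum_runs)
v2.fetch("text")       # => [1]
```

The index only contains the terms the first tokenizer produced, so the second index cannot be derived from the first; it has to be rebuilt from `docs`, which is why keeping the source data around matters.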
On Dec 13, 2005, at 9:30 AM, Abdur-Rahman Advany wrote:
> Are you trying to make a lucene compatible project? or a similar project? Because I think with the possibilities of ruby, in time it would be possible to go beyond what possible in java..
Could you elaborate in what ways you feel Ferret could go beyond what is possible with Java Lucene? How does Java hold Lucene back?
Genuinely curious,
Erik
Erik,
I am sorry, I am just excited about Ruby in general. But with a language like Ruby and a project like Lucene, it’s my personal opinion that LOC makes a difference. Things like mixins and the way you program in Ruby make things just a bit easier. It took me 4 or 5 days to understand and work with Lucene (great book, by the way), and it only took me 10 days to learn most of edge Rails and many plugins by reading code (yes, not docs, code LOL)...
Lucene is a great product, and will continue on Java (you can't kill Java, it's really usable for many things). But Ruby just makes it easy to program, and with the integration with C, things are optimized. I have only been using Ruby for 20 days or so, but it amazes me how much a language can make a difference...
So I have to revise my statement a bit, but I think that, in time, melding Ferret and ActiveRecord together could make it a better product than Lucene : ) But that's future talk...
Well, I am amazed to see you here : ) What is your opinion?
Abdur-Rahman
On Dec 13, 2005, at 11:28 AM, Abdur-Rahman Advany wrote:
> I am sorry, I am just excited about Ruby in general. But with a language like Ruby and a project like Lucene, it’s my personal opinion that LOC makes a difference. Things like mixins and the way you program in Ruby make things just a bit easier. It took me 4/5 days to understand and work with Lucene (great book b.t.w.) and it only took me 10 days to learn most of edge Rails and many other plugins by reading code (yes not docs, code LOL)...
A full-text search engine and a web framework aren't quite comparable.
Lucene is optimized heavily - its code is more C-like than Java-like. Making Lucene more OO or taking advantage of all the fancy Ruby ways of method trickery is likely to slow things down. The entire idea of a full-text search engine is to be fast! (oh, and to be easy on resources as well)
> Lucene is a great product, and will continue on Java (you can't kill Java, it's really usable for many things). But Ruby just makes it easy to program, and with the integration with C, things are optimized. I have only been using Ruby for 20 days or so, but it amazes me how much a language can make a difference...
The folks that would be coding under the covers of Ferret or Lucene are a highly specialized group of folks. Likewise with the core code of Rails. Most users don't need to see what is underneath - it just works. Indeed the language makes a difference, but so does the goal of the effort. A full-text search engine has some very specialized needs, and even the most basic data structures in high-level languages, like Hash and Array, are only used if they are fast enough; otherwise alternatives are created. This is definitely the case with Lucene.
> So I have to revise my statement a bit, but I think, in time, melding Ferret and ActiveRecord together could make it a better product than Lucene : ) But that's future talk...
Well, in all fairness to Lucene, it is orthogonal to the database concern entirely. Of course Ferret + ActiveRecord > just Lucene, but to make the comparison more fair, how about Lucene + Hibernate? There are hooks for Hibernate to index with Lucene, even using Java annotations to mark the fields to be indexed, and how they are to be indexed. I see ActiveRecord + Ferret to be a great path to go, and the acts_as_ferret initial implementation is on the right track. I hope to delve into this area more myself in the future (though my work does not currently involve relational databases, but will soon).
> Well, I am amazed to see you here : ) What is your opinion?
I've been a Ruby fan for ages, ever since catching a Dave Thomas presentation in '02. I've dreamed of RubyLucene for years, creating the rubylucene (formerly rucene) project at RubyForge once upon a time but not doing much with it beyond some low-level I/O proof of concept tests.
I'm ecstatic that Ferret exists! I do have some reservations on the effort to port it all to C, as I'd really like the effort to aim towards the architecture PyLucene has, where it uses GCJ against Java Lucene, and then wraps it, using SWIG, into a Pythonic API. In order to avoid porting every time Java Lucene changes (which is where the guru creator Doug Cutting spends his effort), it would be a simple recompilation (and perhaps some API glue).
Erik
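The idea of marking on the model which fields get indexed (as the Hibernate annotations do, and as a Ruby mixin could) might look roughly like the sketch below. Everything here is hypothetical: the `Searchable` module, the `indexes` declaration and `to_index_document` are invented for illustration and are not the acts_as_ferret API.

```ruby
# Hypothetical mixin that lets a class declare its indexed fields.
module Searchable
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Declare which attributes should go into the search index.
    def indexes(*fields)
      indexed_fields.concat(fields)
    end

    def indexed_fields
      @indexed_fields ||= []
    end
  end

  # Build the flat document that would be handed to the search index.
  def to_index_document
    self.class.indexed_fields.each_with_object({ id: id }) do |field, doc|
      doc[field] = public_send(field).to_s
    end
  end
end

# A Struct stands in for an ActiveRecord model in this sketch.
Article = Struct.new(:id, :title, :body, :internal_notes) do
  include Searchable
  indexes :title, :body
end

article = Article.new(1, "Ferret", "Search in Ruby", "draft, do not publish")
article.to_index_document
# => { id: 1, title: "Ferret", body: "Search in Ruby" }
```

Only the declared fields reach the index document, so `internal_notes` stays in the record and out of the index.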
I just got done reviewing some of the info in the Ferret wiki. It looks like some great work - thanks!
I'm building an app that is going to have some search capability, and I was planning on using MySQL with fulltext searches, but looking at Ferret has got me wondering if there might not be a better way.
Specifically, I was wondering about the idea of using an in-memory index to increase the speed of searches. The data I'm storing will be used most when it is relatively new. After it's a few days old, people won't need it as much. So putting all this data in the same database may not make sense (if it's relatively easy to split it into 'fresh' and 'stale' databases).
Would it make sense to consider using an in-memory cache of documents for the newest data while having a disk-based index for when people want to search for older documents? Or would the performance gains not be worth the effort?
-kevin
I just wanted to add that I think the ideal solution would be for me to be able to define a single index that did both -- that is, that would cache documents in memory while keeping the full index on disk.
It would be great as well if I could specify how I wanted the cache to work -- say, by giving it a regular expression or some query to tell it what should be cached in memory. Maybe I could also specify a limit on the total memory it should use for cache. I might, for example, want to have it cache documents based on a certain user or customer id rather than cache them by date. Maybe whenever a new user logs in I modify the cache settings to include their documents in the cache -- and whenever someone logs out I flush theirs.
The value of this is that it hides the complexity from developers/users and makes it easy to use.
Sorry for the 'stream of consciousness' design reqs -- I'm just dumping the idea now since I was thinking about it...
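As a rough sketch of the design being described (one search interface, a bounded in-memory tier chosen by a rule, and a full index behind it), something like the following could serve as a starting point. It is purely hypothetical: `TieredSearch` is not a Ferret feature, the "full index" is just a Hash, the limit counts documents rather than bytes, and matching is a naive substring scan.

```ruby
# Hypothetical two-tier search: a bounded in-memory cache in front of a full store.
class TieredSearch
  def initialize(full_index:, max_cached:, cache_if:)
    @full_index = full_index   # stand-in for the on-disk index (doc_id => doc)
    @max_cached = max_cached   # limit on the number of cached documents
    @cache_if   = cache_if     # rule deciding which documents are "hot"
    @cache      = {}           # doc_id => doc, kept in insertion order
  end

  def add(doc)
    @full_index[doc[:id]] = doc
    return unless @cache_if.call(doc)
    @cache[doc[:id]] = doc
    @cache.shift while @cache.size > @max_cached   # evict the oldest entries
  end

  # Search the in-memory tier first; include the full index only on request.
  def search(term, include_stale: false)
    hits = scan(@cache.values, term)
    return hits unless include_stale
    (hits + scan(@full_index.values, term)).uniq { |doc| doc[:id] }
  end

  def cached_ids
    @cache.keys
  end

  private

  # Naive substring match, standing in for a real index lookup.
  def scan(docs, term)
    docs.select { |doc| doc[:text].downcase.include?(term.downcase) }
  end
end

# Cache only documents that are at most three days old, two documents at most.
search = TieredSearch.new(
  full_index: {},
  max_cached: 2,
  cache_if: ->(doc) { doc[:age_days] <= 3 }
)
search.add({ id: 1, text: "old ferret note",   age_days: 30 })
search.add({ id: 2, text: "ferret release",    age_days: 1 })
search.add({ id: 3, text: "mysql tips",        age_days: 0 })
search.add({ id: 4, text: "ferret cache idea", age_days: 2 })

search.cached_ids                                           # => [3, 4]
search.search("ferret").map { |d| d[:id] }                  # => [4]
search.search("ferret", include_stale: true).map { |d| d[:id] }  # => [4, 1, 2]
```

Swapping the `cache_if` rule for one keyed on a user or customer id would give the per-login caching variant mentioned above; whether any of this beats the operating system's own file caching is something only a benchmark on real data can answer.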
Erik Hatcher wrote:
> It's not quite comparable the difference between a full-text search engine and a web framework.
> Lucene is optimized heavily - its code is more C-like than Java-like. Making Lucene more OO or taking advantage of all the fancy Ruby ways of method trickery is likely to slow things down. The entire idea of a full-text search engine is to be fast! (oh, and to be easy on resources as well)
The Java version is really heavy at the moment (just to mention it ;)), but you're quite right, search queries can't be cached very easily, so writing optimized code is very important.
> I see ActiveRecord + Ferret to be a great path to go, and the acts_as_ferret initial implementation is on the right track. I hope to delve into this area more myself in the future (though my work does not currently involve relational databases, but will soon).
I am busy at the moment creating a plugin for Rails that will be easy to use and extends ActiveRecord. I am trying to combine the database and Ferret with a new method that builds upon find (search): it queries Ferret only if a query is present, and fetches the rows using find.
> I'm ecstatic that Ferret exists! I do have some reservations on the effort to port it all to C, as I'd really like the effort to aim towards the architecture PyLucene has, where it uses GCJ against Java Lucene, and then wraps it, using SWIG, into a Pythonic API. In order to avoid porting every time Java Lucene changes (which is where the guru creator Doug Cutting spends his effort), it would be a simple recompilation (and perhaps some API glue).
That's a very good idea, but compiling Java sounds weird :). David, have you considered this? I wonder how well it would integrate...
On Dec 13, 2005, at 6:15 AM, David Balmain wrote:
> On 12/13/05, Abdur-Rahman Advany <rails-U5wbzIpkoVrQT0dZR+AlfA@public.gmane.org> wrote:
>> I thinks you should be carefull replacing 'ferret' as a database till its really mature. (Indexes can be recreated anytime with the original data). Mysql has proven itself as a mature database sollution and has many tools for maintaining and managing. Ferret in my opinion can't replace that (I don't even think lucene can). It lacks certain management tools that are needed for a database, however current databases lack advanced query parsers (and thats good because it only makes the database complexer). I know about linking lucene to existing databases with very good result, this should be possible with ferret or not?
>
> Sure. I wouldn't replace a database with Ferret in most instances and probably not in a Rails app since rails makes it so easy to use a database. I was just trying to say it was possible to use Ferret or Lucene as a data store. :-)

I treat the data I store in the Ferret index as a denormalized table tuned for the queries it answers.

jeremy

On 12/14/05, Abdur-Rahman Advany <rails-U5wbzIpkoVrQT0dZR+AlfA@public.gmane.org> wrote:
> Erik Hatcher wrote:
> > I'm ecstatic that Ferret exists! I do have some reservations on the effort to port it all to C, as I'd really like the effort to aim towards the architecture PyLucene has, where it uses GCJ against Java Lucene, and then wraps it, using SWIG, into a Pythonic API. In order to avoid porting every time Java Lucene changes (which is where the guru creator Doug Cutting spends his effort), it would be a simple recompilation (and perhaps some API glue).
>
> Thats a very good idea, but compiling java sound weird :). David have you considered this? I wonder how will it would integrate..

Yes, Erik and I have discussed it already. It might be a better way to do it but I can't find the motivation. It's a lot more interesting and motivating for me to try to create something that runs faster than Lucene. Besides being slightly faster, C is also lighter on resources and makes for a much smaller download. I was and still am interested in desktop search so these are all important to me. Speaking of Doug Cutting, he has some words to say on this too:

http://nutch.sourceforge.net/blog/2005/02/open-source-desktop-search.html

So those are my reasons for taking the route I am, and since I'm currently doing the work, I get to choose. ;-) If anyone wants to get stuck into porting the PyLucene stuff I'm more than willing to lend a hand. It's definitely worth doing but it's not really my cup of tea.

Hi Kevin,

I can't quite tell from your description. Do you actually want to store and retrieve the documents from a Ferret index? Or do you just want to run the search on the index and then retrieve the results from the database? Also, how large a document set are you expecting? If you still have to retrieve the documents from the database I think Ferret should be fine as is without the caching. If you are running into performance problems after it's implemented I could certainly help you set up some caching.

Cheers,
Dave

On 12/14/05, Kevin Bedell <kevin-EZfY3IQN+VlBDgjK7y7TUQ@public.gmane.org> wrote:
> I just wanted to add that I think the ideal solution would be for me to be able to define a single index that did both -- that is, that would cache documents in memory while keeping the full index on disk.
>
> It would be great as well if I could specify how I wanted the cache to work -- say, by giving it a regular expression or some query to tell it what should be cached in memory. Maybe I could also specify a limit on the total memory it should use for cache.
>
> I might, for example, want to have it cache documents based on a certain user or customer id rather than cache them by date. Maybe whenever a new user logs in I modify the cache settings to include their documents in the cache -- and whenever someone logs out I flush theirs.
>
> The value of this is that it hides the complexity from developers/users and makes it easy to use.
>
> Sorry for the 'stream of consciousness' design reqs -- I'm just dumping the idea now since I was thinking about it...

> So those are my reasons with taking the route I am, and since I'm currently doing the work, I get to choose. ;-) If anyone wants to get stuck into porting the PyLucene stuff I'm more than willing to lend and hand. It's definitely worth doing but it's not really my cup of tea.

Haha :) Well, you're doing a great job, I'll continue to use Ferret! I don't have a client request a.t.m. for taking on such a project. Maybe in a couple of months...

> So those are my reasons with taking the route I am, and since I'm currently doing the work, I get to choose. ;-) If anyone wants to get stuck into porting the PyLucene stuff I'm more than willing to lend and hand. It's definitely worth doing but it's not really my cup of tea.

My kudos for these honest words!! A motivated developer is often the most important thing.

Even at this early stage the Rails community owes a great deal of thanks for the ongoing efforts on Ferret.

regards
Jan

On 12/14/05, Jan Prill <JanPrill-sTn/vYlS8ieELgA04lAiVw@public.gmane.org> wrote:
> My kudos for these honest words!! A motivated developer is often the most important thing.
>
> Even in this early stage the rails community owes a great deal of compliment to the ongoing efforts on ferret.

Especially the logos. ;-) Thanks.

I'm not sure yet what's best. I haven't built that part of my app yet and am still working through the design. I'm just trying to think through the best approach for now. Do you have pointers to docs that can provide some basic 'rules of thumb' for design - like when to store docs in a database and run a search on the index -v- when to store docs in the index directly?

I used Verity for search on an e-commerce site I helped build a few years ago. We stored the actual docs in a database (product descriptions, actually) but used Verity for searching - it worked fine, but was a pain since updating the product catalog tables and the Verity search index had to be closely coordinated or you'd find search results for products that weren't in the database...

Also, regarding creating an index in memory -v- creating it on disk -- are there significant performance differences (e.g., 20% - 50% faster or more) when using an in-memory index? Has anyone published test results?

Thanks again for your help and your efforts. My needs aren't pressing, I'm just trying to figure out how using Ferret might benefit the app I'm building.

-k

On Dec 13, 2005, at 12:54 PM, Abdur-Rahman Advany wrote:
> Erik Hatcher wrote:
>> It's not quite comparable the difference between a full-text search engine and a web framework.
>>
>> Lucene is optimized heavily - it's code is more C-like than Java-like. Making Lucene more OO or taking advantage of all the fancy Ruby ways of method trickery is likely to slow things down. The entire idea of a full-text search engine is to be fast! (oh, and to be easy on resources as well)
>
> The java version is really heavy a.t.m. (just to mention it ;)), but your quite right, search querie's can't be cached very easily. So writing optimized code is very important.

What do you mean by "heavy"? I guess I'm being a bit defensive about Java Lucene. I'm not understanding your negatives to Java Lucene other than your preference for Ruby. It still remains to be seen how performant and optimized Ferret can be compared to Java Lucene. My hunch is that porting to C will make it slightly faster in spots, but whether it is worth the headaches of maintaining the port is my question.

>> I'm ecstatic that Ferret exists! I do have some reservations on the effort to port it all to C, as I'd really like the effort to aim towards the architecture PyLucene has, where it uses GCJ against Java Lucene, and then wraps it, using SWIG, into a Pythonic API. In order to avoid porting every time Java Lucene changes (which is where the guru creator Doug Cutting spends his effort), it would be a simple recompilation (and perhaps some API glue).
>
> Thats a very good idea, but compiling java sound weird :). David have you considered this? I wonder how will it would integrate..

PyLucene is *fast*. Super fast.

Erik

On Dec 13, 2005, at 1:58 PM, Jan Prill wrote:
>> So those are my reasons with taking the route I am, and since I'm currently doing the work, I get to choose. ;-) If anyone wants to get stuck into porting the PyLucene stuff I'm more than willing to lend and hand. It's definitely worth doing but it's not really my cup of tea.
>
> My kudos for these honest words!! A motivated developer is often the most important thing.
>
> Even in this early stage the rails community owes a great deal of compliment to the ongoing efforts on ferret.

Hear hear! Kudos to Dave for Ferret and I fully encourage him to choose the development path he wants to go on. I hope he succeeds in making a faster Lucene, for sure, regardless of what language he creates it for.

Erik

On 12/14/05, Kevin Bedell <kevin-EZfY3IQN+VlBDgjK7y7TUQ@public.gmane.org> wrote:
> I'm not sure yet what's best. I haven't built that part of my app yet and am still working through the design. I'm just trying to think through the best approach for now. Do you have pointers to docs that can provide some basic 'rules of thumb' for design - like when to store docs in a database and run a search on the index -v- when to store docs in the index directly?

I don't know if you caught the other thread on Ferret, but as we were discussing there, it's usually better to store the documents in the database and use Ferret for finding the relevant documents. In Rails, the way to go is probably to use something like this:

http://ferret.davebalmain.com/trac/wiki/FerretOnRails

The main reason you'd store stuff in the index is to allow result sorting. For example, if you wanted to sort your search results by create_date then you'd need to store create_date in the index. There are a few other times I can think of that you might want to store documents in an index but they don't apply to a Rails app.

> I used Verity for search on an e-commerce site I helped build a few years ago. We stored the actual docs in a database (product descriptions, actually) but used verity for searching - it worked fine, but was a pain since updating the product catalog tables and the verity search index had to be closely coordinated or you'd find search results for products that weren't in the database...

You need to be careful of this with Ferret too. This is the problem the acts_as_ferret ActiveRecord hooks are trying to solve. It still requires a bit of work. I haven't played with Rails for a while now but when I get the chance I'll try and come up with something better.

> Also, regarding creating an index in memory -v- creating it on disk -- are there significant performance differences (eg, 20% - 50% faster or more) when using an in-memory index? Has anyone published test results?
>
> Thanks again for your help and your efforts. My needs aren't pressing, I'm just trying to figure out using ferret might benefit the app I'm building.

This is kind of a catch-22. If you can store your index in memory then it is probably small enough that it won't need to be stored in memory. With the C version I'm working on the difference is only about 20%-30%, so not worth worrying about in my opinion.

HTH,
Dave

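The coordination problem described above (search results pointing at rows that no longer exist) is usually handled by routing every write through one place that updates both the table and the index. Below is a minimal sketch of that idea, assuming a hypothetical `Store` class in which a Hash stands in for the database table and another for the index; it does not use the real acts_as_ferret hooks or the Ferret API.

```ruby
require "set"

# Hypothetical sketch: every save and destroy updates the "table" and the
# "index" together, so the index never lists ids that are missing.
class Store
  def initialize
    @rows = {}                                              # id => text
    @index = Hash.new { |hash, term| hash[term] = Set.new } # term => ids
  end

  # Insert or update a row; on update, stale terms are removed first.
  def save(id, text)
    unindex(id)
    @rows[id] = text
    terms(text).each { |term| @index[term] << id }
  end

  # Remove a row and every index entry that points at it.
  def destroy(id)
    unindex(id)
    @rows.delete(id)
  end

  # Return [id, text] pairs for rows containing the term.
  # @rows.fetch raises KeyError if the index and table ever disagree.
  def search(term)
    ids = @index.fetch(term.downcase, Set.new).to_a.sort
    ids.map { |id| [id, @rows.fetch(id)] }
  end

  private

  def terms(text)
    text.downcase.scan(/\w+/).uniq
  end

  def unindex(id)
    return unless @rows.key?(id)
    terms(@rows[id]).each { |term| @index[term].delete(id) }
  end
end

store = Store.new
store.save(1, "red widget")
store.save(2, "blue widget")
store.destroy(1)
p store.search("widget")
```

In a real application the two writes are not atomic, so a periodic full reindex from the database remains a sensible safety net.
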
> What do you mean by "heavy"? I guess I'm being a bit defensive about Java Lucene. I'm not understanding your negatives to Java Lucene other than your preference for Ruby. It still remains to be seen how performant and optimized Ferret can be compared to Java Lucene. My hunch is that porting to C will make it slightly faster in spots, but whether it is worth the headaches of maintaining the port is my question.

I think I am sounding more negative than I am :) I repeat, I like Lucene for most projects, but for something like a large-scale search engine I think it's maybe better to have a C implementation. On one project we used CLucene or lucene4c (I don't remember which, I was project leader) and it was much faster than using Lucene. I was only mentioning making the C port as it may be faster to implement this.

> PyLucene is *fast*. Super fast.

Erik, you are the expert, I am just trying to learn as I go along... thanks for your feedback :)

Thanks - all this info is right on. Great!

Quoting David Balmain <dbalmain.ml-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>:
> This is kind of a catch-22. If you can store your index in memory then it is probably small enough that it won't need to be stored in memory. With the C version I'm working on the difference is only about 20%-30% so not worth worrying about in my opinion.

My situation is potentially different. The data I am storing is text-based and somewhat time sensitive. That is, the newest data is what most users will be interested in.

However, I need to allow the ability to search for *all results* -- both new data and old. Once the database is large, then this "new data" may be only 1% or less of the overall database. The new data may consist of several thousand documents.

I'm wondering if it might be useful to store *all data* in a disk-based index while *also* storing the newest data in an in-memory index. This would allow me to offer faster results when searching only the new data (which is what most people will likely use) while still allowing people to search the entire dataset if they want to.

Of course, this is only a good idea if it provides a significantly faster response time for searching the in-memory index.

-k

On Dec 13, 2005, at 3:11 PM, Abdur-Rahman Advany wrote:
>> What do you mean by "heavy"? I guess I'm being a bit defensive about Java Lucene. I'm not understanding your negatives to Java Lucene other than your preference for Ruby. It still remains to be seen how performant and optimized Ferret can be compared to Java Lucene. My hunch is that porting to C will make it slightly faster in spots, but whether it is worth the headaches of maintaining the port is my question.
>
> I think I am sounding more negative then I am : ) I repeat I like lucene for most of the project, but for something like a large scale search engine, its maybe a better I think, to have a C implementation. Some project we have used Clucene or lucene4c (I don't remember, I was projectleader) and it was much faster then using lucene. I was only mentioning making the C port as it maybe faster to implement this.

Java Lucene is powering search in some very very heavy duty places, not to mention some top secret ones. For example, Doug is using Nutch (an open source "Google", with Lucene as a core component) to revamp the infrastructure behind The Internet Archive. Yahoo Research Labs and others have funded Doug's Nutch efforts. I just want to be clear about Java Lucene being as "enterprise" savvy as anyone needs.

CLucene was a valiant effort, and supposedly is slightly speedier in some cases, but it is also not up to date with the latest Java Lucene API. lucene4c hasn't gotten off the ground. Java Lucene is the most up to date version available and has many features not found in the ports that haven't kept up. PyLucene just released a version up to date with Java Lucene's Subversion trunk (mostly by just recompiling, though there were some tweaks to the GCJ/SWIG pieces apparently as well).

All the ports, Ferret included, will always be playing catch-up with Java Lucene. If the maintainers of the ports take a break, they will be behind. I don't want to discourage folks from porting Lucene at all. But I'm guardedly optimistic about a port being as good as Java Lucene. It truly is one of the few gems in the Java open source world with very little quality competition.

Erik

On 12/14/05, Kevin Bedell <kevin-EZfY3IQN+VlBDgjK7y7TUQ@public.gmane.org> wrote:
> My situation is potentially different. The data I am storing is text-based and somewhat time sensitive. That is, the newest data is what most users will be interested in.
>
> However, I need to allow the ability to search for *all results* -- both new data and old. Once the database is large, then this "new data" may be only 1% or less of the overall database. The new data may consist of several thousand documents.

So if I do the math, you're expecting to have several hundred thousand documents? Ok, you've got my attention now.

> I'm wondering if it might be useful to store *all data* in a disk-based index while *also* storing the newest data in an in-memory index. This would allow me to offer faster results when searching only the new data (which is what most people will likely use) while still allowing people to search the entire dataset if they want to.

In-memory or not, it will certainly be faster to search a smaller document set, so splitting the index in two might not be a bad idea. Perhaps you could have a daily process which reindexes the recent document set.

> Of course, this is only a good idea if it provides a significantly faster response time for searching the in-memory index.

The in-memory part won't make the big difference. Having a smaller index might. I'd recommend doing the simplest thing possible and refactoring if necessary. It shouldn't be hard to add a second in-memory index later. Up to you though.

Dave

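The split discussed above (one small index for recent documents, one full index, and a periodic rebuild of the small one) could look roughly like this. It is a sketch under assumptions: `SplitIndex`, its method names, and the `scope:` option are invented for illustration, both indexes are plain in-memory Hashes, and nothing here reflects how Ferret stores or merges indexes.

```ruby
# Hypothetical sketch: two term => ids maps, a small one holding only
# recent documents and a full one holding everything.
class SplitIndex
  def initialize
    @recent = Hash.new { |hash, term| hash[term] = [] }
    @full   = Hash.new { |hash, term| hash[term] = [] }
  end

  # Every document goes into the full index; recent ones go into both.
  def add(id, text, recent: false)
    terms(text).each do |term|
      @full[term] << id
      @recent[term] << id if recent
    end
  end

  # Rebuild the small index from scratch, e.g. from a daily job that
  # passes in the current set of recent documents as { id => text }.
  def rebuild_recent(docs)
    @recent.clear
    docs.each do |id, text|
      terms(text).each { |term| @recent[term] << id }
    end
  end

  # Search the small index by default; pass scope: :all for everything.
  def search(term, scope: :recent)
    index = scope == :all ? @full : @recent
    index.fetch(term.downcase, []).dup
  end

  private

  def terms(text)
    text.downcase.scan(/\w+/).uniq
  end
end

index = SplitIndex.new
index.add(1, "old news")
index.add(2, "fresh news", recent: true)
p index.search("news")
p index.search("news", scope: :all)
```

Searching the recent index touches far fewer postings, which is where the speedup would come from, whether or not that index happens to live in memory.
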
What you could consider is using something like cacheAR for the latest queries or for popular queries.

On 12/14/05, Abdur-Rahman Advany <rails-U5wbzIpkoVrQT0dZR+AlfA@public.gmane.org> wrote:
> What you could considere is using something like cacheAR for the latest queries or for popular queries..

I'm not really sure but I think you'd probably just use cacheAR to cache the popular documents. I don't know if I mentioned already but I haven't had enough time to work with much of the Rails stuff yet. Soon. :-)