thr3ads.net - Rails - ferret vs. mysql fulltext [Dec 2005]

Onur Turgay

2005-Dec-13 11:15 UTC

ferret vs. mysql fulltext

Hi,
With the current state of Ferret, can anyone compare the speed of MySQL fulltext
search vs. Ferret index search? And do I have to query the db after getting
results from Ferret?
Thanks in advance


_______________________________________________
Rails mailing list
Rails-1W37MKcQCpIf0INCOvqR/iCwEArCW2h5@public.gmane.org
http://lists.rubyonrails.org/mailman/listinfo/rails

David Balmain

2005-Dec-13 11:24 UTC

Re: ferret vs. mysql fulltext

Hi Onur,

I can't offer any input on speed comparisons between Ferret and MySQL
fulltext search. I will say this though. If the results that MySQL
fulltext search returns are good enough then use it. But if you care
about the relevancy of your results and you want to be able to run
advanced queries like boolean queries or phrase queries, you'll want
to go with Ferret, and it should be fast enough.
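
To make "boolean queries" and "phrase queries" concrete, here is a toy sketch in
plain Ruby. It does not use Ferret's API: the class ToyIndex and its methods
all_of, any_of and phrase are made up for this illustration, and a real engine
adds relevancy scoring and a query parser on top of this kind of structure.

```ruby
# Toy positional inverted index. NOT Ferret's API, only an illustration.
class ToyIndex
  def initialize
    # term => { doc_id => [positions of the term in that document] }
    @postings = Hash.new { |h, term| h[term] = Hash.new { |d, id| d[id] = [] } }
  end

  def add(doc_id, text)
    tokenize(text).each_with_index do |term, pos|
      @postings[term][doc_id] << pos
    end
  end

  # Boolean AND: documents that contain every term.
  def all_of(*terms)
    (terms.map { |t| docs_for(t) }.reduce { |a, b| a & b } || []).sort
  end

  # Boolean OR: documents that contain at least one term.
  def any_of(*terms)
    (terms.map { |t| docs_for(t) }.reduce { |a, b| a | b } || []).sort
  end

  # Phrase query: the terms must appear next to each other, in order.
  def phrase(text)
    terms = tokenize(text)
    all_of(*terms).select do |id|
      @postings[terms.first][id].any? do |start|
        terms.each_with_index.all? { |t, i| @postings[t][id].include?(start + i) }
      end
    end
  end

  private

  def tokenize(text)
    text.downcase.scan(/[a-z0-9]+/)
  end

  def docs_for(term)
    key = term.downcase
    @postings.key?(key) ? @postings[key].keys : []
  end
end

idx = ToyIndex.new
idx.add(1, "The quick brown fox")
idx.add(2, "Brown bears are quick")
idx.add(3, "A slow green turtle")

p idx.all_of("quick", "brown")  # documents containing both words
p idx.phrase("quick brown")     # documents containing the exact phrase
```

Documents 1 and 2 both contain "quick" and "brown", but only document 1 has
them adjacent and in that order, which is the distinction a phrase query makes.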

As for having to query the database, that will depend how you want to
use Ferret. You can store the data in the Ferret index if you like, in
which case you won't have to query the database. I think it's better
just to keep the data in one spot though.

HTH,
Dave


Abdur-Rahman Advany

2005-Dec-13 11:28 UTC

Re: ferret vs. mysql fulltext

Hi,

Ferret is not only faster as the data gets larger (I have benchmarked this
a few times), it's also more accurate because of its query analyser
(you can use Google-like search queries). There are two options: you can
store everything in Ferret (and not need a database anymore), or store
only the index (the fields you need to index) and retrieve the other
values from MySQL.

At the moment I am trying to write a better plugin for Ferret so you
can specify what needs to be indexed, use find (instead of a special
method) with additional options, and automatically query the database
for additional fields.
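
A minimal sketch of the second option (index only some fields, fetch the rest
from the database), in plain Ruby. Everything here is a stand-in: the articles
hash plays the role of a MySQL table, build_index and search are hypothetical
helpers, and none of it is the Ferret or ActiveRecord API.

```ruby
# Build term => [row ids] from the chosen fields only.
# Nothing but the row id is stored next to each term.
def build_index(rows, fields)
  index = Hash.new { |h, term| h[term] = [] }
  rows.each do |id, row|
    terms = fields.flat_map { |f| row[f].downcase.scan(/[a-z0-9]+/) }.uniq
    terms.each { |term| index[term] << id }
  end
  index
end

# Step 1: ask the index for matching ids.
# Step 2: load the full rows, including fields that were never indexed.
def search(index, rows, term)
  ids = index.fetch(term.downcase, [])
  ids.map { |id| rows.fetch(id).merge(id: id) }
end

# Stand-in for a database table: primary key => full row.
articles = {
  1 => { title: "Ferret on Rails", body: "Indexing models with Ferret", author: "dave" },
  2 => { title: "MySQL fulltext",  body: "Using MATCH AGAINST in MySQL", author: "onur" },
  3 => { title: "Lucene in Java",  body: "Where Ferret was ported from", author: "erik" }
}

index = build_index(articles, [:title, :body])
search(index, articles, "ferret").each do |row|
  puts "#{row[:id]}: #{row[:title]} (by #{row[:author]})"
end
```

The author field is never indexed, yet it is available in the results because
the rows come from the table, not from the index.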


David Balmain

2005-Dec-13 11:37 UTC

Re: ferret vs. mysql fulltext

On 12/13/05, Abdur-Rahman Advany
<rails-U5wbzIpkoVrQT0dZR+AlfA@public.gmane.org> wrote:
> Hi,
>
> Ferret is not only faster (as I have benchmarked a few times) as data
> gets larger but it's also more accurate because of its query analyser
> (you can use Google-like search queries).
This is great to know. I'm surprised. Ferret is going to be much, much
faster soon. I'm rewriting it all in C.
>
> At this moment I am trying to write a better plugin for Ferret so you
> can specify what needs to be indexed, use find (instead of a special
> method) with additional options, and automatically query the database
> for additional fields.
Please keep us updated as to how this is going. I'd like to add more
stuff like this to the Ferret Wiki. You might like to look at this
page if you haven't already:

    http://ferret.davebalmain.com/trac/wiki/FerretOnRails

Far from a perfect solution so please feel free to add to it. :-)

Cheers,
Dave

Onur Turgay

2005-Dec-13 11:49 UTC

Re: ferret vs. mysql fulltext

thanks all for the great work.


Abdur-Rahman Advany

2005-Dec-13 12:19 UTC

Re: ferret vs. mysql fulltext

Hi David,

I think you should be careful about using Ferret as a replacement for a
database until it's really mature. (Indexes can be recreated at any time
from the original data.) MySQL has proven itself as a mature database
solution and has many tools for maintenance and management. Ferret, in my
opinion, can't replace that (I don't even think Lucene can). It lacks
certain management tools that a database needs. On the other hand,
current databases lack advanced query parsers (and that's good, because
they would only make the database more complex). I know of Lucene being
linked to existing databases with very good results; this should be
possible with Ferret too, shouldn't it?


Onur Turgay

2005-Dec-13 12:22 UTC

Re: ferret vs. mysql fulltext

I think storing data only in ferret is a bad idea as tables have relations
with other tables etc.


David Balmain

2005-Dec-13 13:04 UTC

Re: ferret vs. mysql fulltext

Agreed. I meant it's probably not worth storing the data in Ferret.
Just use it for the indexing and keep your data in the database.

((On a side note, it is possible for some applications to do away with
the database and use Ferret as the only data store. I think that's how
Erik Hatcher's blog software Blogscene works.))


David Balmain

2005-Dec-13 14:15 UTC

Re: ferret vs. mysql fulltext

On 12/13/05, Abdur-Rahman Advany
<rails-U5wbzIpkoVrQT0dZR+AlfA@public.gmane.org> wrote:
> Hi David,
>
> I think you should be careful about using Ferret as a replacement for a
> database until it's really mature. (Indexes can be recreated at any time
> from the original data.) MySQL has proven itself as a mature database
> solution and has many tools for maintenance and management. Ferret, in my
> opinion, can't replace that (I don't even think Lucene can). It lacks
> certain management tools that a database needs. On the other hand,
> current databases lack advanced query parsers (and that's good, because
> they would only make the database more complex). I know of Lucene being
> linked to existing databases with very good results; this should be
> possible with Ferret too, shouldn't it?
Sure. I wouldn't replace a database with Ferret in most instances, and
probably not in a Rails app, since Rails makes it so easy to use a
database. I was just trying to say it was possible to use Ferret or
Lucene as a data store. :-)

Abdur-Rahman Advany

2005-Dec-13 14:30 UTC

Re: ferret vs. mysql fulltext

David,

Are you trying to make a Lucene-compatible project, or a similar
project? Because I think with the possibilities of Ruby, in time it
would be possible to go beyond what is possible in Java.
Really great project. I hope to be able to contribute; my C skills are a
little old (10 years or so), but maybe I can help you out on the Ruby end
with improvements...


David Balmain

2005-Dec-13 15:15 UTC

Re: ferret vs. mysql fulltext

On 12/13/05, Abdur-Rahman Advany
<rails-U5wbzIpkoVrQT0dZR+AlfA@public.gmane.org> wrote:
> David,
>
> Are you trying to make a Lucene-compatible project, or a similar
> project? Because I think with the possibilities of Ruby, in time it
> would be possible to go beyond what is possible in Java.
Very good question. At the moment I'm trying to stay compatible. But
if I get enough contributors I'll consider forking off. Lucene is
quite a large project with a lot of contributors, so it might be hard
to push ahead of them.
> Really great project. I hope to be able to contribute; my C skills are a
> little old (10 years or so), but maybe I can help you out on the Ruby end
> with improvements...
Any help is appreciated. Just recommending Ferret is going to help the
project in the long run, so I thank you for that. Contributing to
the wiki is also very important.

Thanks,
Dave

Erik Hatcher

2005-Dec-13 15:31 UTC

Re: ferret vs. mysql fulltext

On Dec 13, 2005, at 8:04 AM, David Balmain wrote:
> ((On a side note, it is possible for some applications to do away with
> the database and use Ferret as the only data store. I think that's how
> Erik Hatcher's blog software Blogscene works.))
If only I had that e-mail-to-blog gateway, I'd be blogging all the
time!

Yes, http://www.blogscene.org/erik is powered entirely by a Lucene  
index, a servlet, and some Velocity templates.  The original blog  
entries reside in blosxom-style text files, but at runtime only  
Lucene is used.

It really depends on the scenario, but in general I don't recommend
using Lucene (or Ferret) as the definitive data source.  The primary
reason is that an index is optimized for how it is going to be
searched, and you may later want to change how text is tokenized and
thus what terms are indexed.  Having the original data around to be
able to re-index with different settings is a good thing.  It's also
possible to store the original data in Lucene and pull it out for
reindexing purposes - but that is trickier.
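
The re-indexing point can be sketched in a few lines of plain Ruby. This is
only an illustration under simple assumptions: index_with and the two
tokenizer lambdas are hypothetical names, and real analyzers in Lucene or
Ferret do considerably more than this.

```ruby
# Build term => [doc ids] using whatever tokenizer is passed in.
def index_with(docs, tokenizer)
  index = Hash.new { |h, term| h[term] = [] }
  docs.each do |id, text|
    tokenizer.call(text).uniq.each { |term| index[term] << id }
  end
  index
end

# First attempt: split on whitespace only, keeping case and punctuation.
whitespace = ->(text) { text.split }

# Later decision: lowercase, and split on anything that is not a letter or digit.
normalized = ->(text) { text.downcase.scan(/[a-z0-9]+/) }

# The original data, kept outside the index (for example in a database).
docs = {
  1 => "Re-index with Ferret.",
  2 => "ferret and MySQL, side by side"
}

v1 = index_with(docs, whitespace)
v2 = index_with(docs, normalized) # only possible because docs is still around

p v1.fetch("ferret", []) # matches document 2 only
p v2.fetch("ferret", []) # matches both documents
```

The first index holds the terms "Ferret." and "ferret" as different entries.
Changing that after the fact means rebuilding from the source text, which is
easy when the source is still available and awkward when only the index is.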

	Erik

Erik Hatcher

2005-Dec-13 15:41 UTC

Re: ferret vs. mysql fulltext

On Dec 13, 2005, at 9:30 AM, Abdur-Rahman Advany wrote:
> Are you trying to make a Lucene-compatible project, or a similar
> project? Because I think with the possibilities of Ruby, in time it
> would be possible to go beyond what is possible in Java.
Could you elaborate in what ways you feel Ferret could go beyond what
is possible with Java Lucene?  How does Java hold Lucene back?

Genuinely curious,
	Erik

Abdur-Rahman Advany

2005-Dec-13 16:28 UTC

Re: ferret vs. mysql fulltext

Erik,

I am sorry, I am just excited about Ruby in general. But with a
language like Ruby and a project like Lucene, it's my personal opinion
that LOC makes a difference. Things like mixins and the way you
program in Ruby make things just a bit easier. It took me 4 or 5 days to
understand and work with Lucene (great book, b.t.w.), and it only took me
10 days to learn most of edge Rails and many other plugins by reading
code (yes, not docs, code, LOL)...

Lucene is a great product, and will continue on Java (you can't kill
Java; it's really usable for many things). But Ruby just makes it easy to
program, and with the integration with C, well, things are optimized. I
have only been using Ruby for 20 days or so, but it amazes me how much a
language can make a difference...

So I have to revise my statement a bit, but I think, in time, melting
Ferret and ActiveRecord together could make it a better product than
Lucene : ) But that's future talk...

Well, I am amazed to see you here : ) What is your opinion?

Abdur-Rahman


Erik Hatcher

2005-Dec-13 17:27 UTC

Re: ferret vs. mysql fulltext

On Dec 13, 2005, at 11:28 AM, Abdur-Rahman Advany wrote:
> I am sorry, I am just excited about Ruby in general. But with a
> language like Ruby and a project like Lucene, it's my personal
> opinion that LOC makes a difference. Things like mixins and the way
> you program in Ruby make things just a bit easier. It took me
> 4 or 5 days to understand and work with Lucene (great book, b.t.w.), and
> it only took me 10 days to learn most of edge Rails and many
> other plugins by reading code (yes, not docs, code, LOL)...
A full-text search engine and a web framework are not quite comparable.

Lucene is optimized heavily - its code is more C-like than Java-like.
Making Lucene more OO or taking advantage of all the fancy
Ruby ways of method trickery is likely to slow things down.  The
entire idea of a full-text search engine is to be fast!  (oh, and to
be easy on resources as well)
> Lucene is a great product, and will continue on Java (you can't
> kill Java; it's really usable for many things). But Ruby just makes
> it easy to program, and with the integration with C, well, things
> are optimized. I have only been using Ruby for 20 days or so, but it
> amazes me how much a language can make a difference...
The folks that would be coding under the covers of Ferret or Lucene
are a highly specialized group of folks.  Likewise with the core code
of Rails.  Most users don't need to see what is underneath - it just
works.

Indeed the language makes a difference, but also the goal of the  
effort.  A full-text search engine has some very specialized needs  
and even the most basic data structures in high level languages like  
Hash and Array are only used if they are fast enough, otherwise  
alternatives are created.  This is definitely the case with Lucene.
> So I have to revise my statement a bit, but I think, in time,  
> melting Ferret and ActiveRecord together could make it a better  
> product then lucene : ) But that future talk...
Well, in all fairness to Lucene, it is orthogonal to the database  
concern entirely.  Of course Ferret + ActiveRecord > just Lucene, but  
to make the comparison more fair, how about Lucene + Hibernate?   
There are hooks for Hibernate to index with Lucene, even using Java  
annotations to mark the fields to be indexed, and how they are to be  
indexed.  I see ActiveRecord + Ferret to be a great path to go, and  
the acts_as_ferret initial implementation is on the right track.  I  
hope to delve into this area more myself in the future (though my  
work does not currently involve relational databases, but will soon).
> Well, I am amazed to see you here : ) what is your opinion?
I've been a Ruby fan for ages, ever since catching a Dave Thomas
presentation in '02.  I've dreamed of RubyLucene for years, creating
the rubylucene (formerly rucene) project at RubyForge once upon a
time but not doing much with it beyond some low-level I/O proof of
concept tests.

I'm ecstatic that Ferret exists!  I do have some reservations on the
effort to port it all to C, as I'd really like the effort to aim
towards the architecture PyLucene has, where it uses GCJ against Java
Lucene, and then wraps it, using SWIG, into a Pythonic API.  In order
to avoid porting every time Java Lucene changes (which is where the
guru creator Doug Cutting spends his effort), it would be a simple
recompilation (and perhaps some API glue).

	Erik

Kevin Bedell

2005-Dec-13 17:38 UTC

Ferret on rails question

I just got done reviewing some of the info in the ferret wiki. It looks like
some great work - thanks!

I'm building an app that is going to have some search capability, and I was
planning on using MySQL with fulltext searches, but looking at Ferret has got
me wondering if there might not be a better way.

Specifically, I was wondering about the idea of using an in-memory index for
increasing the speed of searches.

The data I'm storing will be most utilized when it is relatively new. After
it's a few days old, people won't need it as much. So putting all this data in
the same database may not make sense (if it's relatively easy to split it into
'fresh' and 'stale' databases).

Would it make sense to consider using an in-memory cache of documents for the
newest data while having a disk-based index for when people want to search for
older documents? Or would the performance gains not be worth the effort?
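
The fresh/stale split described above could be sketched like this in plain
Ruby. It is a design illustration only and says nothing about whether the
performance gain is worth it: TieredSearch is a hypothetical class, the fresh
tier is an ordinary Hash, and the archive lambda merely stands in for a lookup
against a disk-based index.

```ruby
# Two-tier lookup: search the small in-memory index of fresh documents,
# and consult the (slower) archive only when the caller asks for it.
class TieredSearch
  def initialize(fresh:, archive:)
    @fresh = fresh     # term => [doc ids], kept in memory
    @archive = archive # callable standing in for a disk-based index lookup
  end

  def search(term, include_old: false)
    hits = @fresh.fetch(term, [])
    hits += @archive.call(term) if include_old
    hits.uniq
  end
end

fresh = { "ferret" => [101, 102] }
archive = lambda do |term|
  { "ferret" => [7, 101], "mysql" => [3] }.fetch(term, [])
end

searcher = TieredSearch.new(fresh: fresh, archive: archive)
p searcher.search("ferret")                    # fresh documents only
p searcher.search("ferret", include_old: true) # fresh plus archive, de-duplicated
```

The default path never touches the archive, which is the property that would
matter if most searches are for recent documents.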

-kevin

Kevin Bedell

2005-Dec-13 17:53 UTC

Ferret on rails question pt 2

I just wanted to add that I think the ideal solution would be for me to be able
to define a single index that did both -- that is, that would cache documents
in memory while keeping the full index on disk.

It would be great as well if I could specify how I wanted the cache to work --
say, by giving it a regular expression or some query to tell it what should be
cached in memory. Maybe I could also specify a limit on the total memory it
should use for cache.

I might, for example, want to have it cache documents based on a certain user or
customer id rather than cache them by date. Maybe whenever a new user logs in I
modify the cache settings to include their documents in the cache -- and
whenever someone logs out I flush theirs.

The value of this is that it hides the complexity from developers/users and
makes it easy to use.
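
One way to picture the requirements above is a read-through cache with a rule
for what gets cached. The sketch below is plain Ruby and entirely
hypothetical: CachedStore is not part of Ferret, the backend Hash stands in
for the on-disk index, and the limit is a count of entries, a simplification
of the memory limit mentioned above.

```ruby
# Read-through cache in front of a full document store.
# Only documents accepted by cache_if are kept in memory, at most
# max_entries of them; the oldest cached entry is evicted first.
class CachedStore
  attr_reader :backend_reads

  def initialize(backend, max_entries:, cache_if:)
    @backend = backend   # Hash standing in for the on-disk index
    @max_entries = max_entries
    @cache_if = cache_if # callable deciding what is worth caching
    @cache = {}          # Ruby hashes keep insertion order
    @backend_reads = 0
  end

  def fetch(id)
    return @cache[id] if @cache.key?(id)

    @backend_reads += 1
    doc = @backend[id]
    remember(id, doc) if doc && @cache_if.call(doc)
    doc
  end

  # Drop cached documents matching the block, e.g. when a user logs out.
  def flush
    @cache.delete_if { |_id, doc| yield(doc) }
  end

  private

  def remember(id, doc)
    return if @max_entries < 1

    @cache.shift while @cache.size >= @max_entries
    @cache[id] = doc
  end
end

docs = {
  1 => { user_id: 42, text: "invoice for December" },
  2 => { user_id: 42, text: "shipping notice" },
  3 => { user_id: 7,  text: "welcome letter" }
}

# Cache only the documents of the user who is currently logged in.
store = CachedStore.new(docs, max_entries: 100,
                              cache_if: ->(doc) { doc[:user_id] == 42 })
```

Callers only ever use fetch, so the caching rule can change (by date, by user,
by customer id) without touching the code that reads documents. Calling
store.flush { |doc| doc[:user_id] == 42 } would cover the logout case.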

Sorry for the 'stream of consciousness' design reqs -- I'm just dumping the
idea now since I was thinking about it...

Abdur-Rahman Advany

2005-Dec-13 17:54 UTC

Re: ferret vs. mysql fulltext

Erik Hatcher wrote:
> It's not quite comparable the difference between a full-text search
> engine and a web framework.
>
> Lucene is optimized heavily - its code is more C-like than Java-like.
> Making Lucene more OO or taking advantage of all the fancy
> Ruby ways of method trickery is likely to slow things down.  The
> entire idea of a full-text search engine is to be fast!  (oh, and to
> be easy on resources as well)
The Java version is really heavy at the moment (just to mention it ;)), but
you're quite right, search queries can't be cached very easily. So
writing optimized code is very important.
> Well, in all fairness to Lucene, it is orthogonal to the database  
> concern entirely.  Of course Ferret + ActiveRecord > just Lucene, but  
> to make the comparison more fair, how about Lucene + Hibernate?   
> There are hooks for Hibernate to index with Lucene, even using Java  
> annotations to mark the fields to be indexed, and how they are to be  
> indexed.  I see ActiveRecord + Ferret to be a great path to go, and  
> the acts_as_ferret initial implementation is on the right track.  I  
> hope to delve into this area more myself in the future (though my  
> work does not currently involve relational databases, but will soon).
I am busy at the moment creating a plugin for Rails; it'll be easy to
use to extend ActiveRecord. I am trying to combine the database and Ferret
with a new method that builds upon find (search): it queries Ferret if a
query is present and then fetches the rows using find.
> I've been a Ruby fan for ages, ever since catching a Dave Thomas
> presentation in '02.  I've dreamed of RubyLucene for years, creating
> the rubylucene (formerly rucene) project at RubyForge once upon a
> time but not doing much with it beyond some low-level I/O proof of
> concept tests.
>
> I'm ecstatic that Ferret exists!   I do have some reservations on the
> effort to port it all to C, as I'd really like the effort to aim
> towards the architecture PyLucene has, where it uses GCJ against Java  
> Lucene, and then wraps it, using SWIG, into a Pythonic API.  In order  
> to avoid porting every time Java Lucene changes (which is where the  
> guru creator Doug Cutting spends his effort), it would be a simple  
> recompilation (and perhaps some API glue).
That's a very good idea, but compiling Java sounds weird :). David, have
you considered this? I wonder how well it would integrate.
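
The "search the index, then fetch the rows with find" idea described in this message can be sketched roughly as follows. This is only an illustration in plain Ruby: TinyIndex, ArticleTable and the search method are hypothetical stand-ins, not the Ferret or ActiveRecord API, and a real plugin would ask Ferret for the matching ids instead.

```ruby
# Hypothetical stand-in for a full-text index: term => ids of matching rows.
# A real application would ask Ferret for the ids instead.
class TinyIndex
  def initialize
    @postings = Hash.new { |hash, term| hash[term] = [] }
  end

  def add(id, text)
    text.downcase.scan(/\w+/).uniq.each { |term| @postings[term] << id }
  end

  # Returns the ids of rows that contain every term in the query.
  def search_ids(query)
    terms = query.downcase.scan(/\w+/)
    return [] if terms.empty?
    terms.map { |term| @postings.fetch(term, []) }.reduce(:&)
  end
end

# Hypothetical stand-in for a database table keyed by primary key.
class ArticleTable
  def initialize
    @rows = {}
  end

  def insert(id, attributes)
    @rows[id] = attributes.merge(id: id)
  end

  # Returns all rows, or only the rows for the given ids (missing ids are skipped).
  def find_all(ids = nil)
    return @rows.values if ids.nil?
    ids.map { |id| @rows[id] }.compact
  end
end

# Consult the index only when a query is present, then fetch the full rows.
def search(table, index, query = nil)
  return table.find_all if query.nil? || query.strip.empty?
  table.find_all(index.search_ids(query))
end

table = ArticleTable.new
index = TinyIndex.new
{ 1 => "Ferret on Rails", 2 => "MySQL fulltext search", 3 => "Ferret search speed" }.each do |id, title|
  table.insert(id, title: title)
  index.add(id, title)
end

puts search(table, index, "ferret search").map { |row| row[:title] }.inspect
```

The index only has to return ids, so the documents themselves stay in one place (the database), which matches the advice given earlier in the thread.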

Jeremy Kemper

2005-Dec-13 17:57 UTC

Re: ferret vs. mysql fulltext


On Dec 13, 2005, at 6:15 AM, David Balmain wrote:
> On 12/13/05, Abdur-Rahman Advany <rails-U5wbzIpkoVrQT0dZR+AlfA@public.gmane.org> wrote:
>> I think you should be careful replacing 'ferret' as a database till
>> it's really mature. (Indexes can be recreated anytime with the original
>> data). MySQL has proven itself as a mature database solution and has
>> many tools for maintaining and managing. Ferret in my opinion can't
>> replace that (I don't even think Lucene can). It lacks certain
>> management tools that are needed for a database, however current
>> databases lack advanced query parsers (and that's good because it only
>> makes the database more complex). I know about linking Lucene to existing
>> databases with very good results, this should be possible with
>> Ferret or not?
>
> Sure. I wouldn't replace a database with Ferret in most instances and
> probably not in a Rails app since Rails makes it so easy to use a
> database. I was just trying to say it was possible to use Ferret or
> Lucene as a data store. :-)
I treat the data I store in the Ferret index as a denormalized table
tuned for the queries it answers.

jeremy

David Balmain

2005-Dec-13 18:35 UTC

Re: ferret vs. mysql fulltext

On 12/14/05, Abdur-Rahman Advany <rails-U5wbzIpkoVrQT0dZR+AlfA@public.gmane.org> wrote:
> Erik Hatcher wrote:
>
> > I'm ecstatic that Ferret exists!   I do have some reservations on the
> > effort to port it all to C, as I'd really like the effort to aim
> > towards the architecture PyLucene has, where it uses GCJ against Java
> > Lucene, and then wraps it, using SWIG, into a Pythonic API.  In order
> > to avoid porting every time Java Lucene changes (which is where the
> > guru creator Doug Cutting spends his effort), it would be a simple
> > recompilation (and perhaps some API glue).
>
> That's a very good idea, but compiling Java sounds weird :). David, have
> you considered this? I wonder how well it would integrate.
Yes, Erik and I have discussed it already. It might be a better way to
do it but I can't find the motivation. It's a lot more interesting and
motivating for me trying to create something that runs faster than
Lucene. Besides being slightly faster, C is also lighter on resources
and makes for a much smaller download. I was and still am interested
in desktop search so these are all important to me. Speaking of Doug
Cutting, he has some words to say on this too:

http://nutch.sourceforge.net/blog/2005/02/open-source-desktop-search.html

So those are my reasons for taking the route I am, and since I'm
currently doing the work, I get to choose. ;-)  If anyone wants to get
stuck into porting the PyLucene stuff I'm more than willing to lend
a hand. It's definitely worth doing but it's not really my cup of
tea.



David Balmain

2005-Dec-13 18:52 UTC

Re: Ferret on rails question pt 2

Hi Kevin,

I can't quite tell from your description. Do you actually want to
store and retrieve the documents from a Ferret index? Or do you just
want to run the search on the index and then retrieve the results from
the database? Also, how large a document set are you expecting? If you
still have to retrieve the documents from the database I think Ferret
should be fine as is without the caching. If you are running into
performance problems after it's implemented I could certainly help you
set up some caching.

Cheers,
Dave


Abdur-Rahman Advany

2005-Dec-13 18:58 UTC

Re: ferret vs. mysql fulltext

>Yes, Erik and I have discussed it already. It might be a better way to
>do it but I can't find the motivation. It's a lot more interesting and
>motivating for me trying to create something that runs faster than
>Lucene. Besides being slightly faster, C is also lighter on resources
>and makes for a much smaller download. I was and still am interested
>in desktop search so these are all important to me. Speaking of Doug
>Cutting, he has some words to say on this too:
>
>http://nutch.sourceforge.net/blog/2005/02/open-source-desktop-search.html
>
>So those are my reasons for taking the route I am, and since I'm
>currently doing the work, I get to choose. ;-)  If anyone wants to get
>stuck into porting the PyLucene stuff I'm more than willing to lend
>a hand. It's definitely worth doing but it's not really my cup of
>tea.

Haha : ) Well, you're doing a great job, I'll continue to use Ferret! I
don't have a client request at the moment for taking on such a project.
Maybe after a couple of months...

Jan Prill

2005-Dec-13 18:58 UTC

Re: ferret vs. mysql fulltext

>Yes, Erik and I have discussed it already. It might be a better way to
>do it but I can't find the motivation. It's a lot more interesting and
>motivating for me trying to create something that runs faster than
>Lucene. Besides being slightly faster, C is also lighter on resources
>and makes for a much smaller download. I was and still am interested
>in desktop search so these are all important to me. Speaking of Doug
>Cutting, he has some words to say on this too:
>
>http://nutch.sourceforge.net/blog/2005/02/open-source-desktop-search.html
>
>So those are my reasons for taking the route I am, and since I'm
>currently doing the work, I get to choose. ;-)  If anyone wants to get
>stuck into porting the PyLucene stuff I'm more than willing to lend
>a hand. It's definitely worth doing but it's not really my cup of
>tea.

My kudos for these honest words!! A motivated developer is often the
most important thing.

Even in this early stage the Rails community owes a great deal of
compliments to the ongoing efforts on Ferret.

regards
Jan

David Balmain

2005-Dec-13 19:01 UTC

Re: ferret vs. mysql fulltext

On 12/14/05, Jan Prill <JanPrill-sTn/vYlS8ieELgA04lAiVw@public.gmane.org> wrote:
> My kudos for these honest words!! A motivated developer is often the
> most important thing.
>
> Even in this early stage the rails community owes a great deal of
> compliment to the ongoing efforts on ferret.
Especially the logos. ;-)
Thanks.

Kevin Bedell

2005-Dec-13 19:12 UTC

Re: Ferret on rails question pt 2

I'm not sure yet what's best. I haven't built that part of my app yet and am
still working through the design. I'm just trying to think through the best
approach for now. Do you have pointers to docs that can provide some basic
'rules of thumb' for design - like when to store docs in a database and run a
search on the index -v- when to store docs in the index directly?

I used Verity for search on an e-commerce site I helped build a few years ago.
We stored the actual docs in a database (product descriptions, actually) but
used Verity for searching - it worked fine, but was a pain since updating the
product catalog tables and the Verity search index had to be closely
coordinated or you'd find search results for products that weren't in the
database...

Also, regarding creating an index in memory -v- creating it on disk -- are there
significant performance differences (eg, 20% - 50% faster or more) when using an
in-memory index? Has anyone published test results?

Thanks again for your help and your efforts. My needs aren't pressing, I'm just
trying to figure out how using Ferret might benefit the app I'm building.

-k



Erik Hatcher

2005-Dec-13 19:42 UTC

Re: ferret vs. mysql fulltext

On Dec 13, 2005, at 12:54 PM, Abdur-Rahman Advany wrote:
> Erik Hatcher wrote:
>
>> It's not quite comparable the difference between a full-text
>> search engine and a web framework.
>>
>> Lucene is optimized heavily - its code is more C-like than Java-like.
>> Making Lucene more OO or taking advantage of all the fancy
>> Ruby ways of method trickery is likely to slow things down.  The
>> entire idea of a full-text search engine is to be fast!  (oh, and
>> to be easy on resources as well)
>
> The Java version is really heavy at the moment (just to mention it ;)),
> but you're quite right, search queries can't be cached very easily.
> So writing optimized code is very important.
What do you mean by "heavy"?   I guess I'm being a bit defensive
about Java Lucene.  I'm not understanding your negatives to Java
Lucene other than your preference for Ruby.  It still remains to be
seen how performant and optimized Ferret can be compared to Java
Lucene.  My hunch is that porting to C will make it slightly faster
in spots, but whether it is worth the headaches of maintaining the
port is my question.
>> I've been a Ruby fan for ages, ever since catching a Dave Thomas
>> presentation in '02.  I've dreamed of RubyLucene for years,
>> creating the rubylucene (formerly rucene) project at RubyForge
>> once upon a time but not doing much with it beyond some low-level
>> I/O proof of concept tests.
>>
>> I'm ecstatic that Ferret exists!   I do have some reservations on
>> the effort to port it all to C, as I'd really like the effort to
>> aim towards the architecture PyLucene has, where it uses GCJ
>> against Java Lucene, and then wraps it, using SWIG, into a
>> Pythonic API.  In order to avoid porting every time Java Lucene
>> changes (which is where the guru creator Doug Cutting spends his
>> effort), it would be a simple recompilation (and perhaps some API
>> glue).
>
> That's a very good idea, but compiling Java sounds weird :). David,
> have you considered this? I wonder how well it would integrate.
PyLucene is *fast*.  Super fast.

	Erik

Erik Hatcher

2005-Dec-13 19:46 UTC

Re: ferret vs. mysql fulltext

On Dec 13, 2005, at 1:58 PM, Jan Prill wrote:
>> So those are my reasons for taking the route I am, and since I'm
>> currently doing the work, I get to choose. ;-)  If anyone wants to get
>> stuck into porting the PyLucene stuff I'm more than willing to lend
>> a hand. It's definitely worth doing but it's not really my cup of
>> tea.
>>
>>
> My kudos for these honest words!! A motivated developer is often  
> the most important thing.
>
> Even in this early stage the rails community owes a great deal of  
> compliment to the ongoing efforts on ferret.
Hear hear!   Kudos to Dave for Ferret and I fully encourage him to  
choose the development path he wants to go on.  I hope he succeeds in  
making a faster Lucene, for sure, regardless of what language he  
creates it for.

	Erik

David Balmain

2005-Dec-13 19:48 UTC

Re: Ferret on rails question pt 2

On 12/14/05, Kevin Bedell <kevin-EZfY3IQN+VlBDgjK7y7TUQ@public.gmane.org> wrote:
> I'm not sure yet what's best. I haven't built that part of my app yet and am
> still working through the design. I'm just trying to think through the best
> approach for now. Do you have pointers to docs that can provide some basic
> 'rules of thumb' for design - like when to store docs in a database and run a
> search on the index -v- when to store docs in the index directly?
I don't know if you caught the other thread on Ferret but as we were
discussing, it's usually better to store the documents in the database
and use Ferret for finding the relevant documents. In Rails, the way
to go is probably to use something like this:

http://ferret.davebalmain.com/trac/wiki/FerretOnRails

The main reason you'd store stuff in the index is to allow result
sorting. For example, if you wanted to sort your search results by
create_date then you'd need to store create_date in the index. There
are a few other times I can think of that you might want to store
documents in an index but they don't apply to a Rails app.
> I used Verity for search on an e-commerce site I helped build a few years ago.
> We stored the actual docs in a database (product descriptions, actually) but
> used Verity for searching - it worked fine, but was a pain since updating the
> product catalog tables and the Verity search index had to be closely
> coordinated or you'd find search results for products that weren't in the
> database...
You need to be careful of this with Ferret too. This is the problem
the acts_as_ferret ActiveRecord hooks are trying to solve. It still
requires a bit of work. I haven't played with Rails for a while now
but when I get the chance I'll try and come up with something better.
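
The coordination problem described here (search hits for rows that no longer exist) can be illustrated with a small plain-Ruby sketch. SearchIndex and ProductStore are hypothetical stand-ins written for this example; they are not the acts_as_ferret hooks or any Ferret API. The idea is simply that every save and destroy updates the index in the same call, and that search drops any hit whose row is missing.

```ruby
# Hypothetical stand-in search index: document id => indexed text.
class SearchIndex
  def initialize
    @docs = {}
  end

  def add(id, text)
    @docs[id] = text.downcase
  end

  def delete(id)
    @docs.delete(id)
  end

  def search_ids(term)
    needle = term.downcase
    @docs.select { |_id, text| text.include?(needle) }.keys
  end
end

# Hypothetical stand-in store whose save and destroy always update the index
# in the same call, so the two cannot drift apart through this code path.
class ProductStore
  def initialize(index)
    @rows = {}
    @index = index
  end

  def save(id, description)
    @rows[id] = description
    @index.add(id, description) # re-adding overwrites any stale entry
  end

  def destroy(id)
    @rows.delete(id)
    @index.delete(id)
  end

  # Search the index, then drop any hit that no longer has a row.
  def search(term)
    @index.search_ids(term).select { |id| @rows.key?(id) }
  end
end

index = SearchIndex.new
store = ProductStore.new(index)
store.save(1, "Red widget")
store.save(2, "Blue widget")
store.destroy(1)
puts store.search("widget").inspect
```

The final filter in search is a safety net: even if the index were updated by some other path and went stale, results would only contain ids that still have a row.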
> Also, regarding creating an index in memory -v- creating it on disk -- are there
> significant performance differences (eg, 20% - 50% faster or more) when using an
> in-memory index? Has anyone published test results?
>
> Thanks again for your help and your efforts. My needs aren't pressing, I'm just
> trying to figure out how using Ferret might benefit the app I'm building.
This is kind of a catch-22. If you can store your index in memory then
it is probably small enough that it won't need to be stored in memory.
With the C version I'm working on the difference is only about 20%-30%,
so not worth worrying about in my opinion.

HTH,
Dave

Abdur-Rahman Advany

2005-Dec-13 20:11 UTC

Re: ferret vs. mysql fulltext

> What do you mean by "heavy"?   I guess I'm being a bit defensive
> about Java Lucene.  I'm not understanding your negatives to Java
> Lucene other than your preference for Ruby.  It still remains to be  
> seen how performant and optimized Ferret can be compared to Java  
> Lucene.  My hunch is that porting to C will make it slightly faster  
> in spots, but whether it is worth the headaches of maintaining the  
> port is my question.
I think I am sounding more negative than I am : ) I repeat, I like Lucene
for most projects, but for something like a large-scale search
engine I think it may be better to have a C implementation. On one
project we used CLucene or lucene4c (I don't remember which; I was the
project leader) and it was much faster than using Lucene. I was only
mentioning making the C port as it may be faster to implement this.
> PyLucene is *fast*.  Super fast.
Erik, you are the expert, I am just trying to learn as I go along...
Thanks for your feedback : )

Kevin Bedell

2005-Dec-13 20:21 UTC

Re: Ferret on rails question pt 2

Thanks - all this info is right on. Great!

Quoting David Balmain <dbalmain.ml-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>:
> This is kind of a catch-22. If you can store your index in memory then
> it is probably small enough that it won't need to be stored in memory.
> With the C version I'm working on the difference is only about 20%-30%
> so not worth worrying about in my opinion.
My situation is potentially different. The data I am storing is text-based and
somewhat time sensitive. That is, the newest data is what most users will be
interested in.

However, I need to allow the ability to search for *all results* -- both new
data and old. Once the database is large, then this "new data" may be only 1%
or less of the overall database. The new data may consist of several thousand
documents.

I'm wondering if it might be useful to store *all data* in a disk-based index
while *also* storing the newest data in an in-memory index. This would allow me
to offer faster results when searching only the new data (which is what most
people will likely use) while still allowing people to search the entire
dataset if they want to.

Of course, this is only a good idea if it provides a significantly faster
response time for searching the in-memory index.

-k

Erik Hatcher

2005-Dec-13 20:27 UTC

Re: ferret vs. mysql fulltext

On Dec 13, 2005, at 3:11 PM, Abdur-Rahman Advany wrote:
>> What do you mean by "heavy"?   I guess I'm being a bit defensive
>> about Java Lucene.  I'm not understanding your negatives to Java
>> Lucene other than your preference for Ruby.  It still remains to
>> be seen how performant and optimized Ferret can be compared to
>> Java Lucene.  My hunch is that porting to C will make it slightly
>> faster in spots, but whether it is worth the headaches of
>> maintaining the port is my question.
>
> I think I am sounding more negative than I am : ) I repeat, I like
> Lucene for most projects, but for something like a large-scale
> search engine I think it may be better to have a C
> implementation. On one project we used CLucene or lucene4c (I
> don't remember which; I was the project leader) and it was much faster
> than using Lucene. I was only mentioning making the C port as it may be
> faster to implement this.
Java Lucene is powering search in some very very heavy duty places,  
not to mention some top secret ones.

For example, Doug is using Nutch (an open source "Google", with
Lucene as a core component) to revamp the infrastructure behind The
Internet Archive.  Yahoo Research Labs and others have funded Doug's
Nutch efforts.  I just want to be clear about Java Lucene being as
"enterprise" savvy as anyone needs.  CLucene was a valiant effort,
and supposedly is slightly speedier in some cases, but also not up to
date with the latest Java Lucene API.  lucene4c hasn't gotten off the
ground.

Java Lucene is the most up to date version available and has many
features not found in the ports that haven't kept up.  PyLucene just
released a version up to date with Java Lucene's Subversion trunk
(mostly by just recompiling, though there were some tweaks to the
GCJ/SWIG pieces apparently as well).  All the ports, Ferret included,
will always be playing catch-up with Java Lucene.  If the maintainers
of the ports take a break, they will be behind.

I don't want to discourage folks from porting Lucene at all.  But I'm
guardedly optimistic about a port being as good as Java Lucene.  It
truly is one of the few gems in the Java open source world with very
little quality competition.

	Erik

David Balmain

2005-Dec-13 20:35 UTC

Re: Ferret on rails question pt 2

On 12/14/05, Kevin Bedell <kevin-EZfY3IQN+VlBDgjK7y7TUQ@public.gmane.org> wrote:
> Thanks - all this info is right on. Great!
>
> Quoting David Balmain <dbalmain.ml-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>:
> > This is kind of a catch-22. If you can store your index in memory then
> > it is probably small enough that it won't need to be stored in memory.
> > With the C version I'm working on the difference is only about 20%-30%
> > so not worth worrying about in my opinion.
>
> My situation is potentially different. The data I am storing is text-based and
> somewhat time sensitive. That is, the newest data is what most users will be
> interested in.
>
> However, I need to allow the ability to search for *all results* -- both new
> data and old. Once the database is large, then this "new data" may be only 1%
> or less of the overall database. The new data may consist of several thousand
> documents.
So if I do the math, you're expecting to have several hundred thousand
documents? OK, you've got my attention now.
> I'm wondering if it might be useful to store *all data* in a disk-based index
> while *also* storing the newest data in an in-memory index. This would allow me
> to offer faster results when searching only the new data (which is what most
> people will likely use) while still allowing people to search the entire
> dataset if they want to.
In-memory or not, it will certainly be faster to search a smaller
document set so splitting the index in two might not be a bad idea.
Perhaps you could have a daily process which reindexes the recent
document set.
> Of course, this is only a good idea if it provides a significantly faster
> response time for searching the in-memory index.
The in-memory part won't make the big difference. Having a smaller
index might. I'd recommend doing the simplest thing possible and
refactoring if necessary. It shouldn't be hard to add a second
in-memory index later. Up to you though.
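
The split suggested above (a small index of recent documents rebuilt by a daily process, plus the full index) might look something like this in outline. This is a plain-Ruby sketch under stated assumptions: SimpleIndex, rebuild_recent, tiered_search and the seven-day window are all hypothetical names and values chosen for illustration, not part of Ferret.

```ruby
require "date"

# Hypothetical stand-in index: a list of [id, date, text] entries
# searched by case-insensitive substring.
class SimpleIndex
  attr_reader :entries

  def initialize(entries = [])
    @entries = entries
  end

  def search_ids(term)
    needle = term.downcase
    @entries.select { |_id, _date, text| text.downcase.include?(needle) }.map(&:first)
  end
end

# Rebuild the small "recent" index from the full one; intended to run once a day.
# The seven-day window is an arbitrary illustrative value.
def rebuild_recent(full_index, today, window_days = 7)
  cutoff = today - window_days
  SimpleIndex.new(full_index.entries.select { |_id, date, _text| date >= cutoff })
end

# Search the small index by default; search everything only on request.
def tiered_search(recent_index, full_index, term, include_old: false)
  (include_old ? full_index : recent_index).search_ids(term)
end

full = SimpleIndex.new([
  [1, Date.new(2005, 12, 13), "Ferret index notes"],
  [2, Date.new(2005, 11, 1),  "Old Ferret benchmark"]
])
recent = rebuild_recent(full, Date.new(2005, 12, 14))

puts tiered_search(recent, full, "ferret").inspect
puts tiered_search(recent, full, "ferret", include_old: true).inspect
```

Whether the recent index lives in memory or on disk is a separate choice; as noted above, the gain is expected to come mostly from the smaller document set.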

Dave

Abdur-Rahman Advany

2005-Dec-13 23:52 UTC

Re: Ferret on rails question pt 2

What you could consider is using something like cacheAR for the latest
queries or for popular queries.


David Balmain

2005-Dec-14 00:06 UTC

Re: Ferret on rails question pt 2

On 12/14/05, Abdur-Rahman Advany <rails-U5wbzIpkoVrQT0dZR+AlfA@public.gmane.org> wrote:
> What you could consider is using something like cacheAR for the latest
> queries or for popular queries.
I'm not really sure but I think you'd probably just use cacheAR to
cache the popular documents. I don't know if I mentioned already but I
haven't had enough time to work with much of the Rails stuff yet.
Soon. :-)
> David Balmain wrote:
>
> >On 12/14/05, Kevin Bedell
<kevin-EZfY3IQN+VlBDgjK7y7TUQ@public.gmane.org> wrote:
> >
> >
> >>Thanks - all this info is right on. Great!
> >>
> >>Quoting David Balmain
<dbalmain.ml-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>:
> >>
> >>
> >>>This is kind of a catch-22. If you can store your index in memory then
> >>>it is probably small enough that it won't need to be stored in memory.
> >>>With the C version I'm working on the difference is only about 20%-30%
> >>>so not worth worrying about in my opinion.
> >>>
> >>>
> >>My situation is potentially different. The data I am storing is text-based and
> >>somewhat time sensitive. That is, the newest data is what most users will be
> >>interested in.
> >>
> >>However, I need to allow the ability to search for *all results* -- both new
> >>data and old. Once the database is large, then this "new data" may be only 1%
> >>or less of the overall database. The new data may consist of several thousand
> >>documents.
> >>
> >>
> >
> >So if I do the math, you're expecting to have several hundred thousand
> >documents? Ok, you've got my attention now.
> >
> >
> >
> >>I'm wondering if it might be useful to store *all data* in a disk-based index
> >>while *also* storing the newest data in an in-memory index. This would allow me
> >>to offer faster results when searching only the new data (which is what most
> >>people will likely use) while still allowing people to search the entire
> >>dataset if they want to.
> >>
> >>
> >
> >In-memory or not, it will certainly be faster to search a smaller
> >document set so splitting the index in two might not be a bad idea.
> >Perhaps you could have a daily process which reindexes the recent
> >document set.
> >
> >
> >
> >>Of course, this is only a good idea if it provides a significantly faster
> >>response time for searching the in-memory index.
> >>
> >>
> >
> >The in-memory part won't make the big difference. Having a smaller
> >index might. I'd recommend doing the simplest thing possible and
> >refactoring if necessary. It shouldn't be hard to add a second
> >in-memory index later. Up to you though.
> >
> >Dave
> >
> >
> >
> >>-k
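
The two-index layout discussed in the quoted thread (one index over every document, plus a smaller one over recent documents that a daily job rebuilds) could be sketched roughly as follows. This is only an illustration under stated assumptions: `TinyIndex` and `SplitSearch` are hypothetical names, the toy inverted index stands in for a real search index and is not Ferret's API, and the seven-day window is an arbitrary choice.

```ruby
require "set"

# Toy inverted index mapping each term to the set of document ids containing it.
# Hypothetical stand-in for a real search index; this is NOT Ferret's API.
class TinyIndex
  def initialize
    @postings = Hash.new { |hash, term| hash[term] = Set.new }
  end

  # Tokenize on word characters and record the document under each term.
  def add(doc_id, text)
    text.downcase.scan(/\w+/).each { |term| @postings[term] << doc_id }
  end

  # Return the sorted ids of documents containing the term (empty if none).
  def search(term)
    key = term.downcase
    @postings.key?(key) ? @postings[key].to_a.sort : []
  end
end

# Hypothetical wrapper: a full index plus a smaller index of recent documents.
class SplitSearch
  SECONDS_PER_DAY = 24 * 60 * 60

  def initialize(recent_window_days: 7) # assumed window, not from the thread
    @recent_window = recent_window_days * SECONDS_PER_DAY
    @docs = {}
    @full = TinyIndex.new
    @recent = TinyIndex.new
  end

  # New documents always go into the full index.
  def add(doc_id, text, created_at)
    @docs[doc_id] = [text, created_at]
    @full.add(doc_id, text)
  end

  # Meant to be called from a daily job: rebuild the small index from scratch
  # so that documents older than the window drop out of it.
  def rebuild_recent(now = Time.now)
    fresh = TinyIndex.new
    @docs.each do |doc_id, (text, created_at)|
      fresh.add(doc_id, text) if now - created_at <= @recent_window
    end
    @recent = fresh
  end

  # Search the recent index by default; pass scope: :all for everything.
  def search(term, scope: :recent)
    (scope == :all ? @full : @recent).search(term)
  end
end

now = Time.now
search = SplitSearch.new(recent_window_days: 7)
search.add(1, "Ferret indexing tips", now - 30 * SplitSearch::SECONDS_PER_DAY)
search.add(2, "Ferret in-memory index", now - SplitSearch::SECONDS_PER_DAY)
search.rebuild_recent(now)

recent_hits = search.search("ferret")              # recent documents only
all_hits    = search.search("ferret", scope: :all) # entire document set
```

As Dave notes above, any speed-up here would come from the recent index being smaller, not from where it is stored, so the sketch keeps storage out of the picture and only separates the two document sets.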
