David Balmain
2006-Mar-19 12:11 UTC
[Rails] [ANN] Ferret 0.9.0-alpha (port of Apache Lucene to pure ruby)
Hi Folks,

I've just released version 0.9.0. This latest version of Ferret is an alpha release. I have removed the old C extension, and Ferret now runs on a fully ported C library. This has allowed some huge performance improvements in both memory and CPU usage.

There will probably be a few portability issues to start with. It has been developed on Linux, so it should work fine there. Windows and Mac users beware.

Also, the current version doesn't allow you to extend Ferret. For example, you can't write your own analyzer or filter. This will be rectified in the near future.

http://ferret.davebalmain.com/trac/

Dave Balmain

== Description

Ferret is a full port of the Apache Lucene searching and indexing library. It's available as a gem, so try it out! To get started quickly, read the quick start at the project homepage:

http://ferret.davebalmain.com/api
http://ferret.davebalmain.com/api/files/TUTORIAL.html

== Changes

* currently this version isn't very extensible. For example, you can't write your own Analyzer, Filter or Query.
* changed Token#term_text to Token#text
* changed Token#position_increment to Token#pos_inc
* changed order of args to Token.new. Now Token.new(text, start_offset, end_offset, pos_inc=1, type="text"). NOTE: type does nothing.
* changed TermVectorOffsetInfo#start_offset to TermVectorOffsetInfo#start
* changed TermVectorOffsetInfo#end_offset to TermVectorOffsetInfo#end
* added :id_field option to Index::Index class.
Onur Turgay
2006-Mar-28 11:31 UTC
[Rails] Re: [ANN] Ferret 0.9.0-alpha (port of Apache Lucene to pure ruby)
Hi David,

I installed 0.9.0 on a very busy web server (100k page visits/day) and it's working flawlessly (at least it seems so :) ). But I have a major problem: Ferret now neither indexes nor searches Unicode Turkish characters. I was using StandardAnalyzer in 0.3.2 and it was working fine, because the \w+ regexp was somehow matching Turkish characters in UTF-8 (under normal conditions it shouldn't, but I think I was lucky :) ).

Is there a way to make Ferret work with Unicode again, or should I stick with 0.3.2?

Thanks in advance, and thanks for the great work.
Onur
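For readers following the \w+ point above, here is a minimal sketch of how a word-splitting regexp can break Turkish text. It uses only plain Ruby regexps and describes the behaviour of current Ruby versions (1.9 and later), where \w matches ASCII word characters only; it is not Ferret's analyzer code, and Ruby 1.8 in 2006 behaved differently depending on $KCODE. The helper names ascii_tokens and unicode_tokens are hypothetical, and the sample word is "çalışma", written with escapes so the snippet stays ASCII.

```ruby
# Sketch only: plain Ruby regexps, not Ferret's analyzer API.
# WORD is a Turkish word containing c-cedilla, dotless i and s-cedilla,
# written with \u escapes so the string is UTF-8 regardless of locale.
WORD = "\u00e7al\u0131\u015fma"

# Hypothetical helper: tokenize on runs of \w, which in Ruby 1.9+
# means ASCII letters, digits and underscore only.
def ascii_tokens(text)
  text.scan(/\w+/)
end

# Hypothetical helper: tokenize on runs of [[:alnum:]], which is
# Unicode-aware when the subject string is UTF-8.
def unicode_tokens(text)
  text.scan(/[[:alnum:]]+/)
end

p ascii_tokens(WORD)    # the word is split at every non-ASCII letter
p unicode_tokens(WORD)  # the word survives as a single token
```

An index built with the first tokenizer would store fragments of the word rather than the word itself, which would be consistent with searches on Turkish terms finding nothing.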
David Balmain
2006-Mar-28 13:04 UTC
[Rails] Re: [ANN] Ferret 0.9.0-alpha (port of Apache Lucene to pure ruby)
Hi Onur,

I'm trying to solve this problem right now. You had better stick with 0.3.2 for the moment, but better analyzer support is on its way. I'm still trying to decide whether to include Oniguruma (the future Ruby regexp library) with Ferret or just use the current regex library that comes with Ruby. ferret-0.9.1 should have UTF-8 support.

Cheers,
Dave
Onur Turgay
2006-Mar-29 10:43 UTC
[Rails] Re: [ANN] Ferret 0.9.0-alpha (port of Apache Lucene to pure ruby)
Thanks David, keep up the great work.