Bob Hutchison
2006-Oct-07 12:57 UTC
[Ferret-talk] How to proceed with incorporating Ferret?
Hi,

I've listened in to this mail list for quite a while now but not doing anything with Ferret until I was ready to incorporate it. I've used Lucene for years, but not Ferret.

I downloaded and installed the 'bleeding edge' version (let's call it 0.10.9.1). There appears to be a significant re-working of the API happening. It all looks good. But there might be a couple of gaps still there.

The first question: should I even consider using the 0.10.9.1 version of Ferret? What I intend to use it for will not be a critical component, at least for the time being. I'm also used to working with shifting software. The advantage that I see is the new API. Performance is a BIG issue with my project.

The second question: are there any opinions regarding ease of upgrade from the current stable version to what is being worked on now? I don't have anything to upgrade at the moment, but if I go with the stable version then I will have.

The third question: it looks to me that in the 0.10.9.1 version the content of the fields is being stored in the index. For my application this is worse than a waste of time. Am I missing something?

The fourth question: in a message from August 23 there was a hint of a write-up discussing the new API. Did this ever get published?

I think there is some *very* nice work here. I'm looking forward to using Ferret.

Cheers,
Bob

----
Bob Hutchison          -- blogs at <http://www.recursive.ca/hutch/>
Recursive Design Inc.  -- <http://www.recursive.ca/>
Raconteur              -- <http://www.raconteur.info/>
xampl for Ruby         -- <http://rubyforge.org/projects/xampl/>
David Balmain
2006-Oct-08 04:24 UTC
On 10/7/06, Bob Hutchison <hutch at recursive.ca> wrote:
> Hi,
>
> I've listened in to this mail list for quite a while now but not doing anything with Ferret until I was ready to incorporate it. I've used Lucene for years, but not Ferret.
>
> I downloaded and installed the 'bleeding edge' version (let's call it 0.10.9.1). There appears to be a significant re-working of the API happening. It all looks good. But there might be a couple of gaps still there.

I'm all ears. What do you think needs improvement?

> The first question: should I even consider using the 0.10.9.1 version of Ferret? What I intend to use it for will not be a critical component, at least for the time being. I'm also used to working with shifting software. The advantage that I see is the new API. Performance is a BIG issue with my project.

I've just released 0.10.10. Version 0.10.9 is probably the most stable version to date. 0.10.10 has some significant changes to improve the performance of sorting and filtering on large unoptimized indexes (putting Ferret up to orders of magnitude ahead of Lucene for these tasks). In a few days we should know if I broke anything. There are currently only 3 outstanding tickets on Trac, and they affect only Windows and OS X, so if you are on Linux you should be fine.

> The second question: are there any opinions regarding ease of upgrade from the current stable version to what is being worked on now? I don't have anything to upgrade at the moment, but if I go with the stable version then I will have.

Well, 0.10.9 is the most stable version since the pure Ruby version, so that would be the version I'd go with. Also, I can usually fix most problems within a day or two if I can reproduce them, or if you are willing to give me ssh access to your server.

> The third question: it looks to me that in the 0.10.9.1 version the content of the fields is being stored in the index. For my application this is worse than a waste of time. Am I missing something?

It depends on how you set up your index. You specify which fields you want stored, indexed or term-vectorized (I know, it's not a word).

    require 'ferret'
    # FieldInfos lives in the Ferret::Index module
    include Ferret::Index

    # set to not store fields by default
    field_infos = FieldInfos.new(:store => :no)
    # must store id field however
    field_infos.add_field(:id, :store => :yes, :index => :untokenized)

> The fourth question: in a message from August 23 there was a hint of a write-up discussing the new API. Did this ever get published?

No. But I did update the documentation here:

    http://ferret.davebalmain.com/api/files/TUTORIAL.html

You may find the Ferret FAQ even better:

    http://ferret.davebalmain.com/trac/wiki/FAQ

And there may be an O'Reilly "shortcut" coming out soon.

> I think there is some *very* nice work here. I'm looking forward to using Ferret.

Great. Thanks,
Dave
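To make the "defaults plus per-field overrides" idea above concrete without depending on the compiled Ferret extension, here is a small plain-Ruby model. `FieldOptionsSketch` is a hypothetical class, not part of Ferret: it only mimics the behaviour described in the example, where options given to the constructor act as defaults for every field and options given to `add_field` override them for one field. The fallback values in `BUILTIN_DEFAULTS` are assumptions for the sketch, not Ferret's documented defaults.

```ruby
# Hypothetical plain-Ruby model of per-field options with defaults.
# This is NOT Ferret's API; it only illustrates how defaults and
# per-field overrides combine.
class FieldOptionsSketch
  # Assumed fallback values used when the caller supplies no default.
  BUILTIN_DEFAULTS = { store: :yes, index: :yes }.freeze

  def initialize(defaults = {})
    # Caller-supplied defaults win over the built-in fallbacks.
    @defaults = BUILTIN_DEFAULTS.merge(defaults)
    @fields = {}
  end

  # Record options for one field; anything not given falls back to defaults.
  def add_field(name, options = {})
    @fields[name] = @defaults.merge(options)
    self
  end

  # Options in effect for a field; fields never added use the defaults.
  def options_for(name)
    @fields.fetch(name, @defaults)
  end
end

# Don't store field contents by default, but do store the id field.
opts = FieldOptionsSketch.new(store: :no)
opts.add_field(:id, store: :yes, index: :untokenized)

puts opts.options_for(:id)[:store]       # prints "yes"
puts opts.options_for(:content)[:store]  # prints "no"
```

The point of the design is that only fields you must read back (such as an id) pay the storage cost; every other field is indexed for searching but its content is not kept in the index.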
Bob Hutchison
2006-Oct-08 15:11 UTC
On 8-Oct-06, at 12:24 AM, David Balmain wrote:

> On 10/7/06, Bob Hutchison <hutch at recursive.ca> wrote:
>> Hi,
>>
>> I've listened in to this mail list for quite a while now but not doing anything with Ferret until I was ready to incorporate it. I've used Lucene for years, but not Ferret.
>>
>> I downloaded and installed the 'bleeding edge' version (let's call it 0.10.9.1). There appears to be a significant re-working of the API happening. It all looks good. But there might be a couple of gaps still there.
>
> I'm all ears. What do you think needs improvement?

It may simply be a misunderstanding on my part; read on. I also can't figure out how to redefine the field used as an id (again, read on: the documented way isn't working for me, probably because of what comes up below).

>> The first question: should I even consider using the 0.10.9.1 version of Ferret? What I intend to use it for will not be a critical component, at least for the time being. I'm also used to working with shifting software. The advantage that I see is the new API. Performance is a BIG issue with my project.
>
> I've just released 0.10.10. Version 0.10.9 is probably the most stable version to date. 0.10.10 has some significant changes to improve the performance of sorting and filtering on large unoptimized indexes (putting Ferret up to orders of magnitude ahead of Lucene for these tasks). In a few days we should know if I broke anything. There are currently only 3 outstanding tickets on Trac, and they affect only Windows and OS X, so if you are on Linux you should be fine.

Of course I'm running OS X... this couldn't be easy :-) I'm also seeing issues 127 and 136 (as everyone else on OS X will be).

Another thing for OS X: until Apple fixes their gcc4 compiler, either use the gcc3 compiler or use -O1 rather than -O2. I changed the ext_conf file to do this, but the two OS X issues remain. If you don't do this you will eventually get a corrupted heap (it usually takes a while). I've had to recompile Ruby at this optimisation level for it to work reliably.

>> The second question: are there any opinions regarding ease of upgrade from the current stable version to what is being worked on now? I don't have anything to upgrade at the moment, but if I go with the stable version then I will have.
>
> Well, 0.10.9 is the most stable version since the pure Ruby version, so that would be the version I'd go with. Also, I can usually fix most problems within a day or two if I can reproduce them, or if you are willing to give me ssh access to your server.

Okay, I'm convinced. The most recent is the way to go.

>> The third question: it looks to me that in the 0.10.9.1 version the content of the fields is being stored in the index. For my application this is worse than a waste of time. Am I missing something?
>
> It depends on how you set up your index. You specify which fields you want stored, indexed or term-vectorized (I know, it's not a word).
>
>     # set to not store fields by default
>     field_infos = FieldInfos.new(:store => :no)
>     # must store id field however
>     field_infos.add_field(:id, :store => :yes, :index => :untokenized)

So, I tried requiring ferret. It simply won't admit to knowing anything about the FieldInfos class. How bad are those two remaining OS X bugs?

So, I tried requiring rferret. That worked better.

I tried your example (actually I tried this before posting, and this is why I said I thought I saw a few gaps). It doesn't work for me. The initialize method for FieldInfos is defined as:

    def initialize(dir = nil, name = nil)
      @fi_array = []
      @fi_hash = {}
      if dir and dir.exists?(name)
      # ...

The options hash in your example is assigned to dir, and since a Hash has no exists? method, a NoMethodError (method missing) is raised.

I've happily forgotten most of my C, but it looks as though the C version is doing something similar (not that it matters in my case, because FieldInfos is invisible).

>> The fourth question: in a message from August 23 there was a hint of a write-up discussing the new API. Did this ever get published?
>
> No. But I did update the documentation here:
>
>     http://ferret.davebalmain.com/api/files/TUTORIAL.html

I thought that was the old way, since I couldn't get it to work (see above).

> You may find the Ferret FAQ even better:
>
>     http://ferret.davebalmain.com/trac/wiki/FAQ

I don't know how I missed that. Thanks.

> And there may be an O'Reilly "shortcut" coming out soon.

That's great!

Cheers,
Bob
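The failure described above is an ordinary Ruby argument mismatch: an options hash passed to a constructor declared as `(dir = nil, name = nil)` lands in `dir`, and the first directory method called on it fails. The sketch below reproduces this with a hypothetical stand-in class, `OldStyleFieldInfos`, whose constructor mirrors only the lines quoted above; it is not the real rferret class, and `missing_method_for` is a hypothetical helper. (As later messages in this thread show, the underlying cause was an outdated checkout rather than the 0.10.10 API.)

```ruby
# Hypothetical stand-in that mirrors only the quoted constructor signature.
class OldStyleFieldInfos
  def initialize(dir = nil, name = nil)
    @fi_array = []
    @fi_hash = {}
    # A real directory object would respond to exists?; a Hash does not.
    if dir and dir.exists?(name)
      # (loading field infos from the directory is omitted in this sketch)
    end
  end
end

# Hypothetical helper: returns the name of the missing method,
# or nil if construction succeeds.
def missing_method_for(*args)
  OldStyleFieldInfos.new(*args)
  nil
rescue NoMethodError => e
  e.name
end

# The options hash becomes `dir`, so exists? is looked up on a Hash and not found.
puts missing_method_for(:store => :no).inspect  # prints ":exists?"
# With no arguments, dir is nil and the check short-circuits.
puts missing_method_for.inspect                 # prints "nil"
```

Because the old constructor takes no keyword parameters, Ruby quietly packs `:store => :no` into a positional Hash instead of rejecting it, which is why the error surfaces one step later as a missing `exists?` rather than as an argument error.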
Bob Hutchison
2006-Oct-08 15:32 UTC
It looks as though I somehow got the wrong version out of subversion. Hold on while I do this again. Sorry about that.

Bob
Bob Hutchison
2006-Oct-08 15:55 UTC
On 8-Oct-06, at 11:32 AM, Bob Hutchison wrote:

> It looks as though I somehow got the wrong version out of subversion. Hold on while I do this again. Sorry about that.

That is what happened; sorry for the noise. The 0.10.10 version is running at least 225 times faster. And the tutorial works. Sigh.

(I got the version I was working from with this command:

    svn checkout svn://davebalmain.com/ferret/trunk ferret

and I don't remember where I got that from.)

Well, I'm comfortably set.

Cheers,
Bob
David Balmain
2006-Oct-09 02:44 UTC
On 10/9/06, Bob Hutchison <hutch at recursive.ca> wrote:
> That is what happened; sorry for the noise. The 0.10.10 version is running at least 225 times faster. And the tutorial works. Sigh.
>
> (I got the version I was working from with this command:
>
>     svn checkout svn://davebalmain.com/ferret/trunk ferret
>
> and I don't remember where I got that from.)

Sorry, that was my fault. The current version of Ferret is in a different repository:

    svn co svn://www.davebalmain.com/exp ferret

The reason for this is that the current version started out as an experimental version where I was trying a few things out, and it ended up being a complete rewrite, with a different file format and all. I still have to roll it into the original ferret repository.

Dave
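Since the confusion in this thread came from building an older tree than intended, a quick version guard can catch that before you start chasing API mismatches. A minimal sketch, assuming you can obtain the installed version as a string (where that string comes from, for example a version constant the library may expose, is an assumption and is not shown); `new_enough?` and `MINIMUM_FERRET` are hypothetical names. It uses `Gem::Version` rather than plain string comparison, because multi-digit components such as the "10" in 0.10.10 sort incorrectly as text.

```ruby
require 'rubygems'

# Hypothetical minimum: the release recommended in this thread.
MINIMUM_FERRET = '0.10.10'

# Hypothetical helper: true if version_string is at least `minimum`.
def new_enough?(version_string, minimum = MINIMUM_FERRET)
  Gem::Version.new(version_string) >= Gem::Version.new(minimum)
end

# Plain string comparison gets multi-digit components wrong:
puts('0.10.10' < '0.10.9')    # prints "true"
# Gem::Version compares segment by segment (0, 10, 10 vs 0, 10, 9):
puts new_enough?('0.10.9')    # prints "false"
puts new_enough?('0.10.9.1')  # prints "false"
puts new_enough?('0.10.10')   # prints "true"
```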