Hi all,

Hope all is going well. Has anyone implemented a grep-style output page of hits using Ferret as the index/query engine?

Any thoughts on how best to implement it? The previous thread discusses highlighting. Would that be the best approach to follow, or is there a better way?

Cheers,

Marcus

-- 
Posted via http://www.ruby-forum.com/.
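To make the question concrete, here is a minimal sketch of what "grep-style output" could look like for a single stored document. It is plain Ruby and does not call Ferret at all; `grep_hits` and its bracket markers are hypothetical names chosen for illustration, and the text and terms are assumed to come from whatever the index returns for a hit.

```ruby
# Hypothetical helper, not part of Ferret: returns grep-like
# "line_number: line" strings for every line containing a query term.
def grep_hits(text, terms)
  # One case-insensitive, whole-word pattern covering all terms.
  pattern = /\b(?:#{terms.map { |t| Regexp.escape(t) }.join('|')})\b/i
  hits = []
  text.each_line.with_index(1) do |line, number|
    next unless line.match?(pattern)
    # Wrap each matched term in brackets as a stand-in for HTML tags.
    hits << "#{number}: " + line.chomp.gsub(pattern) { |m| "[#{m}]" }
  end
  hits
end

sample = "Ferret is fast\nnothing here\nI like ferret search\n"
puts grep_hits(sample, %w[ferret search])
```

A real results page would probably swap the brackets for markup and escape the surrounding text, but the line-by-line shape would stay the same.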
On 6/13/06, Marcus Crafter <crafterm at gmail.com> wrote:
> Hi all,
>
> Hope all is going well. Has anyone implemented a grep-style output
> page of hits using Ferret as the index/query engine?
>
> Any thoughts on how best to implement it? The previous thread
> discusses highlighting. Would that be the best approach to follow,
> or is there a better way?
>
> Cheers,
>
> Marcus

Hi Marcus,

If you can read Java, the best way would be to check out the highlighter in Apache Lucene and port that code to Ruby. You can see the highlighter module here:

http://svn.apache.org/viewvc/lucene/java/trunk/contrib/

I'm going to do this myself eventually, but you'll have to do it yourself if you need it soon. Before you put too much work into it, though, be warned that major Ferret API changes may be ahead.

Cheers,
Dave
David Balmain wrote:
> Hi Marcus,
>
> If you can read Java, the best way would be to check out the
> highlighter in Apache Lucene and port that code to Ruby. You can
> see the highlighter module here:
>
> http://svn.apache.org/viewvc/lucene/java/trunk/contrib/
>
> I'm going to do this myself eventually, but you'll have to do it
> yourself if you need it soon. Before you put too much work into it,
> though, be warned that major Ferret API changes may be ahead.

Hi David,

Thanks for your response.

I noticed that you referenced the Lucene highlighter in a previous post, and I have already started porting it to Ferret. I'm quite a way along and have the first 3 test cases passing properly (i.e. simple and fuzzy fragments). I will continue getting the rest of the test cases to work.

Hopefully the API changes don't break too much then :)

I'll post the code once it's all working, hopefully within the next few days.

Cheers,

Marcus
On 6/21/06, Marcus Crafter <crafterm at gmail.com> wrote:
> Hi David,
>
> Thanks for your response.
>
> I noticed that you referenced the Lucene highlighter in a previous
> post, and I have already started porting it to Ferret. I'm quite a
> way along and have the first 3 test cases passing properly (i.e.
> simple and fuzzy fragments). I will continue getting the rest of the
> test cases to work.
>
> Hopefully the API changes don't break too much then :)
>
> I'll post the code once it's all working, hopefully within the next
> few days.
>
> Cheers,
>
> Marcus

That'd be great. The new API shouldn't be too hard to adjust to. I'll be implementing the highlighter in C rather than in Ruby, so I'll be interested to see how you go with it.

The main difference in the API is that you won't specify the store, index and term_vector parameters per document field any more. This option will still be available, but the behaviour will be slightly different. I'll go into more detail later.

Cheers,
Dave
On Jun 21, 2006, at 3:32 AM, David Balmain wrote:
> I'll be implementing the highlighter in C rather than in Ruby, so
> I'll be interested to see how you go with it.
>
> The main difference in the API is that you won't specify the store,
> index and term_vector parameters per document field any more. This
> option will still be available, but the behaviour will be slightly
> different. I'll go into more detail later.

How close is what you're going to be doing to the Lucene contrib highlighter?

FWIW, the KinoSearch Highlighter uses similar techniques for adding tags and encoding, but the excerpt selection is pretty different. No TokenStream is required; it uses a heat map. Right now it requires that the field have term vectors stored with positions and offsets, but it could be adapted to generate the vectors by re-analyzing.

The principal advantage it has over the Lucene Highlighter is that it handles phrases properly:

http://xrl.us/nm2z (Link to www.lucenebook.com)
http://xrl.us/nm25 (Link to www.rectangular.com)

Whatever algorithm we choose for Lucy, I hope it will meet that constraint.

Highlighter.pm isn't that long (384 lines including docs), and if I didn't have serious deadlines bearing down, doing a Ruby version would be a great exercise for me. If you or Marcus want to check it out, the new version is only in Subversion:

http://xrl.us/nm28 (Link to www.rectangular.com)

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
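As a rough illustration of excerpt selection driven by term offsets rather than a TokenStream, the sketch below picks the window of text containing the most query-term hits. It is a simplified stand-in, not KinoSearch's actual heat map: `term_offsets` and `best_excerpt` are hypothetical names, the offsets are found by rescanning the text instead of reading stored term vectors, and nothing here attempts the phrase handling mentioned above.

```ruby
# Hypothetical helper: collect [start, end] character offsets for each
# query term. A real implementation might read these from term vectors.
def term_offsets(text, terms)
  offsets = []
  terms.each do |term|
    text.scan(/\b#{Regexp.escape(term)}\b/i) do
      offsets << [$~.begin(0), $~.end(0)]
    end
  end
  offsets
end

# Hypothetical helper: score each candidate window (one starting at every
# hit) by how many hits begin inside it, and return the hottest window.
def best_excerpt(text, offsets, width = 40)
  return text[0, width] if offsets.empty?
  starts = offsets.map(&:first).sort
  best_start = starts.max_by do |start|
    starts.count { |o| o >= start && o < start + width }
  end
  text[best_start, width]
end

doc = "alpha beta gamma delta epsilon zeta ferret index ferret query eta theta"
puts best_excerpt(doc, term_offsets(doc, %w[ferret index]), 30)
```

Weighting hits that sit next to each other in query order more heavily would be one way to move this toward phrase-aware scoring, though that is speculation rather than a description of any existing highlighter.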
On 6/21/06, Marvin Humphrey <marvin at rectangular.com> wrote:
> How close is what you're going to be doing to the Lucene contrib
> highlighter?

Well, I haven't actually started it yet, so we'll see.

> FWIW, the KinoSearch Highlighter uses similar techniques for adding
> tags and encoding, but the excerpt selection is pretty different. No
> TokenStream is required; it uses a heat map. Right now it requires
> that the field have term vectors stored with positions and offsets,
> but it could be adapted to generate the vectors by re-analyzing.
>
> The principal advantage it has over the Lucene Highlighter is that it
> handles phrases properly:
>
> http://xrl.us/nm2z (Link to www.lucenebook.com)
> http://xrl.us/nm25 (Link to www.rectangular.com)
>
> Whatever algorithm we choose for Lucy, I hope it will meet that
> constraint.
>
> Highlighter.pm isn't that long (384 lines including docs), and if I
> didn't have serious deadlines bearing down, doing a Ruby version
> would be a great exercise for me. If you or Marcus want to check it
> out, the new version is only in Subversion:
>
> http://xrl.us/nm28 (Link to www.rectangular.com)

Cool, I'll definitely check this out. Thanks, Marvin.