I've been working on a pet project and have just started implementing full-text searching with acts_as_xapian. It's working pretty well, but I'm having trouble getting some of the bells and whistles to work.

First is the spelling correction. If it decides that there are incorrectly spelt words, it provides an array of the correct spellings, but without reference to which words it is correcting.

So if I enter the search term "the cat adn the dog", it will give an array ["and"], which is useless in a gsub because it can't tell what it should be replacing. I want to be able to say "did you mean 'the cat *and* the dog'?" but I can't work out how to manipulate the string.

The second puzzle is regarding highlighting the search terms. When you follow a link to one of the results, it appends the search term to the query and uses TextHelper::highlighter to mark those words. The problem is that it is expecting an array, not a string. So I split the string by spaces, but what about parts of the query that were enclosed in quotes?

I have found it impossible to mangle a complex query such as:

    "null pointer" undefined "static char array"

so that it can be passed as a query parameter and then decoded again for the highlighter. I've tried all sorts of regexps, splits and joins, but it's just given me a headache.

I know people have done this before, so I'm hoping someone can give me some pointers. Let me know if I can provide any more information to explain myself better.

Many thanks

Matt

-- 
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
To post to this group, send email to rubyonrails-talk-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.
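The quoted-phrase puzzle above can be approached by scanning for either a double-quoted phrase or a bare word, rather than splitting on spaces. The following is only a minimal sketch in plain Ruby: `search_terms` is a hypothetical helper name (it is not part of acts_as_xapian or Rails), and the URL round trip uses the standard library's `URI.encode_www_form_component` to stand in for whatever the framework does with query parameters.

```ruby
require "uri"

# Hypothetical helper: split a search query into terms, keeping
# double-quoted phrases together as a single term.
def search_terms(query)
  # Each match is either a quoted phrase (first capture) or a bare word
  # (second capture); exactly one of the two is non-nil.
  query.scan(/"([^"]*)"|(\S+)/)
       .map { |phrase, word| (phrase || word).delete('"').strip }
       .reject(&:empty?)
end

query = '"null pointer" undefined "static char array"'

# Round trip through a URL query parameter value.
encoded = URI.encode_www_form_component(query)
decoded = URI.decode_www_form_component(encoded)

terms = search_terms(decoded)
# => ["null pointer", "undefined", "static char array"]
```

The resulting array is in the shape the highlighter is described as expecting. Unbalanced quotes are simply stripped here, which is a design choice for the sketch rather than anything the search library prescribes.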
On Tue, Dec 08, 2009 at 03:02:39PM +0000, Matt Harrison wrote:
> I've been working on a pet project and have just started implementing
> full-text searching with acts_as_xapian. It's working pretty well but I'm
> having trouble getting some of the bells and whistles to work.

OK, I think I spoke too soon. Even after rebuilding and updating the indices several times, full-text searching doesn't manage to search the entire body of text, only the first few lines.

Investigation shows that the Xapian Google Groups list is almost pure spam and isn't active at all. I guess I've chosen a duff technology to use, so I'll need to switch.

Can anyone suggest the current favorite for full-text searching? I don't really care about the spelling correction or highlighting of search terms (it interferes with my caching), just a simple search.

Thanks

Matt
Take a look at ferret as a starting point. If you need more "oomph" you can move to either sphinx or solr. The "Advanced Rails Recipes" book has examples of all 3.

-----Original Message-----
From: rubyonrails-talk+owner-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org [mailto:rubyonrails-talk+owner-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org] On Behalf Of Matt Harrison
Sent: Tuesday, December 08, 2009 2:53 PM
To: rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
Subject: Re: [Rails] mangling search terms

> Can anyone suggest the current favorite for fulltext searching? I don't
> really care about the spelling correction or highlighting of search terms
> (it interferes with my caching), just a simple search.
On Tue, Dec 08, 2009 at 02:56:59PM -0800, Joe McGlynn wrote:
> Take a look at ferret as a starting point. If you need more "oomph" you can
> move to either sphinx or solr. The "Advanced Rails Recipes" book has
> examples of all 3.

Thanks, I'll take a look.

Matt
On Tue, Dec 08, 2009 at 02:56:59PM -0800, Joe McGlynn wrote:
> Take a look at ferret as a starting point. If you need more "oomph" you can
> move to either sphinx or solr. The "Advanced Rails Recipes" book has
> examples of all 3.

Excellent, I managed to replace xapian with ferret and have it searching properly (it seems) within about 10 minutes.

There's still one thing I would like to do if possible, but it seems I might be out of luck. This is for a custom-made CMS with many pages. Each page uses fragment caching, which expires when a page is edited.

Bearing that in mind, how can I implement search term highlighting? I almost got it working with xapian before I realised that the caching would conflict with it. I understand that it might not be possible, but it would be nice. Maybe I could just have it highlight the words on the results page, rather than on the page itself.

I'm also looking for a way to display an excerpt of the sentence containing the search terms, but for now I'll just show the page title.

Any more help from you guys is greatly appreciated.

Thanks

Matt
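The idea of highlighting on the results page rather than on the cached page can be sketched in plain Ruby: the cached fragments are left untouched, and only a short excerpt is built and marked up when the results are rendered. The helper names `term_pattern`, `excerpt_for` and `highlight_terms` are hypothetical, as are the `<strong>` markup and the default radius. Rails' own TextHelper offers `excerpt` and `highlight` helpers that cover similar ground; this sketch does not use them so that it stays self-contained.

```ruby
require "erb"

# Hypothetical helper: one case-insensitive pattern matching any of the terms.
def term_pattern(terms)
  Regexp.union(terms.reject(&:empty?).map { |t| /#{Regexp.escape(t)}/i })
end

# Return a snippet of `text` around the first matching term,
# or nil when none of the terms occurs in the text.
def excerpt_for(text, terms, radius = 40)
  match = term_pattern(terms).match(text)
  return nil unless match

  start  = [match.begin(0) - radius, 0].max
  finish = [match.end(0) + radius, text.length].min
  snippet = text[start...finish]
  snippet = "...#{snippet}" if start > 0
  snippet = "#{snippet}..." if finish < text.length
  snippet
end

# HTML-escape `text` and wrap every matching term in <strong> tags.
def highlight_terms(text, terms)
  pattern = term_pattern(terms)
  # Splitting on a capture group keeps the matches: they land at odd indices.
  text.split(/(#{pattern})/).each_with_index.map do |part, i|
    escaped = ERB::Util.html_escape(part)
    i.odd? ? "<strong>#{escaped}</strong>" : escaped
  end.join
end

body = "A null pointer dereference crashes the program."
snippet = excerpt_for(body, ["null pointer"])
html = highlight_terms(snippet, ["null pointer"]) if snippet
```

Escaping each piece after splitting, rather than escaping the whole text first, keeps short terms from accidentally matching inside HTML entities.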
On Dec 9, 6:04 pm, Matt Harrison <iwasinnamuk...-ja4MoDZtUtVl57MIdRCFDg@public.gmane.org> wrote:
> Excellent, I managed to replace xapian with ferret and have it searching
> properly (it seems) within about 10 minutes.

Glad that you now have a working solution, but just for the record (and for future searchers): although the acts_as_xapian project seems to have been unmaintained since July, there are two other active projects building Ruby wrappers on top of Xapian: xapit (http://github.com/ryanb/xapit/blob/master/README.rdoc) and xapian-fu (http://github.com/johnl/xapian-fu). We maintain a listing of current wrappers in the Xapian wiki at http://trac.xapian.org/wiki/FAQ/RubyWrappers

I don't know why acts_as_xapian was having problems giving you spelling corrections in a useful way: Xapian's interface returns the suggested spell-corrected query, which seemed to be what you wanted, so I don't know why acts_as_xapian wasn't doing this.

-- 
Richard
On Fri, Dec 11, 2009 at 09:19:25AM -0800, Richard Boulton wrote:
> I don't know why acts_as_xapian was having problems giving you
> spelling corrections in a useful way: Xapian's interface returns the
> suggested spell-corrected query, which seemed to be what you wanted,
> so I don't know why acts_as_xapian wasn't doing this.

Thanks for the reply. Unfortunately I'm now having problems with ferret (see my other recent post) and Ruby 1.9.1, so I might end up looking at Solr and Sunspot if I can't resolve it.

As for Xapian, I'm sure there are implementations that work, but I'm not sure if I'll try it again; I don't really know why.

As for the spelling, it's interesting that Xapian itself returns a corrected version of the entire query. That would have worked perfectly for me; unfortunately, acts_as_xapian only returned the single corrected word, which made it far less useful.

Well, thanks for taking the time to reply, but I think this thread is redundant now until I can fix ferret or decide to move over to Solr or something else.

Thanks

Matt
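For anyone stuck with a wrapper that, as described in this thread, only hands back the corrected words (for example ["and"]) without saying which words they replace, one rough workaround is to pair each suggestion with the query word closest to it by edit distance. This is only a sketch under that assumption: `edit_distance` and `suggest_query` are hypothetical helper names, the query is split on whitespace (so quoted phrases are not treated specially), and a wrapper that exposes the full corrected query from Xapian would make all of this unnecessary.

```ruby
# Levenshtein edit distance between two strings (row-by-row dynamic programming).
def edit_distance(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      # Deletion, insertion, or substitution, whichever is cheapest.
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Hypothetical helper: rebuild the query with each suggestion substituted
# for the query word it most resembles. `emphasis` is wrapped around each
# substituted word, e.g. "*" for a "did you mean" message.
def suggest_query(query, suggestions, emphasis = "")
  words = query.split
  replacements = {}
  suggestions.each do |suggestion|
    # Skip words already replaced and words that already match the suggestion.
    candidates = words.each_index.reject do |i|
      replacements.key?(i) || words[i].downcase == suggestion.downcase
    end
    index = candidates.min_by do |i|
      edit_distance(words[i].downcase, suggestion.downcase)
    end
    replacements[index] = suggestion if index
  end
  words.each_with_index.map do |word, i|
    replacements.key?(i) ? "#{emphasis}#{replacements[i]}#{emphasis}" : word
  end.join(" ")
end

corrected = suggest_query("the cat adn the dog", ["and"], "*")
# => "the cat *and* the dog"
```

Ties go to the earliest word in the query, and nothing stops a suggestion from being matched to a word that was actually spelt correctly, so the result is best presented as a suggestion rather than applied automatically.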