Jean-Christophe Michel
2006-Aug-27 22:09 UTC
[Ferret-talk] how to get the words of a query
Hi,

Using aaf (acts_as_ferret) to search pages, I wanted to present excerpts
from the texts even when more than one term was used in the search.
I got some results, despite the difficulty caused by Unicode + Ruby.

The last problem I face is getting the query words without the logical
operator characters, if any.
Is there a clean way to get them?

--
Jean-Christophe Michel
On Mon, Aug 28, 2006 at 12:09:27AM +0200, Jean-Christophe Michel wrote:
> Hi,
>
> Using aaf to search pages, I wanted to present excerpts from the texts
> even when more than one term was used in the search.
> I came to some results, despite the difficulty caused by Unicode+ruby.
> The last problem I'm faced is to get the query words, without the
> logical articulation chars if any.
> Is there a clean way to get them ?

In Ferret 0.10 there's a highlight method in the Searcher class. Maybe
that does what you want?

http://ferret.davebalmain.com/api/classes/Ferret/Search/Searcher.html#M000223

Jens

--
webit! Gesellschaft für neue Medien mbH          www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer       kraemer at webit.de
Schnorrstraße 76                         Tel +49 351 46766  0
D-01069 Dresden                          Fax +49 351 46766 66
Jean-Christophe Michel
2006-Aug-28 12:11 UTC
[Ferret-talk] how to get the words of a query
Hi,

On 28 Aug 06, at 12:22, Jens Kraemer wrote:
> in Ferret 0.10 there's a highlight method in the Searcher class. Maybe
> that does what you want ?
>
> http://ferret.davebalmain.com/api/classes/Ferret/Search/Searcher.html#M000223

Looks good; it will be perfect if your truncation respects multi-byte
characters. My Ruby helper does, see how it works on
http://symetrie.com/fr/search
(it currently highlights only the first occurrence of each word).

I cannot find the whole code in
http://ferret.davebalmain.com/trac/browser/tags/REL-0.10.1/ext/r_search.c
to check this, though (and even if I could, I'm not sure I could
review a Unicode implementation in C :/)

I may wait for aaf to be updated for 0.10.1 before testing.

Jean-Christophe Michel

--
Symétrie, édition de musique et services multimédia
30 rue Jean-Baptiste Say 69001 LYON (FRANCE)
tél +33 (0)478 29 52 14
fax +33 (0)478 30 01 11
web www.symetrie.com
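For readers on a current Ruby, here is a minimal sketch of the kind of multi-byte-safe excerpt helper discussed above. It is neither the helper running on symetrie.com nor Ferret's highlighter: the method name `excerpt`, its parameters, and the `<b>` tags are hypothetical. It assumes Ruby 1.9 or later with UTF-8 strings, where String indexing counts characters (on Ruby 1.8, current at the time of this thread, indexing counted bytes, which is why a Unicode helper library was needed). Like the helper described, it marks only the first occurrence of the word.

```ruby
# Hypothetical helper, not part of Ferret or acts_as_ferret.
# Returns a short excerpt around the first occurrence of `word`,
# or nil when the word does not occur in `text`.
def excerpt(text, word, radius = 20, pre = "<b>", post = "</b>")
  # Escape the word so query characters are matched literally.
  m = text.match(Regexp.new(Regexp.escape(word), Regexp::IGNORECASE))
  return nil unless m

  first = m.begin(0)              # character offset, not byte offset
  last  = first + m[0].length
  start = [first - radius, 0].max
  stop  = [last + radius, text.length].min

  # Mark truncation on either side with an ASCII ellipsis.
  head = start > 0 ? "..." : ""
  tail = stop < text.length ? "..." : ""
  "#{head}#{text[start...first]}#{pre}#{m[0]}#{post}#{text[last...stop]}#{tail}"
end

puts excerpt("Symétrie, édition de musique", "édition", 5)
# prints: ...rie, <b>édition</b> de m...
```

Because the cut points are character offsets, an excerpt never ends in the middle of a multi-byte sequence. The sketch does not HTML-escape the surrounding text, which a real view helper would need to do.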
On Mon, Aug 28, 2006 at 02:11:26PM +0200, Jean-Christophe Michel wrote:
> Hi,
>
> Le 28 août 06, à 12:22, Jens Kraemer a écrit :
> > in Ferret 0.10 there's a highlight method in the Searcher class. Maybe
> > that does what you want ?
> >
> > http://ferret.davebalmain.com/api/classes/Ferret/Search/Searcher.html#M000223
>
> Seems good, will be perfect if your truncate respects multi-byte chars.
> My ruby helper does it, see how it works on
> http://symetrie.com/fr/search
> (it highlights only the first occurrence of each word currently).
>
> I cannot find the whole code in
> http://ferret.davebalmain.com/trac/browser/tags/REL-0.10.1/ext/r_search.c
> though to check this (and would I, I'm not sure I could check unicode
> implementation in C :/)
>
> I'll maybe wait for aaf to be updated for 0.10.1 before testing.

The current trunk of aaf is supposed to be 0.10.x compatible, so feel
free to try it out.

For now, though, you'd have to create your own Searcher to use the
highlighting, because aaf doesn't give you access to a Searcher
instance. But this might be an interesting feature.

Jens
Jean-Christophe Michel
2006-Aug-28 22:35 UTC
[Ferret-talk] how to get the words of a query
Hi,

Thanks for your reply.

On 28 Aug 06, at 23:52, Jens Kraemer wrote:
> the current trunk of aaf is supposed to be 0.10.x compatible, feel free
> to try it out.

Ah, I'll try.

> But for now you'd have to create your own Searcher to use the
> highlighting, because aaf doesn't give you access to a Searcher
> instance
> to use. But this might be an interesting feature.

If I wanted to use the parsed query words, is there a way to get them
through aaf?

I currently use a hack:

    @query = params[:query].chars.gsub(/[^\w\s]/, ' ').strip.downcase

(chars comes from unicode_hacks)

If I don't filter out characters like '&', it brings my server down :/
(memory error in Mongrel).

Jean-Christophe Michel
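As an illustration of the hack above on a current Ruby, the following sketch splits a raw query string into lowercase words. It is only an approximation under stated assumptions: it does not call Ferret's query parser, the helper name `query_words` and the `OPERATORS` list are hypothetical, and field prefixes or slop suffixes would come through as extra words. Note that on Ruby 1.9 and later `\w` matches ASCII word characters only, so the sketch uses the Unicode properties `\p{L}` and `\p{N}` to keep accented letters.

```ruby
# Hypothetical helper, not part of Ferret or acts_as_ferret.
# Boolean keywords to drop; this list is an assumption for the sketch.
OPERATORS = %w[AND OR NOT].freeze

# Returns the distinct lowercase words of a raw query string,
# with operator keywords and punctuation such as & " ( ) removed.
def query_words(raw)
  raw.to_s
     .split                                       # split on whitespace
     .reject { |tok| OPERATORS.include?(tok) }    # drop AND / OR / NOT
     .flat_map { |tok| tok.scan(/[\p{L}\p{N}]+/) } # keep letter/digit runs
     .map(&:downcase)
     .uniq
end

p query_words('red & "fire truck" AND symétrie')
# prints: ["red", "fire", "truck", "symétrie"]
```

The resulting words could then be passed one by one to an excerpt or highlighting helper. Filtering the input this way only prepares words for display; it is not a substitute for finding out why an unfiltered '&' exhausts memory.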
On 8/28/06, Jean-Christophe Michel <jc.michel at symetrie.com> wrote:
> Hi,
>
> Le 28 août 06, à 12:22, Jens Kraemer a écrit :
> > in Ferret 0.10 there's a highlight method in the Searcher class. Maybe
> > that does what you want ?
> >
> > http://ferret.davebalmain.com/api/classes/Ferret/Search/Searcher.html#M000223
>
> Seems good, will be perfect if your truncate respects multi-byte chars.
> My ruby helper does it, see how it works on
> http://symetrie.com/fr/search
> (it highlights only the first occurrence of each word currently).

Hi Jean-Christophe,

Are you saying the highlight doesn't respect multi-byte characters? If
so, could you give an example? The highlighter uses the byte boundaries
returned by the analyzer during indexing, so I can't see any reason
multi-byte characters wouldn't be respected.

Also, it's quite a bit more advanced than your version (and the version
in Lucene contrib, for that matter). It highlights only the terms that
match the query. So if you search for the phrase "red truck", the terms
"red" and "truck" will only be highlighted if they appear together. If
you search for "red truck"~1, then the phrase "red fire truck" will be
highlighted. It also uses a pretty clever algorithm to find the excerpts
with the most matching information.

It's still quite experimental, though, so I need people to try it out
and send in their suggestions.

Cheers,
Dave
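The phrase behaviour Dave describes can be illustrated with a small pure-Ruby sketch. This is not Ferret's implementation (which is written in C and works from indexed term positions), and it simplifies slop to "the phrase terms appear in order with at most `slop` extra tokens in between". The method name `phrase_matches` is hypothetical.

```ruby
# Hypothetical illustration, not Ferret's algorithm.
# Returns the index ranges in `tokens` where the terms of `phrase`
# occur in order with at most `slop` extra tokens between them.
def phrase_matches(tokens, phrase, slop = 0)
  matches = []
  tokens.each_index do |start|
    next unless tokens[start] == phrase.first

    pos = start
    # Greedily take the earliest following occurrence of each later term.
    found = phrase.drop(1).all? do |term|
      nxt = ((pos + 1)...tokens.length).find { |i| tokens[i] == term }
      pos = nxt if nxt
      nxt
    end
    next unless found

    # Extra tokens inside the span beyond the phrase terms themselves.
    gaps = (pos - start) - (phrase.length - 1)
    matches << (start..pos) if gaps <= slop
  end
  matches
end

tokens = %w[a red fire truck and a red truck]
p phrase_matches(tokens, %w[red truck])     # prints: [6..7]
p phrase_matches(tokens, %w[red truck], 1)  # prints: [1..3, 6..7]
```

With no slop only the adjacent "red truck" is reported; with a slop of 1 the span "red fire truck" also qualifies, mirroring the "red truck"~1 example above. A highlighter built this way would mark only tokens inside the returned ranges, so a lone "red" elsewhere in the text stays unmarked.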
Jean-Christophe Michel
2006-Sep-02 17:08 UTC
[Ferret-talk] how to get the words of a query
Hi,

On 1 Sept 06, at 17:09, David Balmain wrote:
>> Seems good, will be perfect if your truncate respects multi-byte
>> chars.
>> My ruby helper does it, see how it works on
>> http://symetrie.com/fr/search
>> (it highlights only the first occurrence of each word currently).
>
> Are you saying the highlight doesn't respect multi-byte characters? If
> so, could you give an example? The highlighter uses the byte
> boundaries returned by the analyzer during indexing so I can't see any
> reason multi-byte characters wouldn't be respected.

No, it was a question: I was wondering whether it respected multi-byte
characters. It's good news that it can handle Unicode.

> Also, it's quite a bit more advanced than your version (and the
> version in Lucene contrib for that matter). It highlights only the
> terms that match the query. So if you search for the phrase "red
> truck" the terms "red" and "truck" will only be highlighted if they
> appear together. If you search for "red truck"~1 then the phrase "red
> fire truck" will be highlighted. It also uses a pretty clever
> algorithm to find the excerpts with the most matching information.
> It's still quite experimental though so I need people to try it out
> and send in their suggestions.

OK, I'll try it. Until now I was using my own Ruby highlighter.

Jean-Christophe Michel