Does anybody know whether ferret or acts_as_ferret supports wordsegment search, like what Lucene can do? I'd like to know; if not, I will use Lucene instead. I can't find any relevant documentation on this for Ruby.

-- Posted via http://www.ruby-forum.com/.
On Mon, Apr 02, 2007 at 07:58:35AM +0200, Jin wrote:
> Does anybody know whether ferret or acts_as_ferret supports wordsegment
> search?

I don't know what you mean by this. Could you give an example? Strangely enough, I only seem to find Chinese documents when googling - looks like that's a feature useful when analyzing Chinese text...

> like what Lucene can do.

If Lucene can do it, Ferret will most probably be able to do it, too :-)

Maybe it's just a matter of implementing a custom analyzer. I think I found something like that here: http://kingcat1234.spaces.live.com/ (search for wordSegement).

Jens

--
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49 351 46766-0 | Telefax +49 351 46766-66
kraemer at webit.de | www.webit.de
Amtsgericht Dresden | HRB 15422
GF Sven Haubold, Hagen Malessa
Jens Kraemer wrote:
> On Mon, Apr 02, 2007 at 07:58:35AM +0200, Jin wrote:
>> Does anybody know whether ferret or acts_as_ferret supports wordsegment
>> search?
>
> I don't know what you mean by this. Could you give an example? Strangely
> enough, I only seem to find Chinese documents when googling - looks like
> that's a feature useful when analyzing Chinese text...

Yep. The search system is already built, but the more features the better. What the customer wants now is this: if he enters 'designpattern', with no space between 'design' and 'pattern', the search should split it into the two words, just like Google does, and return results for both 'designpattern' and 'design pattern'. That's it.

> If Lucene can do it, Ferret will most probably be able to do it, too :-)
>
> Maybe it's just a matter of implementing a custom analyzer. I think I
> found something like that here: http://kingcat1234.spaces.live.com/
> (search for wordSegement).

I have checked the website you gave me, thank you. But what they have done is for dividing Chinese sentences into words, and it is not very accurate: e.g. 'iamyourfriend' may be divided into 'iam', 'amyou', 'yourfriend', 'friend' or something like this. This kind of thing needs vocabulary support. If a similar implementation already exists, I would prefer to use it rather than create it :)

-- Posted via http://www.ruby-forum.com/.
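For illustration, the vocabulary-backed splitting described above could look roughly like the following in plain Ruby. This is only a sketch under stated assumptions, not part of Ferret or Lucene: the `segment` method and the tiny `VOCABULARY` set are made up for this example, and a real system would load a proper word list. It returns a split only when every piece is a known word, which avoids fragments such as 'amyou'.

```ruby
require 'set'

# Hypothetical vocabulary for the example; a real system would load a full word list.
VOCABULARY = Set.new(%w[design pattern i am your friend]).freeze

# Returns an array of vocabulary words that concatenate to +text+,
# or nil when no complete segmentation exists.
# When several segmentations exist, the one with the fewest words wins.
def segment(text, vocabulary = VOCABULARY)
  text = text.downcase
  # best[i] holds the shortest known segmentation of the first i characters
  best = Array.new(text.length + 1)
  best[0] = []
  (1..text.length).each do |finish|
    (0...finish).each do |start|
      next if best[start].nil?              # prefix up to start is not segmentable
      word = text[start...finish]
      next unless vocabulary.include?(word) # only accept known words
      candidate = best[start] + [word]
      if best[finish].nil? || candidate.length < best[finish].length
        best[finish] = candidate
      end
    end
  end
  best[text.length]
end

p segment('designpattern')   # prints ["design", "pattern"]
p segment('iamyourfriend')   # prints ["i", "am", "your", "friend"]
```

To serve both 'designpattern' and 'design pattern', the original term and the joined pieces could both be searched; how to wire that into an analyzer or a query is left open here.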
On Tue, Apr 03, 2007 at 03:55:28AM +0200, Jin wrote:
> Yep. The search system is already built, but the more features the
> better. What the customer wants now is this: if he enters
> 'designpattern', with no space between 'design' and 'pattern', the
> search should split it into the two words, just like Google does, and
> return results for both 'designpattern' and 'design pattern'.

Interesting, that could be useful for analyzing German text, too - we have lots of compound words like this :-)

> I have checked the website you gave me, thank you. But what they have
> done is for dividing Chinese sentences into words, and it is not very
> accurate: e.g. 'iamyourfriend' may be divided into 'iam', 'amyou',
> 'yourfriend', 'friend' or something like this.

Ah, ok.

> This kind of thing needs vocabulary support.

Yeah.

> If a similar implementation already exists, I would prefer to use it
> rather than create it :)

At least I couldn't find one. Are you sure Lucene has an analyzer that can split compound words? If yes, porting it to Ruby should be relatively easy :-)

Jens
On 4/3/07, Jens Kraemer <kraemer at webit.de> wrote:
> On Tue, Apr 03, 2007 at 03:55:28AM +0200, Jin wrote:
> > Yep. The search system is already built, but the more features the
> > better. What the customer wants now is this: if he enters
> > 'designpattern', with no space between 'design' and 'pattern', the
> > search should split it into the two words, just like Google does, and
> > return results for both 'designpattern' and 'design pattern'.
>
> Interesting, that could be useful for analyzing German text, too - we
> have lots of compound words like this :-)

I've already mentioned this to Jin in private, but I think the better solution for something like this is to post-process the query if you get very few (or zero) matches. For example, you could run the query through a spell checker that would suggest rewriting 'designpattern' as 'design pattern'. I'm not sure whether the spell checker approach would work in German or Chinese, but some sort of post-processing should do the trick. If anyone has implemented something like this or has any good ideas, I'd love to hear them.

cheers,
Dave

--
Dave Balmain
http://www.davebalmain.com/
On 4/6/07, David Balmain <dbalmain.ml at gmail.com> wrote:
> I've already mentioned this to Jin in private, but I think the better
> solution for something like this is to post-process the query if you
> get very few (or zero) matches. For example, you could run the query
> through a spell checker that would suggest rewriting
> 'designpattern' as 'design pattern'. I'm not sure whether the spell
> checker approach would work in German or Chinese, but some sort of
> post-processing should do the trick. If anyone has implemented
> something like this or has any good ideas, I'd love to hear them.

I forgot to mention: I would guess that this is probably how Google does the same thing.

--
Dave Balmain
http://www.davebalmain.com/
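The post-processing idea quoted above (retry with a rewritten query when the original finds too few matches) might be sketched as follows. Everything here is hypothetical and deliberately independent of Ferret's API: `search_with_fallback`, `min_hits`, `DOCS`, `TOY_SEARCH` and `TOY_REWRITE` are names invented for the example. In practice `search` would wrap the real index lookup and `rewrite` would wrap a spell checker or a word splitter.

```ruby
# Toy "index": a list of documents matched by whole-word lookup (assumption for the demo).
DOCS = ['design pattern catalogue', 'ruby search with ferret'].freeze

# Stand-in for the real search call: returns documents containing every query term.
TOY_SEARCH = lambda do |query|
  DOCS.select { |doc| query.split.all? { |term| doc.split.include?(term) } }
end

# Stand-in for a spell checker: returns a suggested query or nil.
TOY_REWRITE = ->(query) { query == 'designpattern' ? 'design pattern' : nil }

# Runs the query; if it returns fewer than min_hits results, asks +rewrite+
# for a suggestion and retries. The rewritten query is used only if it
# actually finds more hits than the original.
def search_with_fallback(query, search:, rewrite:, min_hits: 1)
  hits = search.call(query)
  original = { query: query, hits: hits, rewritten: false }
  return original if hits.length >= min_hits

  suggestion = rewrite.call(query)
  return original if suggestion.nil? || suggestion == query

  retry_hits = search.call(suggestion)
  return original if retry_hits.length <= hits.length

  { query: suggestion, hits: retry_hits, rewritten: true }
end

result = search_with_fallback('designpattern', search: TOY_SEARCH, rewrite: TOY_REWRITE)
puts result[:query]   # prints "design pattern"
```

Returning the rewritten query alongside the hits lets the UI show a "showing results for ..." hint, which is one plausible way to present the fallback to the user.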