Charlie
2006-Jul-05 14:22 UTC
[Rails] Is there any schema of full-text search that support utf-8?
Is there any full-text search scheme that supports UTF-8, especially for Asian languages such as Chinese and Japanese? Ferret/acts_as_ferret does not work when keywords in these languages are searched, and it is also difficult to implement pagination, which needs both the count of search results and an offset. Very grateful!
-- Posted via http://www.ruby-forum.com/.
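On the pagination part of the question: the arithmetic itself only needs the total hit count, a page number, and a page size. A minimal sketch in plain Ruby follows. `page_window` is a hypothetical helper, not part of Ferret or acts_as_ferret, and how the resulting offset and limit are passed to the search call depends on the library version in use.

```ruby
# Hypothetical helper (not a Ferret or acts_as_ferret API).
# Given the total number of hits, a 1-based page number and a page size,
# return the offset and limit to request plus the total page count.
def page_window(total_hits, page, per_page = 10)
  total_pages = (total_hits + per_page - 1) / per_page  # integer ceiling
  offset = (page - 1) * per_page
  { :offset => offset, :limit => per_page, :total_pages => total_pages }
end

window = page_window(95, 3)
puts "offset=#{window[:offset]} limit=#{window[:limit]} pages=#{window[:total_pages]}"
```

With 95 hits and 10 per page, page 3 starts at offset 20 and there are 10 pages in total.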
David Balmain
2006-Jul-06 03:31 UTC
[Rails] Is there any schema of full-text search that support utf-8?
On 7/5/06, Charlie <yingfeng.zhang@gmail.com> wrote:
> Is there any schema of full-text search that support utf-8 especially
> for Asia language such as Chinese,Japanese,etc.
> Ferret/acts_as_ferret can not work when these language key words are
> searched,and also, it is difficult to implement pagination-which need
> both the count of search results and offset.
> Very grateful!

Hi Charlie,

Ferret will work fine on Asian languages. You just need to write your own Analyzer which matches tokens correctly for the language you are interested in. Have a look at the RegExpAnalyzer in Ferret. You can look at test/unit/analysis/ctc_analyzer.rb to see how it works.

Cheers,
Dave
Charlie
2006-Jul-07 02:58 UTC
[Rails] Re: Is there any schema of full-text search that support utf
Hi David,

Can you give me an example of how to add an analyzer to Ferret for Asian languages? My web application has to support multi-language search, which means that, for example, both Chinese and English will be searched through the form. For now I have decided to use a simple tokenization rule: every Chinese character is one token. This does not work well in some cases, but the database columns to be full-text searched hold at most a few tens of UTF-8 characters, so I think it will work well enough.

Thanks a lot!
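The "every Chinese character is a token" rule described above can be tried out in plain Ruby before wiring anything into Ferret. This is only a sketch under assumptions: `simple_tokens` is a hypothetical helper, not a Ferret API, it needs Ruby 1.9 or later for the `\p{Han}` property, and it covers Han characters only (Japanese kana would need extra character classes).

```ruby
# encoding: utf-8
# Hypothetical helper, not part of Ferret: each Han character becomes its
# own token, while runs of ASCII letters and digits stay together as one
# lowercased token. Everything else (spaces, punctuation) is skipped.
def simple_tokens(text)
  text.downcase.scan(/\p{Han}|[a-z0-9]+/)
end

p simple_tokens("Rails 全文搜索 Ferret")
# => ["rails", "全", "文", "搜", "索", "ferret"]
```

Keeping English words whole while splitting Chinese per character means a single form field can serve both languages with the same tokenizer.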
David Balmain
2006-Jul-07 12:13 UTC
[Rails] Re: Is there any schema of full-text search that support utf
On 7/7/06, Charlie <yingfeng.zhang@gmail.com> wrote:
> Hi,David
> Can you give me an example of how to add analyzer to ferret to Asian
> languages?
> My web application will have to support multi language search,which
> means,for example,both Chinese and English will be searched through the
> form.
> Currently,I have decided to use the simple token principles,which means
> that every Chinese character will be a token,although this is not so
> well in some cases,my database column to be full-text searched include
> at most tens of UTF-8 characters,therefore i think it can works well.
> Thanks a lot!

    # Create a PerFieldAnalyzer (AKA PerFieldAnalyzerWrapper) which defaults to Standard
    analyzer = PerFieldAnalyzer.new(StandardAnalyzer.new)

    # Add a special character analyzer for the chinese field or whatever field
    # it is that has chinese characters. This splits the data into single characters.
    analyzer["chinese"] = RegExpAnalyzer.new(/./, false)

There you have it. Pretty simple.

Cheers,
Dave
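To make the per-field idea concrete without installing Ferret, here is a plain-Ruby stand-in for the same dispatch: a default tokenizer for every field, with one field overridden to split on `/./`. The names `WORDS`, `CHARS`, `ANALYZERS` and `tokens_for` are made up for this sketch and say nothing about how Ferret implements PerFieldAnalyzer internally.

```ruby
# encoding: utf-8
# Plain-Ruby stand-in for per-field analysis (hypothetical names throughout).

# Default: lowercased runs of ASCII letters and digits.
WORDS = lambda { |text| text.downcase.scan(/[a-z0-9]+/) }

# Per-character: the same /./ pattern as in the post, one token per character.
# Note that /./ also matches spaces, so whitespace becomes a token too.
CHARS = lambda { |text| text.scan(/./) }

# Hash.new(WORDS) returns WORDS for any field that has no explicit entry.
ANALYZERS = Hash.new(WORDS)
ANALYZERS["chinese"] = CHARS

def tokens_for(field, text)
  ANALYZERS[field].call(text)
end

p tokens_for("title", "Full Text Search")
p tokens_for("chinese", "全文搜索")
```

The first call falls through to the default word tokenizer; the second uses the per-character one because the field name is "chinese". If the field can contain spaces, a pattern such as `/\S/` in place of `/./` would avoid whitespace tokens in this sketch.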