thr3ads.net - Rails - International character search. [Oct 2006]

soulhunter

2006-Oct-12 02:19 UTC

International character search.

Hi!

I'm working on a program that needs case-insensitive search over text
containing international characters, such as:

ñáéíóúàèìòùäëïöü and so on.

I found a way of implementing it, but I don't quite like it because it
would mean reimplementing the autocomplete function for *each*
autocomplete I have in my project.

The approach I found is to change the condition from:

LOWER(column) like '%thing_downcased%'

to

column ~* 'thing_downcased'

and to replace each international character with a bracket expression
of the [ñÑ] kind, like this:

name ~* 'la [ñÑ]apa'

It actually works (at least with PostgreSQL), but I would need to do
the substitution every time I run a search, and I would need to
reimplement the autocomplete function for each autocompletion with
the new scheme.

Any better ideas?

Sincerely,

Ildefonso Camargo
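
The substitution step described above can be written once and shared by every search. A minimal sketch, assuming Ruby 2.4 or later (for Unicode-aware upcase and downcase); the helper name intl_pattern is hypothetical and does not come from the thread:

```ruby
# Hypothetical helper (not from the thread): build a PostgreSQL regex
# pattern in which every non-ASCII character becomes a bracket
# expression holding its lowercase and uppercase forms.
REGEX_META = /[\\.\[\]{}()*+?^$|]/

def intl_pattern(term)
  term.each_char.map do |ch|
    if ch.ascii_only?
      # Escape regex metacharacters so user input is matched literally.
      ch.match?(REGEX_META) ? "\\#{ch}" : ch
    else
      "[#{ch.downcase}#{ch.upcase}]"
    end
  end.join
end

puts intl_pattern('la ñapa')   # prints: la [ñÑ]apa
```

The result would then be passed as a bind parameter, for example in a condition of the form ['name ~* ?', intl_pattern(term)], so a single helper could serve every autocomplete.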


--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups
"Ruby on Rails: Talk" group.
To post to this group, send email to
rubyonrails-talk-/JYPxA39Uh5TLH3MbocFFw@public.gmane.org
To unsubscribe from this group, send email to
rubyonrails-talk-unsubscribe@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/rubyonrails-talk
-~----------~----~----~----~------~----~------~--~---

Xavier Noria

2006-Oct-12 21:08 UTC

Re: International character search.

On Oct 12, 2006, at 4:19 AM, soulhunter wrote:
> Anyway, I found a way of implementing it, but I don't quite like it
> because it would implies create the autocomplete function for *each*
> autocomplete I have in my project.
>
> The way of doing so I found is to change the condition from:
>
> LOWER(column) like '%thing_downcased%'

Just to share a different approach: since you can't expect users to
type accented words correctly, I usually store an extra normalized
column (say name_normalized) for searches, maintained in some Rails
way such as filters, or I store just the normalized values in Ferret.
Any query then has to be normalized as well before it is run.

-- fxn

   # Utility method that returns an ASCIIfied, downcased, and sanitized string.
   # It relies on the Unicode Hacks plugin by means of String#chars. We assume
   # $KCODE is 'u' in environment.rb. For now we support a wide range of Latin
   # accented letters, based on the Unicode Character Palette bundled in Macs.
   def self.normalize(str)
     n = str.chars.downcase.strip.to_s
     n.gsub!(/[àáâãäåāąă]/u,   'a')
     n.gsub!(/\s+/,            ' ')
     n.gsub!(/æ/u,            'ae')
     n.gsub!(/[ďđ]/u,          'd')
     n.gsub!(/[çćčĉċ]/u,       'c')
     n.gsub!(/[èéêëēęěĕė]/u,   'e')
     n.gsub!(/ƒ/u,             'f')
     n.gsub!(/[ĝğġģ]/u,        'g')
     n.gsub!(/[ĥħ]/u,          'h')
     n.gsub!(/[ììíîïīĩĭ]/u,    'i')
     n.gsub!(/[įıĳĵ]/u,        'j')
     n.gsub!(/[ķĸ]/u,          'k')
     n.gsub!(/[łľĺļŀ]/u,       'l')
     n.gsub!(/[ñńňņŉŋ]/u,      'n')
     n.gsub!(/[òóôõöøōőŏŏ]/u,  'o')
     n.gsub!(/œ/u,            'oe')
     n.gsub!(/[ŕřŗ]/u,         'r')
     n.gsub!(/[śšşŝș]/u,       's')
     n.gsub!(/[ťţŧț]/u,        't')
     n.gsub!(/[ùúûüūůűŭũų]/u,  'u')
     n.gsub!(/ŵ/u,             'w')
     n.gsub!(/[ýÿŷ]/u,         'y')
     n.gsub!(/[žżź]/u,         'z')
     n.gsub!(/[^\sa-z0-9_-]/,   '')
     n
   end
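
One way to picture the normalized-column approach is the plain-Ruby sketch below, which runs without Rails. The Place class, its search method, and the deliberately tiny stand-in normalizer are all assumptions made for illustration. In a Rails application the synchronization would more likely live in a model callback, and the lookup in a SQL condition on name_normalized.

```ruby
# Illustrative stand-in for a model with an extra normalized column.
class Place
  attr_reader :name, :name_normalized

  def initialize(name)
    self.name = name
  end

  # Keep the normalized copy in sync whenever the name changes.
  def name=(value)
    @name = value
    @name_normalized = Place.normalize(value)
  end

  # Tiny stand-in normalizer (assumption): covers only a few Spanish letters.
  def self.normalize(str)
    str.downcase.tr('áéíóúüñ', 'aeiouun').strip
  end

  # The query is normalized the same way as the stored values.
  def self.search(places, query)
    q = normalize(query)
    places.select { |place| place.name_normalized.include?(q) }
  end
end

places = [Place.new('La Ñapa'), Place.new('El Río')]
puts Place.search(places, 'NAPA').map(&:name)   # prints: La Ñapa
```

Because both the stored value and the query pass through the same function, the comparison itself stays a plain substring match and every autocomplete can reuse it unchanged.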





David Balmain

2006-Oct-13 00:47 UTC

Re: International character search.

On 10/13/06, Xavier Noria <fxn@hashref.com> wrote:
>
> On Oct 12, 2006, at 4:19 AM, soulhunter wrote:
>
> > Anyway, I found a way of implementing it, but I don't quite like it
> > because it would implies create the autocomplete function for *each*
> > autocomplete I have in my project.
> >
> > The way of doing so I found is to change the condition from:
> >
> > LOWER(column) like '%thing_downcased%'
>
> Just to share a different approach, since you can't expect users to
> type accented words correctly, I usually store a normalized extra
> column (say name_normalized) for searches maintained in some
> Rails-way like filters, or store just the normalization of them in
> ferret. Then any query has to be normalized.
>
> -- fxn
>
>    # Utility method that retursn an ASCIIfied, downcased, and
>    # sanitized string.
>    # It relies on the Unicode Hacks plugin by means of String#chars.
>    # We assume $KCODE is 'u' in environment.rb. By now we support a
>    # wide range of latin accented letters, based on the Unicode
>    # Character Palette bundled in Macs.
>    def self.normalize(str)
>      n = str.chars.downcase.strip.to_s
>      n.gsub!(/[àáâãäåāąă]/u,   'a')
>      n.gsub!(/\s+/,            ' ')
>      n.gsub!(/æ/u,            'ae')
>      n.gsub!(/[ďđ]/u,          'd')
>      n.gsub!(/[çćčĉċ]/u,       'c')
>      n.gsub!(/[èéêëēęěĕė]/u,   'e')
>      n.gsub!(/ƒ/u,             'f')
>      n.gsub!(/[ĝğġģ]/u,        'g')
>      n.gsub!(/[ĥħ]/,           'h')
>      n.gsub!(/[ììíîïīĩĭ]/u,    'i')
>      n.gsub!(/[įıĳĵ]/u,        'j')
>      n.gsub!(/[ķĸ]/u,          'k')
>      n.gsub!(/[łľĺļŀ]/u,       'l')
>      n.gsub!(/[ñńňņŉŋ]/u,      'n')
>      n.gsub!(/[òóôõöøōőŏŏ]/u,  'o')
>      n.gsub!(/œ/u,            'oe')
>      n.gsub!(/[ŕřŗ]/u,         'r')
>      n.gsub!(/[śšşŝș]/u,       's')
>      n.gsub!(/[ťţŧț]/u,        't')
>      n.gsub!(/[ùúûüūůűŭũų]/u,  'u')
>      n.gsub!(/ŵ/u,             'w')
>      n.gsub!(/[ýÿŷ]/u,         'y')
>      n.gsub!(/[žżź]/u,         'z')
>      n.gsub!(/[^\sa-z0-9_-]/,   '')
>      n
>    end
>
Sweet! I've just been looking for a character conversion chart like
this to add a filter to Ferret. In a future version of Ferret (coming
very soon) this will be a lot easier and faster. I'll probably put an
option on the StandardAnalyzer called :normalize_unicode or something.

Thanks Xavier,
Dave


Xavier Noria

2006-Oct-13 01:57 UTC

Re: International character search.

On Oct 13, 2006, at 2:47 AM, David Balmain wrote:
> Sweet! I've just been looking for a character conversion chart like
> this to add a filter to Ferret. In a future version of Ferret (coming
> very soon) this will be a lot easier and faster. I'll probably put an
> option on the StandardAnalyzer called :normalize_unicode or something.

Excellent!

I noticed in the mail that a q-like character was among the a-like
character class. I moved it out and am sending the normalizer again
for the archives:

# Utility method that returns an ASCIIfied, downcased, and sanitized string.
# It relies on the Unicode Hacks plugin by means of String#chars. We assume
# $KCODE is 'u' in environment.rb. For now we support a wide range of Latin
# accented letters, based on the Unicode Character Palette bundled in Macs.
def self.normalize(str)
   n = str.chars.downcase.strip.to_s
   n.gsub!(/[àáâãäåāă]/u,    'a')
   n.gsub!(/æ/u,            'ae')
   n.gsub!(/[ďđ]/u,          'd')
   n.gsub!(/[çćčĉċ]/u,       'c')
   n.gsub!(/[èéêëēęěĕė]/u,   'e')
   n.gsub!(/ƒ/u,             'f')
   n.gsub!(/[ĝğġģ]/u,        'g')
   n.gsub!(/[ĥħ]/u,          'h')
   n.gsub!(/[ììíîïīĩĭ]/u,    'i')
   n.gsub!(/[įıĳĵ]/u,        'j')
   n.gsub!(/[ķĸ]/u,          'k')
   n.gsub!(/[łľĺļŀ]/u,       'l')
   n.gsub!(/[ñńňņŉŋ]/u,      'n')
   n.gsub!(/[òóôõöøōőŏŏ]/u,  'o')
   n.gsub!(/œ/u,            'oe')
   n.gsub!(/ą/u,             'q')
   n.gsub!(/[ŕřŗ]/u,         'r')
   n.gsub!(/[śšşŝș]/u,       's')
   n.gsub!(/[ťţŧț]/u,        't')
   n.gsub!(/[ùúûüūůűŭũų]/u,  'u')
   n.gsub!(/ŵ/u,             'w')
   n.gsub!(/[ýÿŷ]/u,         'y')
   n.gsub!(/[žżź]/u,         'z')
   n.gsub!(/\s+/,            ' ')
   n.gsub!(/[^\sa-z0-9_-]/,   '')
   n
end

-- fxn
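
On current Ruby versions String#chars returns an array, so the method above does not run as posted. A rough modern equivalent, assuming Ruby 2.4 or later, can lean on String#unicode_normalize instead of hand-written character classes: canonical decomposition (NFD) splits most accented letters into a base letter plus combining marks, and the marks are then removed. Under that decomposition ą (a with ogonek) comes out as 'a', which differs from the table above. The free-standing method and the small table for letters that have no decomposition are assumptions, and the table is not exhaustive:

```ruby
# Letters without a canonical decomposition need an explicit mapping.
# This table is illustrative, not exhaustive.
NON_DECOMPOSING = {
  'æ' => 'ae', 'œ' => 'oe', 'ø' => 'o', 'đ' => 'd',
  'ħ' => 'h',  'ı' => 'i',  'ł' => 'l', 'ŧ' => 't'
}.freeze

def normalize(str)
  n = str.downcase.strip
  # Split accented letters into base letter + combining marks, drop the marks.
  n = n.unicode_normalize(:nfd).gsub(/\p{Mn}/, '')
  # Map the letters that NFD leaves untouched.
  n = n.gsub(Regexp.union(NON_DECOMPOSING.keys), NON_DECOMPOSING)
  # Collapse whitespace runs, then drop anything outside the allowed set.
  n = n.gsub(/\s+/, ' ')
  n.gsub(/[^\sa-z0-9_-]/, '')
end

puts normalize('  La  Ñapa! ')   # prints: la napa
puts normalize('Łódź')           # prints: lodz
```

The trade-off is that the result follows the Unicode decomposition data rather than a hand-picked chart, so any letter that should map differently has to be listed in the table explicitly.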




