Hi!

I'm working on a program that needs case-insensitive search over international characters such as ñáéíóúàèìòùäëïöü.

I found a way of implementing it, but I don't quite like it, because it would imply writing a custom autocomplete function for *each* autocomplete I have in my project.

The approach is to change the condition from:

    LOWER(column) like '%thing_downcased%'

to:

    column ~* 'thing_downcased'

and to replace each international character with a [ñÑ]-style bracket expression, like this:

    name ~* 'la [ñÑ]apa'

It actually works (at least with PostgreSQL), but I would have to do the substitution every time I run a search, and reimplement the autocomplete function for each autocompletion with the new scheme.

Any better ideas?

Sincerely,

Ildefonso Camargo

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
To post to this group, send email to rubyonrails-talk@googlegroups.com
To unsubscribe from this group, send email to rubyonrails-talk-unsubscribe@googlegroups.com
For more options, visit this group at http://groups.google.com/group/rubyonrails-talk
-~----------~----~----~----~------~----~------~--~---
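The substitution described in the question could live in one shared helper instead of being repeated per autocomplete. The following is only a sketch under assumptions: the name accent_class_pattern and the escaping rule are hypothetical, and it assumes Ruby 2.4 or later, where String#downcase and String#upcase handle non-ASCII letters.

```ruby
# Sketch only: the helper name and the escaping rule are assumptions,
# not part of any Rails or PostgreSQL API.

# Characters with special meaning in a PostgreSQL regular expression.
PG_REGEX_META = /[.*+?^${}()|\[\]\\]/

# Every non-ASCII letter becomes a bracket expression holding its
# lowercase and uppercase forms; regex metacharacters are
# backslash-escaped. ASCII case-insensitivity is left to the ~* operator.
def accent_class_pattern(term)
  term.each_char.map do |ch|
    if ch.ascii_only?
      ch.sub(PG_REGEX_META) { |m| "\\#{m}" }
    else
      lower = ch.downcase
      upper = ch.upcase
      lower == upper ? ch : "[#{lower}#{upper}]"
    end
  end.join
end

puts accent_class_pattern('la ñapa') # prints: la [ñÑ]apa
```

The returned string would then be passed as a bound parameter to a condition such as name ~* ?, so each autocomplete only has to call the helper.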
On Oct 12, 2006, at 4:19 AM, soulhunter wrote:

> Anyway, I found a way of implementing it, but I don't quite like it
> because it would implies create the autocomplete function for *each*
> autocomplete I have in my project.
>
> The way of doing so I found is to change the condition from:
>
> LOWER(column) like '%thing_downcased%'

Just to share a different approach: since you can't expect users to type accented words correctly, I usually store an extra normalized column (say name_normalized) for searches, maintained in some Rails way such as filters, or store just the normalized form in Ferret. Any query then has to be normalized the same way before it is matched.

-- fxn

    # Utility method that returns an ASCIIfied, downcased, and sanitized string.
    # It relies on the Unicode Hacks plugin by means of String#chars. We assume
    # $KCODE is 'u' in environment.rb. By now we support a wide range of latin
    # accented letters, based on the Unicode Character Palette bundled in Macs.
    def self.normalize(str)
      n = str.chars.downcase.strip.to_s
      n.gsub!(/[àáâãäåāąă]/u, 'a')
      n.gsub!(/\s+/, ' ')
      n.gsub!(/æ/u, 'ae')
      n.gsub!(/[ďđ]/u, 'd')
      n.gsub!(/[çćčĉċ]/u, 'c')
      n.gsub!(/[èéêëēęěĕė]/u, 'e')
      n.gsub!(/ƒ/u, 'f')
      n.gsub!(/[ĝğġģ]/u, 'g')
      n.gsub!(/[ĥħ]/u, 'h')
      n.gsub!(/[ìíîïīĩĭ]/u, 'i')
      n.gsub!(/[įıijĵ]/u, 'j')
      n.gsub!(/[ķĸ]/u, 'k')
      n.gsub!(/[łľĺļŀ]/u, 'l')
      n.gsub!(/[ñńňņʼnŋ]/u, 'n')
      n.gsub!(/[òóôõöøōőŏ]/u, 'o')
      n.gsub!(/œ/u, 'oe')
      n.gsub!(/[ŕřŗ]/u, 'r')
      n.gsub!(/[śšşŝș]/u, 's')
      n.gsub!(/[ťţŧț]/u, 't')
      n.gsub!(/[ùúûüūůűŭũų]/u, 'u')
      n.gsub!(/ŵ/u, 'w')
      n.gsub!(/[ýÿŷ]/u, 'y')
      n.gsub!(/[žżź]/u, 'z')
      n.gsub!(/[^\sa-z0-9_-]/, '')
      n
    end
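To make the normalized-column idea concrete, here is a minimal sketch in plain Ruby rather than a real Rails model. The Place class, its search method, and the short character table are all hypothetical and only illustrate the flow: the normalized copy is refreshed whenever the name changes (the job a filter would do), and each query goes through the same normalizer before matching. It assumes Ruby 2.4 or later for Unicode-aware downcase.

```ruby
# Hypothetical stand-in for a model with name and name_normalized columns.
class Place
  # Deliberately tiny table (the characters from the original question);
  # a real one would be much larger.
  ACCENTED = 'áéíóúàèìòùäëïöüñ'
  PLAIN    = 'aeiouaeiouaeioun'

  attr_reader :name, :name_normalized

  def initialize(name)
    self.name = name
  end

  # Assigning the name refreshes the normalized copy, which is what a
  # before-save filter would do for the extra column.
  def name=(value)
    @name = value
    @name_normalized = Place.normalize(value)
  end

  # Downcase, trim, fold accents, and collapse runs of whitespace.
  def self.normalize(str)
    str.downcase.strip.tr(ACCENTED, PLAIN).gsub(/\s+/, ' ')
  end

  # In-memory stand-in for: name_normalized LIKE '%<normalized query>%'
  def self.search(records, query)
    needle = normalize(query)
    records.select { |record| record.name_normalized.include?(needle) }
  end
end

places = [Place.new('La Ñapa'), Place.new('Café Río')]
puts Place.search(places, 'NAPA').map(&:name).inspect
```

Because stored values and queries pass through the same function, the SQL side can stay a plain LIKE on name_normalized with no per-autocomplete logic.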
On 10/13/06, Xavier Noria <fxn@hashref.com> wrote:

> Just to share a different approach, since you can't expect users to
> type accented words correctly, I usually store a normalized extra
> column (say name_normalized) for searches maintained in some Rails-way
> like filters, or store just the normalization of them in ferret.
> Then any query has to be normalized.
>
> [...]

Sweet! I've just been looking for a character conversion chart like this to add a filter to Ferret. In a future version of Ferret (coming very soon) this will be a lot easier and faster. I'll probably put an option on the StandardAnalyzer called :normalize_unicode or something.

Thanks Xavier,
Dave
On Oct 13, 2006, at 2:47 AM, David Balmain wrote:

> Sweet! I've just been looking for a character conversion chart like
> this to add a filter to Ferret. In a future version of Ferret (coming
> very soon) this will be a lot easier and faster. I'll probably put an
> option on the StandardAnalyzer called :normalize_unicode or something.

Excellent!

I noticed in the mail that a q-like character was among the a-like character class. I moved it out, and I'm sending the normalizer again for the archives:

    # Utility method that returns an ASCIIfied, downcased, and sanitized string.
    # It relies on the Unicode Hacks plugin by means of String#chars. We assume
    # $KCODE is 'u' in environment.rb. By now we support a wide range of latin
    # accented letters, based on the Unicode Character Palette bundled in Macs.
    def self.normalize(str)
      n = str.chars.downcase.strip.to_s
      n.gsub!(/[àáâãäåāă]/u, 'a')
      n.gsub!(/æ/u, 'ae')
      n.gsub!(/[ďđ]/u, 'd')
      n.gsub!(/[çćčĉċ]/u, 'c')
      n.gsub!(/[èéêëēęěĕė]/u, 'e')
      n.gsub!(/ƒ/u, 'f')
      n.gsub!(/[ĝğġģ]/u, 'g')
      n.gsub!(/[ĥħ]/u, 'h')
      n.gsub!(/[ìíîïīĩĭ]/u, 'i')
      n.gsub!(/[įıijĵ]/u, 'j')
      n.gsub!(/[ķĸ]/u, 'k')
      n.gsub!(/[łľĺļŀ]/u, 'l')
      n.gsub!(/[ñńňņʼnŋ]/u, 'n')
      n.gsub!(/[òóôõöøōőŏ]/u, 'o')
      n.gsub!(/œ/u, 'oe')
      n.gsub!(/ą/u, 'q')
      n.gsub!(/[ŕřŗ]/u, 'r')
      n.gsub!(/[śšşŝș]/u, 's')
      n.gsub!(/[ťţŧț]/u, 't')
      n.gsub!(/[ùúûüūůűŭũų]/u, 'u')
      n.gsub!(/ŵ/u, 'w')
      n.gsub!(/[ýÿŷ]/u, 'y')
      n.gsub!(/[žżź]/u, 'z')
      n.gsub!(/\s+/, ' ')
      n.gsub!(/[^\sa-z0-9_-]/, '')
      n
    end

-- fxn
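For readers on a newer Ruby, a hand-written table is not the only route. The sketch below is an assumption-laden alternative, not the method posted in this thread: it relies on String#unicode_normalize (Ruby 2.2 or later) to decompose accented letters and then drops the combining marks, keeping a small table only for letters that have no decomposition. The name ascii_fold and the contents of the table are hypothetical.

```ruby
# Letters that Unicode does not decompose into base letter + mark.
# Illustrative and incomplete; extend as needed.
NON_DECOMPOSABLE = {
  'æ' => 'ae', 'œ' => 'oe', 'ø' => 'o',
  'đ' => 'd',  'ł' => 'l',  'ß' => 'ss'
}.freeze

# Returns a downcased, ASCII-only, whitespace-collapsed copy of str.
def ascii_fold(str)
  n = str.downcase.strip
  # NFD splits "ñ" into "n" plus a combining tilde; \p{Mn} matches
  # the combining (nonspacing) marks, which are then removed.
  n = n.unicode_normalize(:nfd).gsub(/\p{Mn}/, '')
  n = n.gsub(Regexp.union(NON_DECOMPOSABLE.keys), NON_DECOMPOSABLE)
  n.gsub(/\s+/, ' ').gsub(/[^\sa-z0-9_-]/, '')
end

puts ascii_fold('  La   Ñapa! ')
```

Under this approach each accented letter folds to whatever base letter Unicode assigns it; for example ą (U+0105, a with ogonek) comes out as a.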