Does anyone know of a way of being 'accent-insensitive' when I do a search?

For example, if I have a resource with the name "La Bohème", and someone searches for 'boheme', I want them to find that resource, even though the 'e' doesn't have the accent. At the moment, it will only find it if they search for the properly accented version.

I guess soundex support for Ferret is what I mean, but maybe there's another way?

thanks, max
I just discovered the rather handy fuzzy searches, which I can do by adding (e.g.) "~0.6" to the end of my search term. So, this does the job (yay), but I'd still be interested in hearing if anyone else has solved this problem in a different way. :)

On 21/04/2008, Max Williams <toastkid.williams at gmail.com> wrote:
> Does anyone know of a way of being 'accent-insensitive' when i do a
> search?
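To illustrate why a fuzzy term such as "boheme~0.6" can match the accented form, here is a minimal sketch in plain Ruby. It assumes a Lucene-style similarity score (1 minus the edit distance divided by the length of the shorter term); Ferret's exact formula may differ, and `edit_distance` and `similarity` are hypothetical helper names, not Ferret API.

```ruby
# Hypothetical helpers for illustration only; not part of Ferret.

# Classic Levenshtein edit distance, computed one row at a time.
def edit_distance(a, b)
  a = a.chars
  b = b.chars
  prev = (0..b.length).to_a
  a.each_with_index do |ca, i|
    curr = [i + 1]
    b.each_with_index do |cb, j|
      cost = ca == cb ? 0 : 1
      # deletion, insertion, substitution
      curr << [prev[j + 1] + 1, curr[j] + 1, prev[j] + cost].min
    end
    prev = curr
  end
  prev.last
end

# ASSUMPTION: similarity = 1 - distance / length of the shorter string.
def similarity(a, b)
  shorter = [a.length, b.length].min
  return (a == b ? 1.0 : 0.0) if shorter.zero?
  1.0 - edit_distance(a, b).to_f / shorter
end

puts similarity("boheme", "bohème")   # one substitution over six characters
puts similarity("boheme", "bohemia")  # an unrelated word can also clear 0.6
```

Under this assumed formula, a single accented character costs one edit, so short words pass a 0.6 threshold easily. The second call shows the flip side: unrelated but similar words can pass too, which is where false positives come from.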
Hi!

You might create a custom Analyzer that replaces accented characters with their unaccented counterparts. If you apply this kind of analysis to both indexed content and queries, you'll find "La Bohème" with both 'boheme' and 'bohème' as the query string.

There's a sample method that does the replacement part of the job up on the aaf wiki: http://projects.jkraemer.net/acts_as_ferret/#UTF-8support

Have a look at the analyzer used in the omdb project for a more complete example: https://svn.omdb-beta.org/trunk/lib/omdb/ferret/omdb_analyzer.rb

Cheers,
Jens

On Mon, Apr 21, 2008 at 05:49:43PM +0100, Max Williams wrote:
> I just discovered the rather handy fuzzy searches, which i can do by adding
> (eg) "~0.6" to the end of my search term.

_______________________________________________
Ferret-talk mailing list
Ferret-talk at rubyforge.org
http://rubyforge.org/mailman/listinfo/ferret-talk

--
Jens Krämer
Finkenlust 14, 06449 Aschersleben, Germany
VAT Id DE251962952
http://www.jkraemer.net/ - Blog
http://www.omdb.org/ - The new free film database
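The idea above, folding accents in both the indexed text and the query, can be sketched in plain Ruby without Ferret. This is only an illustration under stated assumptions: `fold_accents`, `analyze`, and `search` are hypothetical names, the character table covers just a few Latin-1 letters, the Hash stands in for a real index, and it assumes Ruby 2.4 or later so that `downcase` handles accented capitals. Wiring the same folding into a Ferret Analyzer is not shown here.

```ruby
# Hypothetical helpers for illustration only; not Ferret or acts_as_ferret API.
ACCENTED   = 'àáâãäåçèéêëìíîïñòóôõöùúûüýÿ'
UNACCENTED = 'aaaaaaceeeeiiiinooooouuuuyy'

# Lowercase, then map each accented letter to its plain counterpart.
def fold_accents(text)
  text.downcase.tr(ACCENTED, UNACCENTED)
end

# The same analysis step is used for documents and for queries.
def analyze(text)
  fold_accents(text).scan(/[a-z0-9]+/)
end

# Toy inverted index: folded term => titles containing it.
def build_index(titles)
  index = Hash.new { |hash, term| hash[term] = [] }
  titles.each do |title|
    analyze(title).uniq.each { |term| index[term] << title }
  end
  index
end

# Return the titles that contain every folded query term.
def search(index, query)
  analyze(query).map { |term| index.fetch(term, []) }.reduce(:&) || []
end

index = build_index(["La Bohème", "Tosca"])
p search(index, "boheme")
p search(index, "bohème")
```

Because both sides pass through `analyze`, the index only ever sees the folded term, so the accented and unaccented queries look up the same key.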
That's very useful, thanks! I'm just using the fuzzy search for now, but if it proves too vague (too many false-positive results) then I'll look at this. I'd actually never seen that tr() method before; that, combined with the ready-made accent substitutions in your link, is itself very handy!

cheers, max

On 21/04/2008, Jens Kraemer <jk at jkraemer.net> wrote:
> You might create a custom Analyzer that does the job of replacing
> accentuated characters with their non-accentuated counterparts.