Hi!
On Tue, Aug 22, 2006 at 10:35:52AM +0200, Jean-Christophe Michel wrote:
> Hi,
>
> Using ferret and acts_as_ferret.
> Great work.
thanks :-)
> Is there a way to define some synonyms (searchable words that would not
> appear in the texts)?
> Like stop words, but instead of being removed from query and index,
> they would be added ;-)
This can be done with a custom analyzer. The Lucene in Action book has a
good chapter on the whole analysis topic, which covers synonyms,
too. You really should get this if you intend to do serious work with
Ferret and/or Lucene; it was really helpful to me.
http://www.manning.com/hatcher2/excerpt_contents.html
Basically you can add synonyms to your index at indexing time (afair
by having multiple terms sharing the same TermPosition), or you
can expand your user's queries using synonyms (e.g. the term 'lift'
could be expanded to the boolean clause 'lift OR elevator'). There's some
code in lucene contrib that takes the wordnet synonym database and
builds an index from it, that in turn can be used for the query
expansion task.
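To make the query expansion idea concrete, here is a minimal sketch in
plain Ruby. It does not use any Ferret API; the SYNONYMS table and the
expand_query helper are hypothetical names made up for illustration, and
the table would normally be filled from something like the wordnet index
mentioned above. It only rewrites the query string before you hand it to
the query parser.

```ruby
# Hypothetical synonym table, for illustration only.
SYNONYMS = {
  "lift"     => ["elevator"],
  "elevator" => ["lift"]
}.freeze

# Rewrite each whitespace-separated term into an OR group made of the
# term itself plus its synonyms. Terms without synonyms stay untouched.
def expand_query(query, synonyms = SYNONYMS)
  query.split.map do |term|
    alternatives = [term] + synonyms.fetch(term.downcase, [])
    alternatives.size > 1 ? "(#{alternatives.join(' OR ')})" : term
  end.join(" ")
end

puts expand_query("broken lift")  # prints "broken (lift OR elevator)"
```

This naive version splits on whitespace only, so it would need more care
for phrase queries or field prefixes.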
> Can some synonyms be regexp ? I'd like for instance to have œ (oelig)
> be equivalent to oe in French.
> Or maybe an utf8 normalization could achieve this last point easier ?
I would put this into a custom analyzer that then gets used for indexing
and query parsing. Just replace œ with oe in both queries and indexed
text. But remember that you lose some information that way, as now any
query having 'oe' in it will also match terms that once had œ in this
place.
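The replacement step itself could look roughly like the following plain
Ruby sketch (assuming Ruby 2.0 or later with UTF-8 source files). The
LIGATURES table and fold_ligatures helper are hypothetical names, and how
you hook this into a custom analyzer is left out here. The important part
is that the same function is applied to both the text being indexed and
the incoming query.

```ruby
# Hypothetical folding table; extend it with other ligatures as needed.
LIGATURES = { "œ" => "oe", "Œ" => "OE" }.freeze

# Replace each ligature with its two-letter spelling.
def fold_ligatures(text)
  text.gsub(/[œŒ]/, LIGATURES)
end

indexed = fold_ligatures("cœur")   # what ends up in the index
query   = fold_ligatures("coeur")  # what the user typed, folded the same way

puts indexed           # prints "coeur"
puts indexed == query  # prints "true"
```

After folding, "cœur" and "coeur" can no longer be told apart, which is
exactly the loss of information mentioned above.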
Jens
--
webit! Gesellschaft für neue Medien mbH www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer kraemer at webit.de
Schnorrstraße 76 Tel +49 351 46766 0
D-01069 Dresden Fax +49 351 46766 66