thr3ads.net - Ferret talk - [Ferret-talk] A few questions: Tweaking StemFilter, indexes, ... [Jan 2007]


Carl Lerche

2007-Jan-21 17:09 UTC

[Ferret-talk] A few questions: Tweaking StemFilter, indexes, ...

Hello all,

I am new to the list, but I have been using ferret for a little bit
already. I would first like to thank Dave for all his work on ferret.

I had a few questions that I haven't been able to figure out after
messing around with ferret and going through the documentation.

StemFilter ------

I am trying to improve the quality of my searches in context of the
content of my application. I have created an analyzer using the
following:

    StemFilter.new(StopFilter.new(
      LowerCaseFilter.new(StandardTokenizer.new(text)), @stop_words))

This has been pretty good so far. However, I would really like a
search for "plumber" to match "plumbing", perhaps at a lower score
than it would match "plumbers". The thing is that plumber(s) is
filtered to "plumber" and plumbing is filtered to "plumb", so it
doesn't match. Is there any way to tweak the filter to be able to do
these matches? I would like to match all nouns and verbs together
(and ideally with a lower score than different verb conjugations
would match). Another example would be driving and driver.

Worst case scenario, I could probably do some preprocessing to the
search queries to expand "plumber" or "driving" to a query that
includes both stems (for example expand the query for plumber to
"plumber plumb").
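The preprocessing idea above could be sketched roughly as follows, in plain Ruby with no Ferret calls. RELATED_STEMS and expand_query are hypothetical names made up for this illustration. Only the plumber/plumb pair comes from the message; the driver entry is illustrative. The "^0.5" suffix assumes the query parser accepts Lucene-style term boosts, which should be checked against the Ferret documentation.

```ruby
# Hypothetical table of extra stems to search for alongside a query term.
# "plumber" => "plumb" is the example from the message; "driver" is illustrative.
RELATED_STEMS = {
  "plumber" => ["plumb"],
  "driver"  => ["drive"]
}.freeze

# Expand each whitespace-separated term into an OR group that also contains
# its related stems at a lower boost, so direct matches should still rank first.
# The "term^boost" syntax is an assumption about the query parser.
def expand_query(query, related = RELATED_STEMS, boost = 0.5)
  query.downcase.split.map do |term|
    extras = related.fetch(term, [])
    next term if extras.empty?

    alternatives = extras.map { |stem| "#{stem}^#{boost}" }
    "(#{[term, *alternatives].join(' OR ')})"
  end.join(" ")
end

puts expand_query("plumber")
puts expand_query("cheap plumber")
```

Terms without an entry in the table pass through unchanged, so the expansion can be tried on a few words at a time without reindexing.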

Indexes ---

I was wondering how exactly indexes are implemented under the hood and
if there is a way to give hints to ferret as to how our queries will
be formed in order to optimize performance. Maybe I'm thinking of
ferret too much as a database, but I am not too familiar with what's
under ferret's hood.

The reason I ask is that for the project I am working on, I have huge
amounts of text to search, but each item also has a location
associated with it (longitude & latitude) and each query will only
want to search the text located in a specific area (point and radius).
I can add ranged parameters to the query and that will work, but is
that optimal? Hopefully I am making sense.
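For the point-and-radius case, the ranged parameters would come from a bounding box around the search circle. The following is a minimal sketch in plain Ruby, not anything Ferret provides: bounding_box and haversine_km are hypothetical helper names, the Earth is treated as a sphere of radius 6371 km, and the box is not valid near the poles or across the 180-degree meridian. Depending on how the index compares range endpoints, the coordinates may also need to be stored in a fixed-width, sortable form; that is an assumption to check against the Ferret documentation.

```ruby
EARTH_RADIUS_KM = 6371.0 # mean Earth radius, spherical approximation

# Latitude/longitude ranges (in degrees) of a box that encloses the circle
# of radius_km around the point (lat, lon).
def bounding_box(lat, lon, radius_km)
  angular = radius_km / EARTH_RADIUS_KM # radius expressed as an angle in radians
  lat_rad = lat * Math::PI / 180.0
  lat_delta = angular * 180.0 / Math::PI
  # Longitude lines converge toward the poles, so the east-west half-width
  # in degrees grows with latitude.
  lon_delta = Math.asin(Math.sin(angular) / Math.cos(lat_rad)) * 180.0 / Math::PI
  {
    lat: (lat - lat_delta)..(lat + lat_delta),
    lon: (lon - lon_delta)..(lon + lon_delta)
  }
end

# Great-circle distance (haversine formula), for discarding hits that fall
# in the corners of the box but outside the circle.
def haversine_km(lat1, lon1, lat2, lon2)
  to_rad = Math::PI / 180.0
  dlat = (lat2 - lat1) * to_rad
  dlon = (lon2 - lon1) * to_rad
  a = Math.sin(dlat / 2)**2 +
      Math.cos(lat1 * to_rad) * Math.cos(lat2 * to_rad) * Math.sin(dlon / 2)**2
  2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a))
end

box = bounding_box(40.0, -105.0, 10.0)
puts "lat #{box[:lat]}, lon #{box[:lon]}"
```

The two ranges would become the range clauses of the query, and haversine_km could be applied to the returned hits as a second pass.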

Donations ---

I was wondering if there is a page that lists the total amount of
donations so far?

Thanks,
-carl

-- 
EPA Rating: 3000 Lines of Code / Gallon (of coffee)

Ewout

2007-Jan-22 00:15 UTC



Hi,

You could use a FuzzyQuery, which matches words that have some degree
of resemblance, with a lower score.
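To get a feel for how close the terms in question are, here is a small plain-Ruby sketch that computes the edit distance between them. It does not call Ferret. The similarity formula (1 minus distance divided by the shorter length) is an assumption modeled on Lucene-style fuzzy scoring and may not be exactly what Ferret's FuzzyQuery computes; edit_distance and similarity are hypothetical helper names.

```ruby
# Levenshtein edit distance, computed one row at a time.
def edit_distance(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1,           # delete a character from a
               curr[j - 1] + 1,       # insert a character into a
               prev[j - 1] + cost].min # substitute, or keep if equal
    end
    prev = curr
  end
  prev.last
end

# Assumed similarity measure: 1.0 means identical, lower means less alike.
def similarity(a, b)
  1.0 - edit_distance(a, b).to_f / [a.length, b.length].min
end

[%w[plumber plumb], %w[plumber plumbing]].each do |x, y|
  puts "#{x} / #{y}: distance #{edit_distance(x, y)}, similarity #{similarity(x, y).round(2)}"
end
```

Since the index holds stemmed terms, the first pair ("plumber" against the stem "plumb") is likely the comparison that matters in practice.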
>StemFilter ------
>
>I am trying to improve the quality of my searches in context of the
>content of my application. I have created an analyzer using the
>following:
>
>StemFilter.new StopFilter.new(
>LowerCaseFilter.new(StandardTokenizer.new(text)), @stop_words )
>
>This has been pretty good so far. However, I would really like a
>search for "plumber" to match "plumbing", perhaps at a lower score
>than it would match "plumbers". The thing is that plumber(s) is
>filtered to "plumber" and plumbing is filtered to "plumb", so it
>doesn't match. Is there any way to tweak the filter to be able to do
>these matches? I would like to match all nouns and verbs together
>(and ideally with a lower score than different verb conjugations
>would match). Another example would be driving and driver.

William Morgan

2007-Jan-22 17:10 UTC



Excerpts from Carl Lerche's message of Sun Jan 21 09:09:59 -0800 2007:
> Worst case scenario, I could probably do some preprocessing to the
> search queries to expand "plumber" or "driving" to a query that
> includes both stems (for example expand the query for plumber to
> "plumber plumb")
You can either do query expansion or you can modify the stemmer. Query
expansion is probably a little easier to experiment with because you
don't have to worry about reindexing, but it does come with a
search-time cost which may or may not be negligible. (And it gets a
little tricky with phrasal queries.)
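The other option, modifying the stemmer, might look something like the sketch below: a table of exceptions consulted before the regular stemmer runs. This is plain Ruby and only an illustration. STEM_OVERRIDES, stem_with_overrides and TOY_STEMMER are hypothetical names, and the toy stemmer is a stand-in so the example runs on its own; it is not Ferret's StemFilter.

```ruby
# Hypothetical exceptions that force related words onto one shared stem.
STEM_OVERRIDES = {
  "plumber"  => "plumb",
  "plumbers" => "plumb"
}.freeze

# Stand-in for a real stemmer, present only so this sketch is runnable:
# it strips a trailing "ing" or "s" and does nothing else.
TOY_STEMMER = ->(word) { word.sub(/(ing|s)\z/, "") }

# Look the token up in the override table first and fall back to the
# wrapped stemmer (any object responding to #call) otherwise.
def stem_with_overrides(token, stemmer, overrides = STEM_OVERRIDES)
  word = token.downcase
  overrides.fetch(word) { stemmer.call(word) }
end

%w[plumber plumbers plumbing].each do |w|
  puts "#{w} -> #{stem_with_overrides(w, TOY_STEMMER)}"
end
```

Collapsing the words onto one stem makes them match at the same score, so this alone would not give "plumbing" a lower score than "plumbers", and it requires reindexing as noted above.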
> I can add ranged parameters to the query and that will work, but is
> that optimal? Hopefully I am making sense.
I don't know for sure whether Ferret is sophisticated enough to
optimize retrieval based on multiple ranges, but it may very well be.
In any case, I think you're doing the right thing.

-- 
William <wmorgan-ferret at masanjin.net>
