thr3ads.net - Ferret talk - [Ferret-talk] Search multiple models [Apr 2006]

Frank Rosquin

2006-Apr-26 10:41 UTC

[Ferret-talk] Search multiple models

Hello,

Let's say you have a few models like Post, Article, Wiki, and Comment,
and you want to use Ferret to search all of them at once. How would I
set up the latest acts_as_ferret to accomplish this? And which would be
fastest for searches: one index for all models, or an index per model?

Thank you

-- 
Posted via http://www.ruby-forum.com/.

Jens Kraemer

2006-May-03 16:54 UTC

[Ferret-talk] Search multiple models

Hi,

and sorry for the late reply.

On Wed, Apr 26, 2006 at 12:41:10PM +0200, Frank Rosquin wrote:
> Hello,
> 
> Let's say you have a few models like Post, Article, Wiki, and Comment,
> and you want to use Ferret to search all of them at once. How would I
> set up the latest acts_as_ferret to accomplish this? And which would be
> fastest for searches: one index for all models, or an index per model?

Which would be fastest depends on the type of your queries. If most of
your queries search all models at once, a single index should be faster.

If you tend to query mainly a single model and queries across all models
are the exception, the index-per-model approach should be better suited.

However, the difference won't matter until you get to really big
indexes.

If you go the multiple index route (declaring acts_as_ferret in each of
the models you want to search), you can use the 

multi_search(query, additional_models = [], options = {}) 

method on any of these model classes, giving the list of all other model
classes to search through as the second parameter. The options hash is
the same as for find_by_contents. You have to add the
:store_class_name => true option to your acts_as_ferret calls. That
turns on class name storage in the indexes and lets multi_search know
which class to query for a given hit.
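
To illustrate why each hit needs a stored class name when results come
from several per-model indexes, here is a minimal pure-Ruby sketch. It is
not how acts_as_ferret implements multi_search; the Hit struct and the
merge_hits helper are hypothetical names made up for this example.

```ruby
# Hypothetical stand-in for a search hit; not an acts_as_ferret class.
Hit = Struct.new(:class_name, :id, :score)

# Merge hits from several per-model indexes into one list ordered by
# score (highest first), then group the ids by class name so that each
# model can load its own records.
def merge_hits(*hit_lists)
  merged = hit_lists.flatten(1).sort_by { |hit| -hit.score }
  ids_by_class = merged.group_by(&:class_name)
                       .transform_values { |hits| hits.map(&:id) }
  [merged, ids_by_class]
end

post_hits    = [Hit.new("Post", 1, 0.9), Hit.new("Post", 7, 0.4)]
comment_hits = [Hit.new("Comment", 1, 0.6)]

merged, ids_by_class = merge_hits(post_hits, comment_hits)
merged.each { |hit| puts "#{hit.class_name}##{hit.id} (#{hit.score})" }
```

Note that Post 1 and Comment 1 share the same id; without the class name
there would be no way to tell which table a hit belongs to.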

For the single index route, using Rails single table inheritance is the
easiest approach. Just call acts_as_ferret once in your base class, and
use find_by_contents as usual. This is known to work; I use this with
Typo's Content base class.

If this is not an option for you, you can configure each model class to
use the same index directory. This approach should work but hasn't had
much (if any) testing so far.
One problem here is that we use the id column as a key in Ferret indexes,
too, so the id has to be unique across the models you want to search.
In addition, you would be on your own for querying the index; I don't
think any of the existing search methods will work out of the box in
this scenario. The :store_class_name option to acts_as_ferret should
be useful in this context, too.
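
One way to picture the id uniqueness problem in a shared index is a key
that combines class name and id. This is only a sketch of the idea: the
"ClassName-id" format and both helper names are assumptions for
illustration, not something acts_as_ferret provides.

```ruby
# Build a key that stays unique when several models share one index.
# The "ClassName-id" format is an assumption made for this sketch.
def index_key(class_name, id)
  "#{class_name}-#{id}"
end

# Split a key back into the class name and the numeric id.
def parse_index_key(key)
  class_name, _separator, id = key.rpartition("-")
  [class_name, Integer(id)]
end

puts index_key("Post", 1)
puts index_key("Comment", 1)
p parse_index_key("Comment-1")
```

With such a key, Post 1 and Comment 1 no longer collide, and the class
to load can be read back from the key itself.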

Patches regarding these issues would be very welcome - my hacking time
is quite constrained atm...

All in all, I'd suggest either the STI approach or, if that doesn't fit,
the multi-index route.

hope this helps,

Jens

-- 
webit! Gesellschaft für neue Medien mbH          www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer       kraemer at webit.de
Schnorrstraße 76                         Tel +49 351 46766  0
D-01069 Dresden                          Fax +49 351 46766 66

Eric Gross

2006-Oct-22 22:41 UTC

[Ferret-talk] Search multiple models

Hey guys, any idea how to use those options with multi_search?

I tried it on find_by_contents and it works fine. However, for
multi_search I do:

@results = User.multi_search(parse(@query), [Book], {:offset=>0, :limit=>5})

or

@results = User.multi_search(parse(@query), [Book], :offset=>0, :limit=>5)

and neither works, although I get no error either. What's wrong?

Jens Kraemer

2006-Oct-23 09:22 UTC

[Ferret-talk] Search multiple models

On Mon, Oct 23, 2006 at 12:41:08AM +0200, Eric Gross wrote:
> Hey guys, any idea how to use those options with multi_search?
> 
> I tried it on find_by_contents and it works fine. However, for
> multi_search I do:
> 
> @results = User.multi_search(parse(@query), [Book], {:offset=>0, :limit=>5})
> 
> or
> 
> @results = User.multi_search(parse(@query), [Book], :offset=>0, :limit=>5)
> 
> and neither works, although I get no error either. What's wrong?

That's not implemented yet, but there's a patch in Trac that I plan to
integrate into the next release of aaf.

http://projects.jkraemer.net/acts_as_ferret/ticket/60


Jens


Martin Bernd Schmeil

2006-Dec-27 17:05 UTC

[Ferret-talk] Search multiple models

Hi all,

just started to play with (acts_as_)ferret a couple of hours ago, when I 
learned that ferret supports fuzzy search.

I could not find an answer to the problem I need to solve yet:

I have a few models with one-to-many relations to Clients: Addresses,
Contacts, Phone numbers, etc.

I.e., a client may have many addresses and so on.

I need to match a "flat" (each attribute only once) client record
against all the attributes of the models mentioned above and get a list
of clients ordered by descending probability of being a duplicate.

Is this possible? Which options should I use to save memory and
improve performance?

Thanks in advance! - Bernd

Jens Kraemer

2007-Jan-10 09:00 UTC

[Ferret-talk] Search multiple models

On Wed, Dec 27, 2006 at 06:05:46PM +0100, Martin Bernd Schmeil wrote:
> Hi all,
> 
> just started to play with (acts_as_)ferret a couple of hours ago, when I 
> learned that ferret supports fuzzy search.
> 
> I could not find an answer to the problem I need to solve yet:
> 
> I have a few models with one-to-many relations to Clients: Addresses,
> Contacts, Phone numbers, etc.
> 
> I.e., a client may have many addresses and so on.
> 
> I need to match a "flat" (each attribute only once) client record
> against all the attributes of the models mentioned above and get a list
> of clients ordered by descending probability of being a duplicate.
> 
> Is this possible?

As a first try I'd build a single Ferret document for each client,
containing all of that client's contacts, addresses and phone numbers.
For better results you could keep all addresses in one field, phone
numbers in another, and contact names in a third field.

Then take each record you suspect of being a duplicate and build a query
from it, distributing the data across fields in the same way.
Running that query against the index should give you a list of possible
duplicate records sorted by relevance.
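
The two steps above can be sketched in plain Ruby. This is a rough
illustration only: the Client struct, the field names and both helper
methods are assumptions, and the query string merely follows the
field:term form used elsewhere in this thread. Actually adding the
document to a Ferret index and running the query is left out.

```ruby
# Hypothetical in-memory client record; attribute names are assumptions.
Client = Struct.new(:id, :addresses, :phone_numbers, :contact_names)

# Collapse a client and its one-to-many data into one flat document,
# with one field per kind of data.
def client_document(client)
  {
    id:        client.id,
    addresses: client.addresses.join(" "),
    phones:    client.phone_numbers.join(" "),
    contacts:  client.contact_names.join(" ")
  }
end

# Build a query from a suspected duplicate using the same field layout.
# Clauses are OR-ed so that a partial overlap still produces a hit.
def duplicate_query(document)
  clauses = document.flat_map do |field, text|
    next [] if field == :id
    text.to_s.downcase.scan(/\w+/).uniq.map { |term| "#{field}:#{term}" }
  end
  clauses.join(" OR ")
end

suspect = Client.new(1, ["12 Main St"], ["555 0100"], ["Ann Lee"])
puts duplicate_query(client_document(suspect))
```

Joining the clauses with OR is a design choice for this sketch: the more
terms a stored client shares with the suspect, the higher it should
rank, while AND would drop every client that misses a single term.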

> Which options should I use to save memory and 
> performance?

There seems to be no need to store the field contents themselves in the
index, so this should be turned off with :store => :no when the index is
created. Otherwise I'd first make it work and then see whether further
optimization is needed at all - Ferret is *really* fast.

cheers,
Jens


Martin Bernd Schmeil

2007-Jan-16 10:49 UTC

[Ferret-talk] Search multiple models

Hi Jens,

thanks for the answer. Because of time constraints I solved the
problem in a different way: I gave each model a client_id method
and then summed up the individual fuzzy search results for each
attribute.

I guess this is neither elegant nor performant, and I'm not happy with
the resulting scores. But we can live with it for now.

The main issues we have are the well-known locking problem and the
scores.

The scores leave us with the problem that - while the order seems to be
correct - we don't know where to draw the line for displaying results
and what counts as a relevant match. For a dozen attributes I've seen
scores from 0.something to 9.something, with a result just below 9 not
even looking similar while one just above 9 seems to be a "99 percent"
match.

If someone could tell me how to normalize the scores - in case this is
possible at all - I'd be very happy.

Another thing I don't understand yet is what actually happens if
I do a multi-token fuzzy search; currently I'm splitting the string up
into multiple tokens and building one query "attribute:token1~ AND
attribute:token2~ AND ...". Maybe that's not really what I should do to
get correct scores.
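
For reference, the query construction described above might look like
the following sketch. The fuzzy_and_query name is made up, and splitting
on word characters is an assumption about how the string is tokenized;
the real tokenization depends on the analyzer used for the index.

```ruby
# Turn free text into "field:token~" clauses joined with AND,
# mirroring the query shape described above.
def fuzzy_and_query(field, text)
  tokens = text.scan(/\w+/)
  tokens.map { |token| "#{field}:#{token}~" }.join(" AND ")
end

puts fuzzy_and_query("name", "Jon Smiht")
```

With AND, every token has to find a fuzzy match in the field, so a
record that misses one token is dropped entirely instead of just
scoring lower.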

Anyways, thanks for your work and for answering my post.


Jens Kraemer

2007-Jan-16 11:38 UTC

[Ferret-talk] Search multiple models

On Tue, Jan 16, 2007 at 11:49:12AM +0100, Martin Bernd Schmeil wrote:
> Hi Jens,
> 
[..]
> The main issues we have is the well known locking problem and the 
> scores.

Making sure you only have one process writing to the index, e.g. via
an indexer running in backgroundrb, should solve the locking issues.
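
The single-writer idea can be shown with a small in-process sketch:
producers only enqueue documents, and exactly one worker thread ever
writes. FakeIndex and SingleWriter are hypothetical stand-ins, not
Ferret or backgroundrb classes, and with several servers the writer
would have to be a separate process that all servers send their updates
to.

```ruby
require "thread"

# Stand-in for an index; a real setup would wrap a Ferret index here.
class FakeIndex
  attr_reader :documents

  def initialize
    @documents = []
  end

  def add(doc)
    @documents << doc
  end
end

class SingleWriter
  def initialize(index)
    @index = index
    @queue = Queue.new
    # The only thread that ever writes to the index.
    @worker = Thread.new do
      while (doc = @queue.pop) != :stop
        @index.add(doc)
      end
    end
  end

  # Safe to call from any thread; it only enqueues, never writes.
  def enqueue(doc)
    @queue << doc
  end

  # Let the worker finish the queued documents, then stop it.
  def shutdown
    @queue << :stop
    @worker.join
  end
end

index  = FakeIndex.new
writer = SingleWriter.new(index)
producers = 3.times.map do |n|
  Thread.new { 5.times { |i| writer.enqueue("doc-#{n}-#{i}") } }
end
producers.each(&:join)
writer.shutdown
puts index.documents.size
```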

> The scores leave us with the problem that - while the order seems to be 
> correct - we don't know where to cut the line to display results and
> what a relevant match is. For a dozen attributes I've seen scores from
> 0.something to 9.something, with a result close below 9 not even
> looking similar while just above 9 seems to be a "99 percent" match.

The calculation of scores is quite complex. To get an idea of what
happens in there you can use Ferret's explain method (in
Ferret::Search::Searcher).

> If someone would tell me - in case this is possible at all - how to 
> normalize the scores I'd be very happy.

No idea if this is possible - maybe you can find some information about
this in the context of Lucene (e.g. in Erik Hatcher's fine Lucene book
or on the Lucene mailing list).

> Another thing which I didn't understand yet is what actually happens if
> I do a multi token fuzzy search; currently I'm splitting the string up
> in multiple tokens and build one query "attribute:token1~ AND 
> attribute:token2~ AND ...". Maybe not really what I should do to get 
> correct scores.

I don't know if there is another way to express this with Ferret.


cheers,
Jens

Martin Bernd Schmeil

2007-Jan-16 11:50 UTC

[Ferret-talk] Search multiple models

Again thanks for the answers.

I did read the score formula, but my maths knowledge is almost gone now.
I'll look at the docs again when I have more spare time. With the most
relevant word I should be able to scale the scores to a percentage.
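
One simple reading of "scaling to a percentage" is to divide every score
by the best score in the result set. The sketch below does just that;
the method name is made up, and the result is only relative to the
current result list, so it does not by itself say where a relevant match
begins.

```ruby
# Scale raw scores so that the best hit in the list becomes 100.0.
# The values are relative to this result set only; they are not an
# absolute probability of a match.
def scores_as_percentages(scores)
  top = scores.max
  return scores.map { 0.0 } if top.nil? || top <= 0
  scores.map { |score| (score / top.to_f * 100).round(1) }
end

p scores_as_percentages([10.0, 5.0, 2.5])
```

Because the top hit always ends up at 100, a cut-off that should mean
the same thing across different queries would need a fixed reference
score instead of the best hit of each query.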

We have multiple servers running, so we definitely have concurrency
problems. I just haven't played with stuff like backgroundrb yet, so I
need to investigate how to implement a single-writer solution. But
thanks for the hint.
