Hello,

Let's say you have a few models like Post, Article, Wiki, and Comment, and you want to use Ferret to search all of them at once. How would I set up the latest acts_as_ferret to accomplish this? And which would be fastest for searches: one index for all models, or one index per model?

Thank you

-- Posted via http://www.ruby-forum.com/.
Hi, and sorry for the late reply.

On Wed, Apr 26, 2006 at 12:41:10PM +0200, Frank Rosquin wrote:
> Hello,
>
> Let's say you have a few models like Post, Article, Wiki, Comment, and
> you want to use ferret to search all of them at once. How would I set up
> the latest acts_as_ferret to accomplish this? And what would be fastest
> for searches? 1 index for all models, or have an index per model?

Which is fastest depends on the type of your queries. If most of your queries search all models at once, a single index should be faster. If you mainly query a single model and queries across all models are the exception, the index-per-model approach should be better suited. However, the difference won't matter until you get to really big indexes.

If you go the multiple-index route (declaring acts_as_ferret in each of the models you want to search), you can use the

  multi_search(query, additional_models = [], options = {})

method on any of these model classes, giving the list of all other model classes to search through as the second parameter. The options hash is the same as for find_by_contents. You have to add the :store_class_name => true option to your acts_as_ferret calls. That turns on class name storage in the indexes and lets multi_search know which class to query for a given hit.

For the single-index route, using Rails single table inheritance (STI) is the easiest approach. Just call acts_as_ferret once in your base class, and use find_by_contents as usual. This is known to work; I use it with Typo's Content base class.

If this is not an option for you, you can configure each model class to use the same index directory. This approach should work but hasn't had much (if any) testing so far. One problem here is that we use the id column as a key in Ferret indexes, too, so the id has to be unique across the models you want to search. In addition, you would be on your own for querying the index; I don't think any of the existing search methods will work out of the box in this scenario. The :store_class_name option to acts_as_ferret should be useful in this context, too.

Patches regarding these issues would be very welcome - my hacking time is quite constrained atm...

All in all, I'd suggest either the STI approach or, if that doesn't fit, the multi-index route.

Hope this helps,
Jens

-- 
webit! Gesellschaft für neue Medien mbH          www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer       kraemer at webit.de
Schnorrstraße 76                         Tel +49 351 46766  0
D-01069 Dresden                          Fax +49 351 46766 66
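
To illustrate why the stored class name matters when hits from several models are mixed, here is a minimal plain-Ruby sketch. It is not acts_as_ferret's actual implementation and calls neither Ferret nor Rails; the Hit struct, the RECORDS table and load_hits are hypothetical names made up for this illustration.

```ruby
# Plain-Ruby illustration only; nothing here calls Ferret or acts_as_ferret.
# Hit, RECORDS and load_hits are hypothetical names for this sketch.
Hit = Struct.new(:class_name, :id, :score)

# Stand-in for the database: records keyed by model name, then by id.
RECORDS = {
  "Post"    => { 1 => "Post #1: Ferret basics", 2 => "Post #2: Rails tips" },
  "Comment" => { 1 => "Comment #1: nice article" }
}.freeze

# Resolve each hit to its record, using the stored class name to pick
# the right "table". Hits are returned in descending score order.
def load_hits(hits)
  hits.sort_by { |hit| -hit.score }
      .map { |hit| RECORDS.fetch(hit.class_name).fetch(hit.id) }
end

hits = [
  Hit.new("Comment", 1, 0.4),
  Hit.new("Post", 1, 0.9) # same id as the comment, different model
]

results = load_hits(hits)
puts results
```

Both hits carry id 1. Without the class name there would be no way to tell them apart, which is the id uniqueness problem described for the shared-index setup.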
Hey guys,

any idea how to use those options with multi_search? I tried them on find_by_contents and they work fine. However, for multi_search I do:

  @results = User.multi_search(parse(@query), [Book], {:offset => 0, :limit => 5})

or

  @results = User.multi_search(parse(@query), [Book], :offset => 0, :limit => 5)

and neither works, although I get no error either. What's wrong?
On Mon, Oct 23, 2006 at 12:41:08AM +0200, Eric Gross wrote:
> hey guys, any idea how to use those options with multi_search
>
> I tried it on find_by_contents and it works fine, however, for
> multi_search i do:
>
> @results =
> User.multi_search(parse(@query),[Book],{:offset=>0,:limit=>5})
>
> or
>
> @results = User.multi_search(parse(@query),[Book],:offset=>0,:limit=>5)
>
> and neither works, however I get no error either. Whats wrong?

That's not implemented yet, but there's a patch in Trac that I plan to integrate into the next release of aaf:
http://projects.jkraemer.net/acts_as_ferret/ticket/60

Jens
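
Until offset and limit are supported by multi_search, one possible stopgap is to apply them in Ruby to hits you have already fetched. This is a minimal sketch under two assumptions that the thread does not confirm: that the result behaves like an Array, and that it already contains every hit you want to page through. The helper name page_of is hypothetical.

```ruby
# Hypothetical helper: applies an offset and a limit to an already-fetched,
# Array-like list of hits. It does not touch Ferret at all.
def page_of(hits, offset: 0, limit: 10)
  # Array#[] returns nil when the offset lies past the end, hence the fallback.
  hits.to_a[offset, limit] || []
end

# Stand-in data; in a real app these would be the returned records.
all_hits = %w[user_1 book_7 book_2 user_9 book_4 user_3 book_8]

first_page  = page_of(all_hits, offset: 0, limit: 5)
second_page = page_of(all_hits, offset: 5, limit: 5)
beyond_end  = page_of(all_hits, offset: 20, limit: 5)
```

This fetches more hits than it displays, so it only seems reasonable for small result sets.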
Hi all,

I just started to play with (acts_as_)ferret a couple of hours ago, when I learned that Ferret supports fuzzy search.

I could not yet find an answer to the problem I need to solve:

I have a few models with one-to-many relations to Clients: Addresses, Contacts, Phone numbers, etc. That is, a client may have many addresses, and so on.

I need to match a "flat" client record (each attribute only once) against all the attributes of the models mentioned above and get a list of clients in descending order of probability of being a duplicate.

Is this possible? Which options should I use to save memory and improve performance?

Thanks in advance!
- Bernd
On Wed, Dec 27, 2006 at 06:05:46PM +0100, Martin Bernd Schmeil wrote:
> Hi all,
>
> just started to play with (acts_as_)ferret a couple of hours ago, when I
> learned that ferret supports fuzzy search.
>
> I could not find an answer to the problem i need to solve yet:
>
> I have a few models with one to many relations to Clients: Addresses,
> Contacts, Phone numbers, etc.
>
> i.e. a client may have many addresses and so on.
>
> I need to match a "flat" (each attribute only once) client record
> against all the models attributes mentioned above and get a list of
> clients with descending probability of being a duplicate.
>
> Is this possible?

As a first try I'd build a single Ferret document for each client, containing all of that client's contacts, addresses and phone numbers. For better results you could keep all addresses in one field, phone numbers in another, and contact names in a third field.

Then take each record you suspect of being a duplicate and build a query from it, distributing the data to the different fields in the same way. Running that query against the index should give you a list of possible duplicate records sorted by relevance.

> Which options should I use to save memory and
> performance?

There seems to be no need to store the field contents themselves in the index, so this should be turned off with :store => :no when the index is created. Otherwise I'd first make it work and then see whether further optimization is needed at all - Ferret is *really* fast.

cheers,
Jens
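
The one-document-per-client idea can be sketched in plain Ruby as follows. Nothing here calls Ferret; the Client struct, the field names and both helper methods are hypothetical, and joining the query clauses with OR is an assumption about how one might want to combine them, so check it against Ferret's query language documentation before relying on it.

```ruby
# Plain-Ruby sketch; no Ferret calls. All names below are hypothetical.
Client = Struct.new(:id, :addresses, :phones, :contacts)

# One document per client: every address in one field, every phone
# number in another, every contact name in a third.
def client_document(client)
  {
    id:        client.id,
    addresses: client.addresses.join(" "),
    phones:    client.phones.join(" "),
    contacts:  client.contacts.join(" ")
  }
end

# Build a query string from a suspected duplicate, distributing its data
# over the same fields. Joining the clauses with OR is an assumption.
def duplicate_query(record)
  clauses = %i[addresses phones contacts].flat_map do |field|
    # Keep only alphanumeric runs so punctuation cannot break the query.
    record.fetch(field, "").scan(/[[:alnum:]]+/).map { |token| "#{field}:#{token}" }
  end
  clauses.join(" OR ")
end

client = Client.new(42, ["Main St 1, Dresden", "Elm Rd 5, Berlin"], ["0351 123"], ["Ann Lee"])
doc = client_document(client)

suspect = { addresses: "Main St 1", phones: "0351 123", contacts: "Ann Lee" }
query = duplicate_query(suspect)
puts query
```

The document hash would then be added to the index, and the query string run against it; both steps are left out here because they depend on how the index is set up.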
Hi Jens,

thanks for the answer.

Because of time constraints, I solved the problem in a different way: I gave each model a client_id method and then summed up the individual fuzzy search results for each attribute. I guess this is neither elegant nor performant, and I'm not happy with the resulting scores. But we can live with it for now.

The main issues we have are the well-known locking problem and the scores.

The scores leave us with the problem that, while the order seems to be correct, we don't know where to draw the line for displaying results or what counts as a relevant match. For a dozen attributes I've seen scores from 0.something to 9.something, with a result just below 9 not even looking similar while one just above 9 seems to be a "99 percent" match.

If someone could tell me how to normalize the scores, in case this is possible at all, I'd be very happy.

Another thing I don't understand yet is what actually happens when I do a multi-token fuzzy search. Currently I'm splitting the string up into multiple tokens and building one query, "attribute:token1~ AND attribute:token2~ AND ...". Maybe that is not really what I should do to get correct scores.

Anyway, thanks for your work and for answering my post.
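
For reference, the token-splitting approach described in this message could look roughly like the following plain-Ruby sketch. The helper name fuzzy_and_query is hypothetical, and stripping everything except alphanumeric runs is an assumption about how the input should be tokenized.

```ruby
# Hypothetical helper: turns a multi-word value into the
# "attribute:token1~ AND attribute:token2~" form described in the message.
def fuzzy_and_query(attribute, text)
  text.scan(/[[:alnum:]]+/)                       # split into alphanumeric tokens
      .map { |token| "#{attribute}:#{token}~" }   # one fuzzy clause per token
      .join(" AND ")
end

single = fuzzy_and_query("name", "Schmeil")
multi  = fuzzy_and_query("name", "Martin Schmeil")
puts multi
```

Because every clause is joined with AND, a record only matches if each token has a fuzzy match in that field.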
On Tue, Jan 16, 2007 at 11:49:12AM +0100, Martin Bernd Schmeil wrote:
> Hi Jens,
>
[..]
> The main issues we have is the well known locking problem and the
> scores.

Making sure you have only one process writing to the index, e.g. via an indexer running in backgroundrb, should solve the locking issues.

> The scores leave us with the problem that - while the order seems to be
> correct - we don't know where to cut the line to display results and
> what a relevant match is. For a dozen attributes I've seen scores from
> 0.something to 9.something, with a result close below 9 not even
> looking similar while just above 9 seems to be a "99 percent" match.

The calculation of scores is quite complex. To get an idea of what happens in there, you can use Ferret's explain method (in Ferret::Search::Searcher).

> If someone would tell me - in case this is possible at all - how to
> normalize the scores I'd be very happy.

No idea if this is possible - maybe you'll find some information about it in the context of Lucene (e.g. in Erik Hatcher's fine Lucene book or on the Lucene mailing list).

> Another thing which I didn't understand yet is what actually happens if
> I do a multi token fuzzy search; currently I'm splitting the string up
> in multiple tokens and build one query "attribute:token1~ AND
> attribute:token2~ AND ...". Maybe not really what I should do to get
> correct scores.

I don't know if there is another way to express this with Ferret.

cheers,
Jens
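
The single-writer idea can be illustrated with a small plain-Ruby sketch. Note the limits of this illustration: it funnels writes through one thread inside a single process, whereas a multi-process or multi-server setup needs a separate indexer process such as the backgroundrb worker mentioned in this reply. The SingleWriter class is hypothetical and a plain Array stands in for the index.

```ruby
require "thread"

# Hypothetical in-process illustration: many producers enqueue documents,
# but only one worker thread ever writes to the index.
class SingleWriter
  def initialize(index)
    @index = index
    @queue = Queue.new
    @worker = Thread.new do
      # Pop documents until the :stop marker arrives.
      while (doc = @queue.pop) != :stop
        @index << doc
      end
    end
  end

  # Safe to call from any thread; the write itself happens in the worker.
  def enqueue(doc)
    @queue << doc
  end

  # Process everything queued so far, then stop the worker.
  def shutdown
    @queue << :stop
    @worker.join
  end
end

index = [] # stand-in for the real index
writer = SingleWriter.new(index)

producers = 4.times.map do |n|
  Thread.new { 5.times { |i| writer.enqueue("doc-#{n}-#{i}") } }
end
producers.each(&:join)
writer.shutdown

puts index.size
```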
Again, thanks for the answers.

I did read the score formula, but my maths knowledge is almost gone now. I'll look at the docs again when I have more spare time. With the most relevant word I should be able to scale the scores to a percentage.

We have multiple servers running, so we definitely have concurrency problems. I just haven't played with things like backgroundrb yet, so I need to investigate how to implement a single-writer solution. But thanks for the hint.
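
One simple reading of "scaling the scores to a percentage" is to divide every score by the best score in the same result set. The sketch below does only that; it is an assumption about what is meant here, not a Ferret feature, and relative_scores is a hypothetical helper. Since the top hit always comes out at 100 percent, even when it is a poor match, this does not by itself answer where to cut off the result list.

```ruby
# Hypothetical helper: expresses each hit's score as a percentage of the
# best score in the same result set (relative, not absolute, relevance).
def relative_scores(hits)
  top = hits.map { |hit| hit[:score] }.max
  # No hits, or nothing scored above zero: report 0.0 for every hit.
  return hits.map { |hit| hit.merge(percent: 0.0) } if top.nil? || top.zero?

  hits.map { |hit| hit.merge(percent: (hit[:score] / top * 100).round(1)) }
end

hits = [
  { id: 1, score: 8.0 },
  { id: 2, score: 4.0 },
  { id: 3, score: 2.0 }
]

scaled = relative_scores(hits)
scaled.each { |hit| puts "client #{hit[:id]}: #{hit[:percent]}%" }
```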