The documentation* states that when using a single index for multiple models, the default_field list should be set to the same thing for all models.

However, in my application, all my models have very different fields and this is not possible. I still want the results returned sorted by term frequency across all indexed content in each model.

What is the purpose of default_field? Under what multi-model circumstance, if any, is it not necessary to use it?

Thanks,
John

*http://projects.jkraemer.net/rdoc/acts_as_ferret/classes/ActsAsFerret/ActMethods.html#M000009
Hi!

On Wed, Jan 02, 2008 at 02:30:23PM -0500, John Bachir wrote:
> The documentation* states that when using a single index for multiple
> models, the default_field list should be set to the same thing for
> all models.
>
> However, in my application, all my models have very different fields
> and this is not possible. I still want the results returned sorted by
> term frequency across all indexed content in each model.

Short answer:

It's safe for you to specify the same large :default_field list containing fields from all models in all your acts_as_ferret calls. aaf doesn't use this list itself but only hands it through to Ferret's query parser, which uses it to expand queries that have no fields specified.

> What is the purpose of default_field? Under what multi-model
> circumstance, if any, is it not necessary to use it?

Long answer:

The default_field option determines which fields Ferret will search when there is no explicit field specified in a query.

Suppose your index has the fields :id and :text (with :id being untokenized). With an empty default_field value (or '*', which means the same) and an :or_default value of false (as aaf sets it), you get parsed queries like this:

'tree'
  --> 'id:tree text:tree'

'some tree' (meaning some AND tree because or_default == false)
  --> '+(id:some) +(id:tree text:tree)'

With 'some' being a stop word, one would expect the second query to yield the same result as the first one. The analyzer does drop 'some' for the :text field, but the query is run against all fields, including :id, which is untokenized and therefore has no analyzer. So we end up querying our id field with a required term query for 'some' and get no result at all.

I remember there was some debate about this topic a year ago or so, and in theory it would be possible for Ferret to parse queries the other way around to work around this issue, but afair Dave brought up some good reasons to leave it as it is.

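The expansion described above can be imitated in a few lines of plain Ruby. This is only a toy model under stated assumptions, not Ferret's query parser: the names TOY_STOP_WORDS, TOY_FIELDS and expand are made up for illustration, and the stop word list is a tiny stand-in for whatever the analyzer really uses.

```ruby
# Toy model of the expansion described above; this is NOT Ferret's parser.
# TOY_STOP_WORDS, TOY_FIELDS and expand are hypothetical names.
TOY_STOP_WORDS = %w[some the a an].freeze

# Field name => true if the field is tokenized (run through an analyzer).
TOY_FIELDS = { id: false, text: true }.freeze

# Expand a field-less query over the given fields, treating every term as
# required (the or_default == false case). Tokenized fields drop stop words,
# untokenized fields keep every term.
def expand(query, fields)
  clauses = query.split.map do |term|
    hits = fields.select do |name|
      !TOY_FIELDS[name] || !TOY_STOP_WORDS.include?(term)
    end
    hits.map { |name| "#{name}:#{term}" }.join(' ')
  end
  clauses = clauses.reject(&:empty?)
  return clauses.first.to_s if clauses.size <= 1

  clauses.map { |clause| "+(#{clause})" }.join(' ')
end

puts expand('tree', [:id, :text])       # id:tree text:tree
puts expand('some tree', [:id, :text])  # +(id:some) +(id:tree text:tree)
puts expand('some tree', [:text])       # text:tree
```

The last call shows why restricting the expansion to tokenized fields helps: the stop word disappears entirely, so 'some tree' and 'tree' end up as the same query.
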
The solution is to tell Ferret which fields to search when no fields are specified for a query (or part of a query) with the :default_field option. Usually aaf does this automatically by collecting all tokenized fields from the model. With a shared index there are n models but one index, so here we need a joint list of all tokenized fields across all these models for the :default_field parameter.

Since aaf is called in every single model, I didn't find an easy way to build this list automatically and decided to leave it up to the user to specify this list in the acts_as_ferret calls of every model. Not really DRY indeed. Patches welcome ;-)

Here's a small script reproducing the issue:
http://pastie.caboo.se/134443

So to summarize:

You need to specify :default_field if you're using :single_index => true in combination with :or_default => false (aaf default) and you have queries that may contain stop words and that are not constrained to a list of fields specified in the query string.

Cheers,
Jens

-- 
Jens Krämer
http://www.jkraemer.net/ - Blog
http://www.omdb.org/ - The new free film database

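One way to keep the joint list in a single place is to compute it once and reuse the result in every model. The sketch below is plain Ruby with no aaf calls. The model names, field names and the joint_default_fields helper are all hypothetical, and how the resulting list is handed to acts_as_ferret as :default_field depends on the aaf version in use.

```ruby
# Hypothetical per-model field declarations; true means the field is
# tokenized. Model and field names are invented for illustration.
MODEL_FIELDS = {
  'Article' => { id: false, title: true, body: true },
  'Comment' => { id: false, author: true, body: true }
}.freeze

# Union of all tokenized fields across the models sharing the index.
# Untokenized fields such as :id are left out on purpose.
def joint_default_fields(model_fields)
  model_fields.values
              .flat_map { |fields| fields.select { |_, tokenized| tokenized }.keys }
              .uniq
end

# Each model would pass this same list as its :default_field value.
SHARED_DEFAULT_FIELDS = joint_default_fields(MODEL_FIELDS).freeze

p SHARED_DEFAULT_FIELDS  # [:title, :body, :author]
```

Keeping the list in one constant means that adding a field to one model cannot leave the other models with a stale list.
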
On Jan 3, 2008, at 10:38 AM, Jens Kraemer wrote:
> You need to specify :default_field if you're using :single_index => true
> in combination with :or_default => false (aaf default) and you have
> queries that may contain stop words and that are not constrained to a
> list of fields specified in the query string.

Thank you Jens for your elaborate response.

Our code removes stop words from all queries before sending them to AAF. In this case, would the lack of setting default_field ever be a problem? Perhaps this is why we have not seen problems even though we have never set default_field.

Cheers,
John

On Thu, Jan 03, 2008 at 11:06:24AM -0500, John Bachir wrote:
> On Jan 3, 2008, at 10:38 AM, Jens Kraemer wrote:
> > You need to specify :default_field if you're using :single_index => true
> > in combination with :or_default => false (aaf default) and you have
> > queries that may contain stop words and that are not constrained to a
> > list of fields specified in the query string.
>
> Thank you Jens for your elaborate response.
>
> Our code removes stop words from all queries before sending them to
> AAF. In this case, would the lack of setting default_field ever be a
> problem? Perhaps this is why we have not seen problems even though we
> have never set default_field.

Exactly, in this case you shouldn't have any problems.

Cheers,
Jens

-- 
Jens Krämer
http://www.jkraemer.net/ - Blog
http://www.omdb.org/ - The new free film database

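For readers wondering what removing stop words before the query reaches aaf might look like, here is a minimal sketch. It is not John's actual code. QUERY_STOP_WORDS and strip_stop_words are hypothetical names, the word list is a placeholder that should match whatever the indexing analyzer uses, and the sketch ignores phrases, operators and field-qualified terms.

```ruby
# Hypothetical stop word list; a real application would use the same list
# as the analyzer that built the index.
QUERY_STOP_WORDS = %w[a an and of some the].freeze

# Drop stop words from a plain, whitespace-separated query string.
# Phrases, boolean operators and field prefixes are not handled here.
def strip_stop_words(query, stop_words = QUERY_STOP_WORDS)
  query.split.reject { |term| stop_words.include?(term.downcase) }.join(' ')
end

puts strip_stop_words('some tree')          # tree
puts strip_stop_words('Some of the trees')  # trees
```

With the stop words gone, no required term clause is ever generated for them against an untokenized field, which matches the explanation of why the problem never showed up.
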
I added your comments to the wiki:
http://projects.jkraemer.net/acts_as_ferret/wiki/AdvancedUsage?action=diff&version=11

_______________________________________________
Ferret-talk mailing list
Ferret-talk at rubyforge.org
http://rubyforge.org/mailman/listinfo/ferret-talk