I have a general question about using a Ferret/Lucene index for grouping results. I am not sure how much of the heavy lifting the index can do for me, so I would appreciate any input.

I am using Ferret to index some objects that have the following properties:

url, image_url, price, tags (space-separated tags), created_at

I would like to search the index for any documents that match a specific tag. The results will be processed as follows:

Each URL must be unique in the results. If there are duplicates, I would like to merge them using some fuzzy merge criteria. Ideally, this merge would take the most common value of each property and apply it to the final single result.

My current thought on how to implement this is to run a standard search sorted by URL, and then manually apply the merge logic to each set of URLs.

Does this sound reasonable?

Thanks,
Tom
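The merge step described above could look roughly like the following. This is only a sketch under stated assumptions: the search results are represented as plain Ruby hashes rather than Ferret documents, the names `most_common` and `merge_by_url` are hypothetical, and "most common" is taken to mean the value that occurs most often within a group, with the tags string compared as a whole.

```ruby
# Properties to merge; taken from the field list in the question.
FIELDS = [:image_url, :price, :tags, :created_at].freeze

# Returns the value that occurs most often in values.
# How ties should be broken is not specified in the thread.
def most_common(values)
  counts = Hash.new(0)
  values.each { |v| counts[v] += 1 }
  values.max_by { |v| counts[v] }
end

# records: array of hashes, one per search hit (an assumption, not Ferret's API).
# Returns one merged hash per distinct URL.
def merge_by_url(records)
  records.group_by { |r| r[:url] }.map do |url, group|
    merged = { url: url }
    FIELDS.each { |f| merged[f] = most_common(group.map { |r| r[f] }) }
    merged
  end
end
```

Because `group_by` collects the hits for each URL itself, sorting by URL is not strictly required for this version; sorting would matter more if the hits were streamed and merged one run at a time.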
On 1/27/06, Tom Davies <atomgiant at gmail.com> wrote:
> My current thoughts on how to implement this is to search the index
> using a standard search and sorting by the URL. Then I will just
> manually apply the merge logic to each set of URLs.
>
> Does this sound reasonable?

Hi Tom,

That sounds like the way I'd probably do it. I don't know if this will help, but did you know that documents can contain multiple fields with the same name? So effectively you could store a unique document for each URL and store an array of image_urls, prices and tags in that document.

Hope that helps,
Dave
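To make the one-document-per-URL idea concrete, here is a rough sketch of the data shape it implies. It deliberately uses plain Ruby hashes and makes no Ferret calls; the function name `consolidate` and the hash layout are assumptions for illustration, and how such a record would actually be written to the index is not shown.

```ruby
# Builds one record per URL, with each multi-valued property held as an array.
# Duplicate values are kept so that the most common one can still be found later.
def consolidate(records)
  by_url = Hash.new do |h, url|
    h[url] = { url: url, image_url: [], price: [], tags: [] }
  end
  records.each do |r|
    doc = by_url[r[:url]]
    doc[:image_url] << r[:image_url]
    doc[:price] << r[:price]
    doc[:tags] << r[:tags]
  end
  by_url.values
end
```

With this layout the uniqueness of each URL is enforced when the records are built, so the search side would only need to pick a representative value from each array.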
Thanks Dave. Actually, I did not know that. That may be a useful feature. The only problem I foresee is how to remove a reference to each of those properties from the array when a document is deleted. I will give it some more thought, but it is nice to have options.

Thanks again,
Tom

On 1/27/06, David Balmain <dbalmain.ml at gmail.com> wrote:
> That sounds like the way I'd probably do it. I don't know if this will
> help but did you know that documents can contain multiple fields with
> the same name? So effectively you could store a unique document for
> each URL and store an array of image_urls, prices and tags in that
> document.
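One possible way to handle the deletion concern is to remove a single occurrence of each of the deleted object's values from the per-URL arrays and then store the result in place of the old record. The sketch below assumes the per-URL record is a plain hash of arrays; `remove_contribution` and `MULTI_FIELDS` are hypothetical names, and nothing here is Ferret API.

```ruby
# Array-valued properties on the per-URL record (assumed layout).
MULTI_FIELDS = [:image_url, :price, :tags].freeze

# Returns a copy of doc with one occurrence of each of record's values removed.
# Returns nil when nothing is left, meaning the per-URL record can be dropped.
def remove_contribution(doc, record)
  updated = { url: doc[:url] }
  MULTI_FIELDS.each do |f|
    values = doc[f].dup
    idx = values.index(record[f])
    values.delete_at(idx) if idx # remove only one matching entry
    updated[f] = values
  end
  MULTI_FIELDS.all? { |f| updated[f].empty? } ? nil : updated
end
```

The original hash is left unmodified, so the caller can decide whether to replace the stored record or discard the change.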