I have a design question and I'm wondering what's the best way to solve it. I'm trying to index HTML content where I have a single model object, call it Article, that is an acts_as_ferret model, and an article consists of many HTML files. I would like to index all of the content of the article with ferret and search across it. However, since the article's content is spread over several files, how would I do that if I don't have an object in the database for each page? Is there a way from within my Article object to add more than one Document to the index? These pages would obviously be attached to the life cycle of the Article. In other words, if I remove the article I want to remove all the pages that went along with that article. How would I do that?

Another question: I would like to search the elements of the article (author, title, etc.) and the contents of those Articles within one search field. Can I place all of this data inside a single index? Or do I have to use the multi_search method?

Thanks
Charlie

--
Posted via http://www.ruby-forum.com/.

On Mon, Oct 02, 2006 at 03:30:59PM +0200, Charlie Hubbard wrote:
> I have design question and I'm wondering what's the best way to solve
> it. I'm trying to index HTML content where I have a single model object
> call it Article that is an acts_as_ferret model, and an article consists
> of many HTML files. I would like to index all of the content of the
> article with ferret and search across it. However, since the article's
> content is spread over several files how would I do that if I don't have
> an object in the database for each page? Is there a way from within my
> Article object to add more than one Document to the index? These pages
> would obviously be attached to the life cycle of the Article. In other
> words if I remove the article I want to remove all the pages that went
> along with that article. How would I do that?

Do you want to be able to find single HTML files in search results, or is it OK to only find the whole article, without knowing which file the hit was in?

In the first case, you can either create a Page model representing a single page and index that, or not use acts_as_ferret at all and do the indexing yourself.

The second case is the easier one: just create a method named html_content that returns the concatenated contents of all the files belonging to your article, and add :html_content to the fields list in your call to acts_as_ferret. This will index all files belonging to your article in a single Ferret document.

> Another question I have is I would like to search the elements of the
> article like author, title, etc, and search the contents of those
> Articles within one search field. Can I place all of this data inside a
> single index? Or do I have to use the multi_search method?

You'll only need multi_search if you have several indexes (that is, several model classes where you called acts_as_ferret).
In your case, if you choose the second way, just index your metadata together with the content; aaf will search in all fields by default.

cheers,
Jens

--
webit! Gesellschaft für neue Medien mbH          www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer       kraemer at webit.de
Schnorrstraße 76                         Tel +49 351 46766  0
D-01069 Dresden                          Fax +49 351 46766 66

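For illustration, the html_content approach could look roughly like the sketch below. It is plain Ruby so that it runs on its own; in a real application Article would be an ActiveRecord model, and the acts_as_ferret call shown in the comment would go inside the class. The html_paths attribute is a hypothetical name (the thread does not say how an article keeps track of its files), and the field list is only an assumption based on the fields mentioned above.

```ruby
# Minimal sketch, not a definitive implementation.
# Assumption: an article knows the paths of its HTML files via `html_paths`
# (a hypothetical attribute introduced for this example).
class Article
  attr_reader :title, :author, :html_paths

  def initialize(title, author, html_paths)
    @title = title
    @author = author
    @html_paths = html_paths
  end

  # In the Rails model you would declare something like
  #   acts_as_ferret :fields => [:title, :author, :html_content]
  # so that the return value of this method is indexed as one field
  # of the article's single Ferret document.
  def html_content
    # Read every file of the article and join the contents,
    # separated by a newline so words at file boundaries do not merge.
    html_paths.map { |path| File.read(path) }.join("\n")
  end
end
```

With this layout a hit only tells you which article matched, not which file, which is the trade-off described above.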
Jens Kraemer wrote:
> On Mon, Oct 02, 2006 at 03:30:59PM +0200, Charlie Hubbard wrote:
>> words if I remove the article I want to remove all the pages that went
>> along with that article. How would I do that?
>
> Do you want to be able to find single html files in search results, or
> is it ok to only find the whole article, without knowing which file the
> hit was in ?
>
> In the first case, you can either create a Page model representing a
> single page and index that, or don't use acts_as_ferret at all and do
> the indexing yourself.

This is actually closer to my scenario. I want the user to be able to jump right to the relevant portion of the article from their search results, possibly with highlights etc., mainly because these articles can be quite large.

>> Another question I have is I would like to search the elements of the
>> article like author, title, etc, and search the contents of those
>> Articles within one search field. Can I place all of this data inside a
>> single index? Or do I have to use the multi_search method?
>
> you'll only need multi_search if you have several indexes (that is,
> several Model classes where you called acts_as_ferret).
> In your case, if you choose the second way, just index your meta data
> together with the content, aaf will by default search in all fields.

So the bottom line is: create a Page object for each page of the article, put that in the DB, and use the acts_as_ferret options to find it. Then use multi_search across the two models.

Thanks
Charlie

On Mon, Oct 02, 2006 at 08:47:04PM +0200, Charlie Hubbard wrote:
> Jens Kraemer wrote:
[..]
> > you'll only need multi_search if you have several indexes (that is,
> > several Model classes where you called acts_as_ferret).
> > In your case, if you choose the second way, just index your meta data
> > together with the content, aaf will by default search in all fields.
>
> So bottom line is create a Page object for each page of the article and
> put that stuff in the DB, and use the acts_as_ferret options to find it.
> Use the multi-search across the two models.

Right. To simplify things further, you could index the article's metadata with each page, via a method that you list in your fields. That method should retrieve the metadata from the parent article object, so that it gets indexed together with each page.

This might actually be faster than using multi_search (unless your article metadata is really large, so that the overhead of indexing it with each page weighs in). In addition, it would save you from having to handle different kinds of objects (Articles and Pages) in your result set.

cheers,
Jens
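
As a rough sketch of that last suggestion: each Page exposes the parent article's metadata through ordinary methods, and those method names are listed as fields. The code is plain Ruby so it can run standalone; in a Rails app both classes would be ActiveRecord models. The names body, article_title and article_author are hypothetical, chosen only for this example, and the Rails declarations in the comments are assumptions about how one might wire it up, not something confirmed in the thread.

```ruby
# Minimal sketch, not a definitive implementation.
class Article
  attr_reader :title, :author

  # In Rails, something like
  #   has_many :pages, :dependent => :destroy
  # would tie the pages to the article's life cycle (assumption).
  def initialize(title, author)
    @title = title
    @author = author
  end
end

class Page
  attr_reader :article, :body

  # In the Rails model you would declare something like
  #   belongs_to :article
  #   acts_as_ferret :fields => [:body, :article_title, :article_author]
  # (field names are hypothetical).
  def initialize(article, body)
    @article = article
    @body = body
  end

  # Metadata pulled from the parent article, indexed with every page.
  def article_title
    article.title
  end

  def article_author
    article.author
  end
end
```

Because every page document then carries the title and author, a single search over the Page index can match on content and metadata alike, and the result set contains only Page objects.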