Hello everyone,

I am writing an application which collects a set of web sites and caches them locally for offline viewing. I want to search this collection and associate extra data with each result (e.g. date collected, reason for collection, perhaps a sequence number).

All this data exists when the harvesting is done and could be stored in a database. I want to use RDig to index my collection of sites. I also want to associate the index results with my extra data and display them along with the search results.

The index is built once and searched many times, so I want searching to be as efficient as possible.

The simplest way is to use e.g. the local URL as a key into my database (easy, but the lookup needs to be done each time and could slow things down).

Is it possible to add extra fields to Ferret index entries? If so, can this be done at create time or must it be done afterwards? If it can be done at create time, is there a way to get RDig to insert these extra fields?

Thanks for any help with this.

Ed

--
Posted via http://www.ruby-forum.com/.
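For illustration, the "local URL as a key into my database" approach could look roughly like the sketch below. It is only a sketch: the database is stood in for by a plain Hash, the search hits are plain Hashes rather than real Ferret results, and the names HARVEST_DB and annotate are made up for this example.

```ruby
# Hypothetical stand-in for the harvest database: extra data keyed by local URL.
HARVEST_DB = {
  "file:///cache/example.org/index.html" => {
    collected_on: "2007-02-10", reason: "reference", sequence: 1
  }
}.freeze

# Merge the extra data into each search hit. Hits without a database
# record are returned unchanged.
def annotate(hits, db)
  hits.map { |hit| hit.merge(db.fetch(hit[:url], {})) }
end

# Made-up search hits; a real search would supply these.
hits = [
  { url: "file:///cache/example.org/index.html", title: "Example" },
  { url: "file:///cache/example.org/other.html", title: "Other" }
]

annotated = annotate(hits, HARVEST_DB)
annotated.each { |hit| puts hit.inspect }
```

The cost is one lookup per hit on every search, which is the overhead the question is concerned about.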
Jens Kraemer
2007-Feb-10 17:33 UTC
[Ferret-talk] Adding extra fields to an index (using RDig?)
Hi!

On Sat, Feb 10, 2007 at 12:29:27PM +0100, Ed Ed wrote:
> Hello everyone,
>
> I am writing an application which collects a set of web sites and caches
> them locally for offline viewing. I want to do searches on this
> collection and associate extra data with each result (e.g date
> collected, reason for collection, perhaps a sequence number).
>
> Now all this data exists when the harvesting is done and could be stored
> in a database. I want to use RDig to index my collection of sites I also
> want to associate the index results with my extra data and display them
> along with search results.
>
> The index is built once and searched many times so I want searching to
> be as efficient as possible.
>
> The simplest way is to use e.g. the local URL as a key into my database
> (easy but needs to be done each time and could slow things down)
>
> Is it possible to add extra fields to ferret index entries?

Of course that is possible; RDig itself uses three different fields: :url, :title and :data.

> If so, can this be done at create time or must it be done afterwards? If
> it can be done at create time is there a way to get RDig to insert these
> extra fields?

Ferret documents cannot be modified after they have been created, so any custom fields you want to add have to be added when the index is created.

At the moment RDig doesn't support custom fields; however, I'd be happy to apply a patch adding this capability ;-)

cheers,
Jens

--
webit! Gesellschaft für neue Medien mbH          www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer       kraemer at webit.de
Schnorrstraße 76                         Tel +49 351 46766  0
D-01069 Dresden                          Fax +49 351 46766 66
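Since custom fields have to be present when a document is created, one way to picture this is to assemble the complete document, extra fields included, before it is handed to the index. The sketch below does that with plain Ruby Hashes. The EXTRA table, the build_document helper and the field names :collected_on and :reason are assumptions made for this example, and a plain Array stands in for the Ferret index so the snippet runs without the gem installed.

```ruby
# Extra data recorded at harvest time, keyed by local URL (made-up values).
EXTRA = {
  "file:///cache/example.org/index.html" => {
    collected_on: "2007-02-10", reason: "reference"
  }
}.freeze

# Build the complete document before indexing. :url, :title and :data are
# the three fields RDig is said to use; everything else is a custom field.
def build_document(page, extra)
  base = { url: page[:url], title: page[:title], data: page[:text] }
  base.merge(extra.fetch(page[:url], {}))
end

# Stand-in for the index. With Ferret itself, a Hash like this would be
# added to a Ferret::Index::Index instance instead of an Array.
index = []

page = {
  url: "file:///cache/example.org/index.html",
  title: "Example",
  text: "Example page body"
}
index << build_document(page, EXTRA)

puts index.first.inspect
```

With the extra data stored in the document itself, displaying a search result would no longer need a separate database lookup.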
Hi,

To summarise, I can add custom fields at create time but not afterwards. Furthermore, RDig does not presently support the addition of custom fields.

Please could you post your patch to enable RDig to support custom fields?

Thanks
Ed

Jens Kraemer wrote:
> Hi!
>
> On Sat, Feb 10, 2007 at 12:29:27PM +0100, Ed Ed wrote:
>
> Ferret documents cannot be modified after they have been created, so any
> custom fields you want to add have to be added when the index is
> created.
>
> Atm RDig doesn't support custom fields, however I'd be happy to apply a
> patch adding this capability ;-)
Jens Kraemer
2007-Feb-12 09:01 UTC
[Ferret-talk] Adding extra fields to an index (using RDig?)
On Sun, Feb 11, 2007 at 07:17:51PM +0100, Ed Ed wrote:
> Hi,
>
> To summarise, I can add custom fields at create time but not afterwards.
> Furthermore RDig does not presently support the addition of custom
> fields.

Right.

> Please could you post your patch to enable RDig to support custom
> fields.

Oh, what I wanted to say is that if *you* built such a feature into RDig, I'd be happy to integrate it. Sorry if I've been unclear here.

Jens
Jens Kraemer wrote:
> oh, what I wanted to say is that if *you* built such a feature into
> RDig, I'd be happy to integrate it. Sorry if I've been unclear here.

:-(

OK, I'll have a look at the code and see what might be simplest. It seems to me that adding an extra optional directive to the configuration file is easiest. This could name a file containing a user-supplied hook which rdig/indexer.rb could try to include. Or just define the hook procedure in the config file?

Then, if the hook procedure existed, the indexer could pass it the document and the doc data structure, and the hook procedure could augment the doc structure as required.

I guess the only Ferret requirement here is that the hook must add the same set of extra fields to each document (even if the values are NULL).

Ed
Jens Kraemer
2007-Feb-12 12:49 UTC
[Ferret-talk] Adding extra fields to an index (using RDig?)
On Mon, Feb 12, 2007 at 12:55:54PM +0100, Ed Ed wrote:
[..]
> OK, I'll have a look at the code and see what might be simplest. Seems
> to me that adding an extra optional directive to the configuration file
> is easiest. This could name a file containing a user-supplied hook which
> rdig/indexer.rb could try to include. Or just define the hook procedure
> in the config file?

Defining the hook method in the config sounds good.

> Then if the hook procedure existed the indexer could pass it the
> document and doc data structure and the hook procedure could augment the
> doc structure as required.

Exactly.

> I guess the only Ferret requirement here is that the hook must add the
> same set of extra fields to each document (even if values NULL)

Not even that; you can have different Ferret documents with different sets of fields.

Jens
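A rough sketch of the hook idea discussed in this thread follows. None of it is RDig's actual code or API: SketchConfig, SketchIndexer and document_hook are hypothetical names invented here, and an Array stands in for the Ferret index. It only shows the shape of the mechanism: the configuration optionally holds a callable, and the indexer passes it the source page and the doc Hash before the doc is added.

```ruby
# Hypothetical configuration object holding an optional hook.
SketchConfig = Struct.new(:document_hook)

# Hypothetical indexer; @documents stands in for the Ferret index.
class SketchIndexer
  attr_reader :documents

  def initialize(config)
    @config = config
    @documents = []
  end

  def add(page)
    doc = { url: page[:url], title: page[:title], data: page[:text] }
    hook = @config.document_hook
    hook.call(page, doc) if hook # the hook may add fields to doc in place
    @documents << doc
  end
end

# The hook as it might be defined in a config file: it adds a field only
# when the page carries the corresponding data.
hook = lambda do |page, doc|
  doc[:collected_on] = page[:collected_on] if page[:collected_on]
end

indexer = SketchIndexer.new(SketchConfig.new(hook))
indexer.add({ url: "file:///cache/a.html", title: "A", text: "first page",
              collected_on: "2007-02-12" })
indexer.add({ url: "file:///cache/b.html", title: "B", text: "second page" })

indexer.documents.each { |doc| puts doc.keys.inspect }
```

The two documents end up with different sets of fields, which matches the point above that Ferret does not require every document to carry the same fields. With no hook configured, the indexer in this sketch simply adds the three basic fields.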