thr3ads.net - Ferret talk - [Ferret-talk] Dynamic fields and AAF [Sep 2006]

David Sheldon

2006-Sep-18 15:28 UTC

[Ferret-talk] Dynamic fields and AAF

Hi,

I have a model which has properties. These are your standard name/value
pairs, but they also have attributes that affect how I want to store them in
ferret. I was using 0.9.5 with 0.2 of aaf, which seemed fine: I just
copied and pasted (yes, I know, ick) the to_doc method and added code to
iterate through the properties that the model had, and add relevant
fields to the document.

It seems that this will be a bit harder now with the FieldInfos. Has 
anyone else done this, and is there a recognised way of doing it?

David

-- 
Posted via http://www.ruby-forum.com/.

Jens Kraemer

2006-Sep-18 16:59 UTC

[Ferret-talk] Dynamic fields and AAF

On Mon, Sep 18, 2006 at 05:28:45PM +0200, David Sheldon wrote:
> Hi,
> 
> I have a model which has properties, these are your standard name/value
> pairs, but also have attributes that affect how I want to store them in
> ferret. I was using 0.9.5 with 0.2 of aaf, which seemed fine, I just
> copied and pasted (yes, I know, ick) the to_doc method and added code to
> iterate through the properties that that model had, and add relevant
> fields to the document.

Instead of copy 'n' paste you could just call super:

def to_doc
  doc = super
  # custom code here
  doc
end

> It seems that this will be a bit harder now with the FieldInfos. Has
> anyone else done this, and is there a recognised way of doing it?

IMHO adding arbitrary fields should work; you just can't specify any
special per-field storage/indexing options, since the defaults
determined at index creation will be used.

With aaf this means
:store => :no, 
:index => :tokenize

Changing the characteristics of a field for a particular document doesn't
seem to be possible any more. Was that what you did until now, i.e.
tokenize or store a field's value sometimes, and sometimes not?

Jens

-- 
webit! Gesellschaft für neue Medien mbH          www.webit.de
Dipl.-Wirtschaftsingenieur Jens Krämer       kraemer at webit.de
Schnorrstraße 76                         Tel +49 351 46766  0
D-01069 Dresden                          Fax +49 351 46766 66

David Sheldon

2006-Sep-19 06:50 UTC

[Ferret-talk] Dynamic fields and AAF

Jens Kraemer wrote:
> instead copy'n paste you could just call super:
> 
> def to_doc
>   doc = super
>   # custom code here
>   doc
> end

Ah, I had missed out on that. I don't really understand how super works
in Ruby. I had been trying to rename the method and create a new one
aliased to it, which didn't work. I'm still a bit confused: as to_doc is
created by the mixin as an instance method, is there still a superclass
version? Anyway, thanks for that tip, I'll try it.

> changing the characteristics of a field for a special document doesn't
> seem to be possible any more. Was that what you did until now, i.e.
> tokenize or store a field's value sometimes, and sometimes not ?

Yes. Some are strings (tokenize), some are integers (don't tokenize,
ideally use a different analyser), and some are choices from lists
(either an untokenized String, or treated as the integer index of the
choice). Dates are treated as integers, and we may want to include some
strings in the DB so they can be displayed in the search results.

David

Jens Kraemer

2006-Sep-19 08:04 UTC

[Ferret-talk] Dynamic fields and AAF

On Tue, Sep 19, 2006 at 08:50:29AM +0200, David Sheldon wrote:
> Jens Kraemer wrote:
> > instead copy'n paste you could just call super:
> > 
> > def to_doc
> >   doc = super
> >   # custom code here
> >   doc
> > end
> 
> Ah, I had missed out on that, I don't really understand how super works
> in ruby. I had been trying to rename the method and create a new one
> aliased to it which didn't work. I'm still a bit confused as to_doc is
> created by the mixin as an instance method, is there still a superclass
> version? Anyway thanks for that tip, I'll try it.

Ah, good point. But this should still work if you do the override after
calling acts_as_ferret.
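
To illustrate why super can reach a method that came from a mixin, here is a minimal plain-Ruby sketch. It assumes the mixin's to_doc lives in a module that gets included into the model class; FakeFerretMethods and User are hypothetical stand-ins, not aaf's real module or anyone's real model.

```ruby
# Hypothetical stand-in for the module that acts_as_ferret mixes in.
module FakeFerretMethods
  def to_doc
    { :id => id }
  end
end

class User
  include FakeFerretMethods   # assumed to be roughly what acts_as_ferret does

  attr_reader :id, :properties

  def initialize(id, properties)
    @id = id
    @properties = properties
  end

  # A method defined in the class itself comes before included modules in
  # the method lookup chain, so super here calls the module's to_doc.
  def to_doc
    doc = super
    properties.each { |name, value| doc[name.to_sym] = value }
    doc
  end
end

doc = User.new(1, "name" => "Bob").to_doc
```

An included module is placed between the class and its superclass in the ancestor chain, which is why there is something for super to call even though no superclass defines to_doc.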

> > changing the characteristics of a field for a special document doesn't
> > seem to be possible any more. Was that what you did until now, i.e.
> > tokenize or store a field's value sometimes, and sometimes not ?
> 
> Yes. Some are strings (tokenize), some are integers (dont tokenize,
> ideally use a different analyser), and some are choices from lists
> (either untokenized String or treat as integer index of choice). Dates
> are treated as integers, and we may want to include some strings in the
> DB so they can be displayed in the search results.

Difficult. You could declare one field per type of data (in terms of
indexed/stored) you might run into, and then decide in your to_doc
which data has to go into which field. That doesn't sound really nice to
me, but it might work. For searching you would then always have to search
all these fields, of course.
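
As a rough sketch of the "one field per type of data" idea, the routing step could look like the following. The field names, the kind symbols, and the shape of the properties array are all assumptions made for illustration; a plain Hash stands in for the Ferret document.

```ruby
# Hypothetical catch-all fields, one per indexing behaviour. With aaf these
# would have to be declared up front so they exist when the index is created.
FIELD_FOR_KIND = {
  :string  => :props_tokenized,    # free text, tokenized
  :integer => :props_untokenized,  # numbers and dates, indexed as-is
  :choice  => :props_untokenized,  # value picked from a list
  :display => :props_stored        # stored only, for showing in results
}

# properties is assumed to be an array of [value, kind] pairs.
def route_properties(properties)
  doc = Hash.new { |hash, field| hash[field] = [] }
  properties.each do |value, kind|
    doc[FIELD_FOR_KIND.fetch(kind)] << value.to_s
  end
  doc
end

doc = route_properties([["Bob", :string], [1978, :integer], ["red", :choice]])
```

Note that this layout drops the property name, so a query has to go against the catch-all fields rather than a field named after the property.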

Jens


David Balmain

2006-Sep-19 08:10 UTC

[Ferret-talk] Dynamic fields and AAF

On 9/19/06, David Sheldon <david.sheldon at torchbox.com> wrote:
> Jens Kraemer wrote:
> > changing the characteristics of a field for a special document doesn't
> > seem to be possible any more. Was that what you did until now, i.e.
> > tokenize or store a field's value sometimes, and sometimes not ?
>
> Yes. Some are strings (tokenize), some are integers (dont tokenize,
> ideally use a different analyser), and some are choices from lists
> (either untokenized String or treat as integer index of choice). Dates
> are treated as integers, and we may want to include some strings in the
> DB so they can be displayed in the search results.
>
> David

Hi David,

Is there any reason you need them all to be in the same field? Or am I
misunderstanding you? You do realize that different fields can have
different properties, right?

Cheers,
Dave

David Sheldon

2006-Sep-19 10:59 UTC

[Ferret-talk] Dynamic fields and AAF

David Balmain wrote:
> Is there any reason you need them all to be in the same field? Or am I
> misunderstanding you? You do realize that different fields can have
> different properties right?

Yes, I want them all in different fields, named after the property. That
way you could search for someone's name with 'name:Bob' or their year of
matriculation with 'matriculation:1978'. The problem is that on creation
of the index I do not know what properties will be associated with users,
so I cannot define their field infos. Previously I was able to just
specify the properties when adding that field to the document.

David

Jan Prill

2006-Sep-19 12:12 UTC

[Ferret-talk] Dynamic fields and AAF

Without reading the whole thread:

1. You know that users have properties, right?
2. These properties are like key-value pairs. One user could have a property
like hobby: 'cars', another might have a property like place-of-birth:
'Hamburg, Germany'.
3. Users might build their property key-value pairs dynamically. You don't
know which user chooses to inform you about which property.
4. Couldn't you use Ruby's reflection, inflection, or whatever features to
iterate over the properties, of which a user has many, and then put the
key-value pairs into the index?
5. This would mean that the field list of the index might grow to a great
number. I don't know how this would affect Ferret. It further means that you
need to know which fields one is able to search on. You would need to build
something like an extended search form with all of these fields, or inform
the user about which fields he can usefully use in his queries. He
should also be informed that the mere existence of a field does not mean
every user has provided this information. Maybe only one user
informed you about his place-of-birth.

cheers,
Jan

Jan Prill

2006-Sep-19 12:18 UTC

[Ferret-talk] Dynamic fields and AAF

IMHO the described problem of a growing field list is one of the reasons for
the popularity of tags. Simply let the user choose how to tag himself, his
question, comment, whatever, and don't care about the field. It's fulltext
search for a reason. IMHO you've got two sides in things like this: 1.
predefine a field list that would be filled in by most users and therefore
is valuable information for your search; 2. choose tags for the stuff where
users should be able to freely decide about the categorization.

cheers,
Jan

David Balmain

2006-Sep-19 12:21 UTC

[Ferret-talk] Dynamic fields and AAF

On 9/19/06, David Sheldon <david.sheldon at torchbox.com> wrote:
> David Balmain wrote:
>
> > Is there any reason you need them all to be in the same field? Or am I
> > misunderstanding you? You do realize that different fields can have
> > different properties right?
>
> Yes, I want them all in different fields, named after the property, that
> way you could search for someone's name by 'name:Bob' or their year of
> matriculation with 'matriculation:1978'. The problem is that on creation
> of the index I do not know what properties will be associated with users
> so cannot define their field infos. Previously I was able to just
> specify the properties when adding that field to the document.
>
> David

I'm assuming the matriculation field is always going to be a number.
It won't change at a later date. So you can just set up the field
whenever you use it for the first time.

    require 'rubygems'
    require 'ferret'
    i = Ferret::I.new
    puts i.field_infos
    # Register the field the first time it is used
    if not i.field_infos[:matriculation]
      i.field_infos.add_field(:matriculation,
                              :index => :untokenized)
    end
    puts i.field_infos
    i << {:matriculation => 1978}

Of course you only need to do this for fields which vary from the
norm. Whatever properties you instantiated the FieldInfos with will be
used for fields added with the FieldInfos#add_field method unless
otherwise specified. So if most of your fields are number or date
fields you'd create the FieldInfos like this:

    fis = Ferret::Index::FieldInfos.new(:index => :untokenized_omit_norms,
                                        :term_vector => :no)

Now when you add a text field you'll need to explicitly set it to
tokenized and store term vectors:

    if not i.field_infos[:content]
      i.field_infos.add_field(:content,
                              :term_vector => :with_positions_offsets,
                              :index => :yes)
    end

Let me know if this helps or not.

Cheers,
Dave

David Sheldon

2006-Sep-19 15:52 UTC

[Ferret-talk] Dynamic fields and AAF

David Balmain wrote:
> On 9/19/06, David Sheldon <david.sheldon at torchbox.com> wrote:
>> so cannot define their field infos. Previously I was able to just
>> specify the properties when adding that field to the document.
>>
>> David
> 
> I'm assuming the matriculation field is always going to be a number.
> It won't change at a later date. So you can just set up the field
> whenever you use it for the first time.

I've considered this. I use aaf, and this requires the model that
describes what fields are allowed on objects to have access to the index
model's indexer; this isn't too bad. The only problem is when the index
is created by something like rebuild_index, which needs to be extended
to create all the extra fields.

I don't want to add the fields to fields_for_ferret, as that would mean
calling #{fieldname}_for_ferret for each possible property, rather than
taking the properties defined on that user, and adding them.

Would the fields_for_ferret solution be the correct way, somehow
populating this out of the database and then overriding the
foo_for_ferret methods to look in a cache?

This was really easy with the old API. It seems a shame that it is so
hard now.

David

David Sheldon

2006-Sep-19 16:04 UTC

[Ferret-talk] Dynamic fields and AAF

David Balmain wrote:
> I'm assuming the matriculation field is always going to be a number.
> It won't change at a later date. So you can just set up the field
> whenever you use it for the first time.
> 
>     require 'rubygems'
>     require 'ferret'
>     i = Ferret::I.new
>     puts i.field_infos
>     if not i.field_infos[:matriculation]
>       i.field_infos.add_field(:matriculation,
>                               :index => :untokenized)
>     end
>     puts i.field_infos
>     i << {:matriculation => 1978}

Oh, I didn't really read this last time.

It looks like this might be handy, but
http://ferret.davebalmain.com/api/classes/Ferret/Index/Index.html only
lists the IndexReader as having the field_infos.

How much overhead would it be to write an "add_value" method that is
called, say, 10 times per doc, which will look up the field we're going to
add in the index, and add it if it isn't already there?

Is this what the old code did anyway?

David

David Balmain

2006-Sep-20 09:01 UTC

[Ferret-talk] Dynamic fields and AAF

On 9/20/06, David Sheldon <david.sheldon at torchbox.com> wrote:
> David Balmain wrote:
>
> > I'm assuming the matriculation field is always going to be a number.
> > It won't change at a later date. So you can just set up the field
> > whenever you use it for the first time.
> >
> >     require 'rubygems'
> >     require 'ferret'
> >     i = Ferret::I.new
> >     puts i.field_infos
> >     if not i.field_infos[:matriculation]
> >       i.field_infos.add_field(:matriculation,
> >                               :index => :untokenized)
> >     end
> >     puts i.field_infos
> >     i << {:matriculation => 1978}
>
> Oh, I didn't really read this last time.
>
> It looks like this might be handy,
>
> http://ferret.davebalmain.com/api/classes/Ferret/Index/Index.html only
> lists the IndexReader as having the field_infos.
>
> How much overhead would it be to write an "add_value" method that is
> called, say 10 times per doc, which will lookup the field we're going to
> add in the index, and add it if it isn't already there?

Not a lot. It's a hash lookup so it's fast, and it should be rare
(after a while at least) that new fields are added, i.e. it's probably
not going to happen for every document.

> Is this what the old code did anyway?
>
> David

The old code created a completely new FieldInfos object for every
document you added to the index. It then merged the field_infos objects
when the documents were merged. In other words it was a lot more
complex. This is one of the reasons for the API change. Even after
adding the add_value method, I'd guess that the newer version of
Ferret will still index a lot faster.
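
A minimal sketch of what such an add_value helper might look like follows. The helper and its signature are hypothetical; StubFieldInfos only mimics the two calls used in the example earlier in the thread (field_infos[name] and add_field) so the sketch runs without Ferret installed. It is not Ferret's real class.

```ruby
# Stand-in that imitates only field_infos[name] and add_field(name, options).
class StubFieldInfos
  def initialize
    @fields = {}
  end

  def [](name)
    @fields[name]
  end

  def add_field(name, options = {})
    @fields[name] = options
  end
end

# Hypothetical helper: register the field on first use, then set the value.
def add_value(doc, field_infos, name, value, options = {})
  name = name.to_sym
  # Cheap lookup; add_field only runs the first time a field name is seen.
  field_infos.add_field(name, options) unless field_infos[name]
  doc[name] = value
  doc
end

infos = StubFieldInfos.new
doc = {}
add_value(doc, infos, :matriculation, 1978, :index => :untokenized)
# The field already exists, so the options passed here are ignored.
add_value(doc, infos, :matriculation, 1979, :index => :yes)
```

With the real library, the field_infos argument would presumably be the index's own field_infos object, as in the i.field_infos calls quoted above.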

Cheers,
Dave

David Sheldon

2006-Sep-20 09:22 UTC

[Ferret-talk] Dynamic fields and AAF

David Sheldon wrote:
> How much overhead would it be to write an "add_value" method that is
> called, say 10 times per doc, which will lookup the field we're going to
> add in the index, and add it if it isn't already there?

OK, I've done this. But it was causing problems when called from
rebuild_index, as there isn't an index at that point, and I was calling
ferret_index on my model, which was creating a new index which couldn't
get a write lock for my new fields.

I have solved this by giving to_doc an optional index parameter that is
passed in when rebuild is running, but if it is nil, it will call
Model.ferret_index.

It seems like an incorrect separation for the index to be passed in to
the to_doc method. Have you any suggestions on how to make this nicer?
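
For reference, the workaround described above has roughly the following shape. This is only a sketch: Profile is a hypothetical model, and a plain Hash stands in for the index object that ferret_index would return.

```ruby
class Profile
  # Hypothetical stand-in for the index that Model.ferret_index returns.
  def self.ferret_index
    @ferret_index ||= { :name => "default index" }
  end

  # index is passed in explicitly while a rebuild is running; when it is
  # nil, fall back to the model's own index.
  def to_doc(index = nil)
    index ||= self.class.ferret_index
    { :indexed_with => index[:name] }
  end
end

normal  = Profile.new.to_doc
rebuild = Profile.new.to_doc({ :name => "rebuild index" })
```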

David

Jens Kraemer

2006-Sep-20 11:40 UTC

[Ferret-talk] Dynamic fields and AAF

Hi!

On Wed, Sep 20, 2006 at 11:22:52AM +0200, David Sheldon wrote:
> David Sheldon wrote:
> 
> > How much overhead would it be to write an "add_value" method that is
> > called, say 10 times per doc, which will lookup the field we're going to
> > add in the index, and add it if it isn't already there?
> 
> Ok, I've done this. But it was causing problems when called from
> rebuild_index, as there isn't an index at that point, and I was calling
> ferret_index on my model, which was creating a new index which couldnt
> get a write lock for my new fields.
> 
> I have solved this by giving to_doc an optional index parameter that is
> passed in when rebuild is running, but if it is nil, it will call
> Model.ferret_index.
> 
> It seems like an incorrect separation for the index to be passed in to
> the to_doc method. Have you any suggestions on how to make this nicer?

I could change the way rebuild_index works so that it uses and
initializes the Ferret index instance returned by ferret_index. So you
could access the index instance in to_doc when being called by
rebuild_index, too.

Jens


David Sheldon

2006-Sep-20 13:56 UTC

[Ferret-talk] Dynamic fields and AAF

Jens Kraemer wrote:
> I could change the way rebuild_index works so that it uses and
> initializes the Ferret index instance returned by ferret_index. So you
> could access the index instance in to_doc when being called by
> rebuild_index, too.

That sounds good.

The other thing I noticed was that if you want a field that
is created by rebuild_index, but isn't actually put in there by the
standard to_doc, you can specify the field along with :ignore => true,
for example { :index => :untokenized, :ignore => true }. I want to do
this as there is a field that I want to include many times on a
document, and returning an array from foo_for_ferret didn't add a field
for each.

David, are you supposed to be able to set several values for a field in
the document?

Thanks for all you guys' support.

David

David Balmain

2006-Sep-20 17:14 UTC

[Ferret-talk] Dynamic fields and AAF

On 9/20/06, David Sheldon <david.sheldon at torchbox.com> wrote:
> David, are you supposed to be able to set several values for a field in
> the document?

I think I know what you are asking here but I'm not sure. You can do
this in Ferret:

    index << {:content => "yada yada yada",
              :tags => ["ruby", "rails", "ferret"]}

So :tags has multiple values. But you can't do this:

    doc = Ferret::Document.new
    doc[:tag] = "ruby"
    doc[:tag] = "rails"
    doc[:tag] = "ferret"

You should do this:

    doc[:tag] = ["ruby", "rails", "ferret"]

Or this:

    doc[:tag] = ["ruby"]
    doc[:tag] << "rails"
    doc[:tag] << "ferret"

After all, Ferret::Document is just a Hash with a boost field.
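
Since the point rests on plain Hash behaviour, it can be checked without Ferret at all. The sketch below uses an ordinary Hash as a stand-in for Ferret::Document, on the assumption stated above that the two behave alike for assignment.

```ruby
doc = {}

# Repeated assignment replaces the value; a Hash keeps one value per key.
doc[:tag] = "ruby"
doc[:tag] = "rails"
overwritten = doc[:tag]

# An Array value keeps every entry under the single :tag key.
doc[:tag] = ["ruby"]
doc[:tag] << "rails"
doc[:tag] << "ferret"
```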

Perhaps I have just misunderstood you completely so please let me know if I did.

Cheers,
Dave

David Sheldon

2006-Sep-21 08:31 UTC

[Ferret-talk] Dynamic fields and AAF

David Balmain wrote:
> So :tags has multiple values. But you can't do this:
> 
>     doc = Ferret::Document.new
>     doc[:tag] = "ruby"
>     doc[:tag] = "rails"
>     doc[:tag] = "ferret"
> 
> You should do this:
> 
>     doc[:tag] = ["ruby", "rails", "ferret"]

That is exactly what I mean. And it looks like that is another way I can
simplify my code with the new API. I can return an array from
foo_for_ferret and have all the individual values counted.

Previously I did basically
 networks.each { |net| doc << Field.new('network', net.name) }
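
Under the new API the same loop would presumably collapse into a single array-valued field. A small sketch, where Network is a hypothetical stand-in for the associated model and a plain Hash stands in for the document:

```ruby
# Hypothetical stand-in for the associated model; only name is needed here.
Network = Struct.new(:name)

networks = [Network.new("alumni"), Network.new("staff")]

# One multi-valued field instead of one Field object per network.
doc = {}
doc[:network] = networks.map { |net| net.name }
```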

Thanks.

David
