thr3ads.net - Ferret talk - [Ferret-talk] Query question [Dec 2005]

Carl Youngblood

2005-Dec-14 05:54 UTC

[Ferret-talk] Query question

I have an index in which I want different records to be accessible to
different users.  I think I can do this by adding a "users" field to
each record in the index and narrow down my queries to only those
records matching the current user's userid.  I have the userids
separated by commas.  What would be the right way to query for a
certain user?  I have to make sure that I don't find records belonging
to the wrong user because a shorter number matches a larger one.  For
example, if a users field contains:

3,45,66,7779

I don't want a query for 77 to match this.  How can I make sure my
query matches whole words only?
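The problem can be shown in a few lines of plain Ruby. This sketch does not use Ferret. It compares a substring test on the raw value with a lookup in the list of comma-separated ids:

```ruby
users = "3,45,66,7779"

# Naive substring test: "77" is found inside "7779", a false positive.
substring_hit = users.include?("77")
p substring_hit        # => true

# Split on commas first, then compare whole ids: no false positive.
ids = users.split(",")
whole_token_hit = ids.include?("77")
p whole_token_hit      # => false
p ids.include?("66")   # => true
```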

Thanks,

Carl

David Balmain

2005-Dec-14 06:14 UTC

[Ferret-talk] Query question

On 12/14/05, Carl Youngblood <carl at youngbloods.org> wrote:
> I have an index in which I want different records to be accessible to
> different users.  I think I can do this by adding a "users" field to
> each record in the index and narrow down my queries to only those
> records matching the current user's userid.  I have the userids
> separated by commas.  What would be the right way to query for a
> certain user?  I have to make sure that I don't find records belonging
> to the wrong user because a shorter number matches a larger one.  For
> example, if a users field contains:
>
> 3,45,66,7779
>
> I don't want a query for 77 to match this.  How can I make sure my
> query matches whole words only?
You have two choices. To match whole words only, i.e. words separated by
spaces, use the WhitespaceAnalyzer; the ids would then need to be
separated by spaces rather than commas. You can use the PerFieldAnalyzer
if you only want to use the WhitespaceAnalyzer on one field and the
StandardAnalyzer on all the others.
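Here is a rough plain-Ruby illustration of what the two analyzers do to the text. The lambdas `whitespace` and `standard` are hypothetical stand-ins, not Ferret classes. `standard` is only a crude approximation that lowercases and keeps word characters. The hash with a default value mimics the idea of choosing an analyzer per field:

```ruby
# Hypothetical stand-ins for analyzers: each lambda turns text into terms.
# They only mimic the tokenizing step; they are not Ferret classes.
whitespace = ->(text) { text.split(/\s+/).reject(&:empty?) }
standard   = ->(text) { text.downcase.scan(/\w+/) } # crude approximation

# Per-field selection: whitespace tokenizing for :users, "standard" otherwise.
per_field = Hash.new(standard)
per_field[:users] = whitespace

doc = { users: "3 45 66 7779", contents: "Some Text, here" }
terms = doc.map { |field, text| [field, per_field[field].call(text)] }.to_h

p terms[:users]                    # => ["3", "45", "66", "7779"]
p terms[:contents]                 # => ["some", "text", "here"]

# Whitespace tokenizing does not split on commas, so a comma-separated
# value stays one single term; the ids must be space-separated.
p whitespace.call("3,45,66,7779")  # => ["3,45,66,7779"]
```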

The second choice, which I'd recommend in this instance, is to index the
field untokenized. For example:

    doc = Document::Document.new()

    # Note the UNTOKENIZED here. It means the whole field value is indexed
    # as a single term. You don't have to store the field if you don't want to.
    doc << Document::Field.new(:user_id, "3,45,66,7779",
                                    Document::Field::Store::YES,
                                    Document::Field::Index::UNTOKENIZED)

    index << doc

    query = TermQuery.new(Term.new(:user_id, "3,45,66,7779"))
    index.search_each(query)...etc.
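The effect of an UNTOKENIZED field can be modelled with a toy inverted index in plain Ruby. The helpers `add_untokenized` and `term_query` are hypothetical. They only mimic two ideas: the whole value becomes one term, and a term query is an exact lookup of that term:

```ruby
# Toy inverted index (not Ferret): { field => { term => [doc ids] } }
index = Hash.new { |fields, field| fields[field] = {} }

# Hypothetical UNTOKENIZED indexing: the entire value becomes exactly one term.
def add_untokenized(index, doc_id, field, value)
  (index[field][value] ||= []) << doc_id
end

# Hypothetical term query: an exact lookup of one term, no partial matching.
def term_query(index, field, term)
  index[field].fetch(term, [])
end

add_untokenized(index, 0, :user_id, "3,45,66,7779")

p term_query(index, :user_id, "3,45,66,7779")  # => [0]
p term_query(index, :user_id, "77")            # => []
p term_query(index, :user_id, "3")             # => []
```

A query for a single id such as `3` finds nothing, because the only term in the field is the full string.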

Hope this makes sense. Let me know if you need more clarification.

Cheers,
Dave
> Thanks,
>
> Carl
>
> _______________________________________________
> Ferret-talk mailing list
> Ferret-talk at rubyforge.org
> http://rubyforge.org/mailman/listinfo/ferret-talk
>

Carl Youngblood

2005-Dec-14 06:50 UTC

[Ferret-talk] Query question

On 12/13/05, David Balmain <dbalmain.ml at gmail.com> wrote:
>     doc << Document::Field.new(:user_id, "3,45,66,7779",
>                                     Document::Field::Store::YES,
>                                     Document::Field::Index::UNTOKENIZED)
>
>     index << doc
>
>     query = TermQuery.new(Term.new(:user_id, "3,45,66,7779"))
>     index.search_each(query)...etc.
I don't think that's going to work for me, because I'm never going to
be querying the full value of :user_id.  I'm always going to be
querying only one of the numbers between the commas.  In this case, an
untokenized field won't work for me, right?

I think maybe the better thing to do is to separate the ids with
spaces and use the WhitespaceAnalyzer.  So just to make sure I have
this straight, if I separate my ids with spaces, like so:

index << {
  :id => 1,
  :users => '1 2 3',
  :contents => 'string number one',
}
index << {
  :id => 2,
  :users => '33 45',
  :contents => 'string number two',
}

And then I do a query like this:

count = index.search_each('users:("3") contents:"string"') do |d, s|
  puts index[d][:contents]
end

Will I get only the first record or will I get both?

Thanks,

Carl

David Balmain

2005-Dec-14 07:13 UTC

[Ferret-talk] Query question

On 12/14/05, Carl Youngblood <carl at youngbloods.org> wrote:
> On 12/13/05, David Balmain <dbalmain.ml at gmail.com> wrote:
> >     doc << Document::Field.new(:user_id, "3,45,66,7779",
> >                                     Document::Field::Store::YES,
> >                                     Document::Field::Index::UNTOKENIZED)
> >
> >     index << doc
> >
> >     query = TermQuery.new(Term.new(:user_id, "3,45,66,7779"))
> >     index.search_each(query)...etc.
>
> I don't think that's going to work for me, because I'm never going to
> be querying the full value of :user_id.  I'm always going to be
> querying only one of the numbers between the commas.  In this case, an
> untokenized field won't work for me, right?
>
> I think maybe the better thing to do is to separate the ids with
> spaces and use the WhitespaceAnalyzer.  So just to make sure I have
> this straight, if I separate my ids with spaces, like so:
>
> index << {
>   :id => 1,
>   :users => '1 2 3',
>   :contents => 'string number one',
> }
> index << {
>   :id => 2,
>   :users => '33 45',
>   :contents => 'string number two',
> }
>
> And then I do a query like this:
>
> count = index.search_each('users:("3") contents:"string"') do |d, s|
>   puts index[d][:contents]
> end
>
> Will I get only the first record or will I get both?
Just the first one. Sorry, I didn't understand what you wanted before.
Any query will only match the whole word unless you use a wildcard.
For example,

    index.search_each('users:3* contents:string')

will match both of the documents above, whereas

    index.search_each('users:3 contents:"string number"')

will only match the first one. Also note that double quotes (") are used
to wrap phrases but are not necessary for single-word queries.
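The difference between a plain term and a prefix wildcard on the `users:` field can be sketched in plain Ruby. The sketch assumes the two documents above were tokenized on whitespace. The helpers `exact_term_match` and `prefix_match` are hypothetical. Only the `users:` clause is modelled, not the way it combines with the `contents:` clause:

```ruby
# Terms per document, as whitespace tokenizing would produce them
# (a plain-Ruby model, not a Ferret index).
docs = {
  1 => { users: %w[1 2 3], contents: %w[string number one] },
  2 => { users: %w[33 45], contents: %w[string number two] }
}

# users:3  -> documents that contain exactly the term "3".
def exact_term_match(docs, field, term)
  docs.select { |_id, fields| fields[field].include?(term) }.keys
end

# users:3* -> documents with any term that starts with "3".
def prefix_match(docs, field, prefix)
  docs.select { |_id, fields| fields[field].any? { |t| t.start_with?(prefix) } }.keys
end

p exact_term_match(docs, :users, "3")  # => [1]
p prefix_match(docs, :users, "3")      # => [1, 2]
```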

Erik Hatcher

2005-Dec-14 12:08 UTC

[Ferret-talk] Query question

Dave - isn't there a slick way to use a regex for an analyzer to
split tokens with Ferret?   If so, that would be an ideal solution
for splitting at a comma.  Or you could split the string prior to
indexing, iterate over the array from the split, and then index each
user id as a unique untokenized but indexed field.
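The second suggestion (split before indexing, one exact term per user id) can be modelled without Ferret. `index_user_ids` and the hash it fills are hypothetical. In Ferret itself this would presumably mean adding several untokenized fields with the same name to one document. That is an assumption here, not something shown in this thread:

```ruby
# Hypothetical helper: split the comma-separated list before indexing and
# record each id as its own exact term of the same field.
def index_user_ids(postings, doc_id, csv)
  csv.split(",").map(&:strip).each do |uid|
    (postings[uid] ||= []) << doc_id
  end
end

users_postings = {}  # toy postings for the users field: { id => [doc ids] }
index_user_ids(users_postings, 0, "3,45,66,7779")
index_user_ids(users_postings, 1, "77")

p users_postings["77"]    # => [1]  (doc 0 is not a false hit)
p users_postings["7779"]  # => [0]
p users_postings["3"]     # => [0]
```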

	Erik

David Balmain

2005-Dec-14 12:24 UTC

[Ferret-talk] Query question

On 12/14/05, Erik Hatcher <erik at ehatchersolutions.com> wrote:
> Dave - isn't there a slick way to use a regex for an analyzer to
> split tokens with Ferret?   If so, that would be an ideal solution
> for splitting at a comma.  Or you could split the string prior to
> indexing, iterate over the array from the split, and then index each
> user id as a unique untokenized but indexed field.
Sure. Here is an analyzer that splits the field on commas.

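  # Analyzer that turns "3,45,66,7779" into the tokens 3, 45, 66 and 7779.
  class CommaAnalyzer < Ferret::Analysis::Analyzer
    # The RegExpTokenizer emits each match of token_re as a token, so
    # /[^,]+/ yields every run of characters between commas.
    class CommaTokenizer < Ferret::Analysis::RegExpTokenizer
      def token_re
        /[^,]+/
      end
    end

    # Use the same comma tokenizer whichever field is being analyzed.
    def token_stream(field, string)
      return CommaTokenizer.new(string)
    end
  end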
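Assume the RegExpTokenizer emits one token for each successive match of `token_re`. Under that assumption, `String#scan` with the same regexp shows which tokens the CommaAnalyzer would produce. `COMMA_TOKEN_RE` and `comma_tokens` are names made up for this plain-Ruby sketch:

```ruby
# Assumption: the tokenizer yields each successive match of its regexp,
# which is what String#scan does in plain Ruby.
COMMA_TOKEN_RE = /[^,]+/

def comma_tokens(text)
  text.scan(COMMA_TOKEN_RE)
end

p comma_tokens("3,45,66,7779")  # => ["3", "45", "66", "7779"]
p comma_tokens("3,,45,")        # => ["3", "45"]  (empty pieces are skipped)
p comma_tokens("3, 45")         # => ["3", " 45"] (the space is kept)
p "3, 45".scan(/[^,\s]+/)       # => ["3", "45"]
```

Spaces after the commas end up inside the tokens. If the ids might be written as `3, 45`, a pattern such as `/[^,\s]+/` may be the safer choice.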

This makes me think it might be cool to have a RegExpAnalyzer like this:

analyzer = RegExpAnalyzer.new(:default   => STANDARD_RE,
                              :user_id   => /[^,]+/,
                              :phone_num => /[-()0-9]+/)
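The RegExpAnalyzer above is only a proposal, so the following is just a plain-Ruby model of the idea: one token regexp per field, with a `:default` fallback. `RegExpAnalyzerSketch` is a hypothetical name. `STANDARD_RE = /\w+/` is a placeholder, not Ferret's real standard pattern:

```ruby
# Placeholder default pattern (an assumption, not Ferret's STANDARD_RE).
STANDARD_RE = /\w+/

# Hypothetical model of the proposed analyzer: field => token regexp.
class RegExpAnalyzerSketch
  def initialize(res)
    @default = res.fetch(:default)
    @res = res
  end

  # Returns the token strings for the given field's text, falling back
  # to the :default regexp for fields without their own entry.
  def tokens(field, text)
    text.scan(@res.fetch(field, @default))
  end
end

analyzer = RegExpAnalyzerSketch.new(:default   => STANDARD_RE,
                                    :user_id   => /[^,]+/,
                                    :phone_num => /[-()0-9]+/)

p analyzer.tokens(:user_id, "3,45,66,7779")          # => ["3", "45", "66", "7779"]
p analyzer.tokens(:phone_num, "call (555) 123-4567") # => ["(555)", "123-4567"]
p analyzer.tokens(:contents, "string number one")    # => ["string", "number", "one"]
```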

Any thoughts, criticisms?

Dave