I have an index in which I want different records to be accessible to different users. I think I can do this by adding a "users" field to each record in the index and narrowing my queries to only those records matching the current user's userid. I have the userids separated by commas. What would be the right way to query for a certain user? I have to make sure that I don't find records belonging to the wrong user because a shorter number matches a larger one. For example, if a users field contains:

    3,45,66,7779

I don't want a query for 77 to match this. How can I make sure my query matches whole words only?

Thanks,
Carl
On 12/14/05, Carl Youngblood <carl at youngbloods.org> wrote:
> I have to make sure that I don't find records belonging
> to the wrong user because a shorter number matches a larger one. For
> example, if a users field contains:
>
> 3,45,66,7779
>
> I don't want a query for 77 to match this. How can I make sure my
> query matches whole words only?

You have two choices. To match whole words only, i.e. words separated by spaces, use the WhitespaceAnalyzer. You can use the PerFieldAnalyzer if you only want to use the WhitespaceAnalyzer on one field and the StandardAnalyzer on all the others.

The second choice, which I'd recommend in this instance, is to store the field untokenized. For example:

    doc = Document::Document.new()
    # Note the UNTOKENIZED here. That means the whole field is indexed as
    # a single term. You don't have to store the field if you don't want to.
    doc << Document::Field.new(:user_id, "3,45,66,7779",
                               Document::Field::Store::YES,
                               Document::Field::Index::UNTOKENIZED)

    index << doc

    query = TermQuery.new(Term.new(:user_id, "3,45,66,7779"))
    index.search_each(query)...etc.

Hope this makes sense. Let me know if you need more clarification.

Cheers,
Dave

> _______________________________________________
> Ferret-talk mailing list
> Ferret-talk at rubyforge.org
> http://rubyforge.org/mailman/listinfo/ferret-talk
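The difference between the two choices can be sketched in plain Ruby. This is only an illustration of how terms are formed and compared. It does not call Ferret, and the method names (`untokenized_terms`, `whitespace_terms`, `term_match?`) are made up for the example.

```ruby
# Plain-Ruby illustration only; none of these names are Ferret API.
FIELD_VALUE = "3 45 66 7779"

# An untokenized field contributes its whole value as a single term.
def untokenized_terms(value)
  [value]
end

# A whitespace analyzer splits the value on runs of whitespace.
def whitespace_terms(value)
  value.split
end

# A term query matches only when the query equals an indexed term exactly.
def term_match?(terms, query)
  terms.include?(query)
end

puts term_match?(untokenized_terms(FIELD_VALUE), FIELD_VALUE) # true
puts term_match?(untokenized_terms(FIELD_VALUE), "45")        # false
puts term_match?(whitespace_terms(FIELD_VALUE), "45")         # true
puts term_match?(whitespace_terms(FIELD_VALUE), "77")         # false
```

Under this model, an untokenized field can only be found by its full value, while a whitespace-tokenized field can be found by any single id, and never by a fragment of one.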
On 12/13/05, David Balmain <dbalmain.ml at gmail.com> wrote:
> doc << Document::Field.new(:user_id, "3,45,66,7779",
>                            Document::Field::Store::YES,
>                            Document::Field::Index::UNTOKENIZED)
>
> index << doc
>
> query = TermQuery.new(Term.new(:user_id, "3,45,66,7779"))
> index.search_each(query)...etc.

I don't think that's going to work for me, because I'm never going to be querying the full value of :user_id. I'm always going to be querying only one of the numbers between the commas. In this case, an untokenized field won't work for me, right?

I think maybe the better thing to do is to separate the ids with spaces and use the WhitespaceAnalyzer. So just to make sure I have this straight, if I separate my ids with spaces, like so:

    index << {
      :id => 1,
      :users => '1 2 3',
      :contents => 'string number one',
    }
    index << {
      :id => 2,
      :users => '33 45',
      :contents => 'string number two',
    }

And then I do a query like this:

    count = index.search_each('users:("3") contents:"string"') do |d, s|
      puts index[d][:contents]
    end

Will I get only the first record or will I get both?

Thanks,
Carl
On 12/14/05, Carl Youngblood <carl at youngbloods.org> wrote:
> And then I do a query like this:
>
>     count = index.search_each('users:("3") contents:"string"') do |d, s|
>       puts index[d][:contents]
>     end
>
> Will I get only the first record or will I get both?

Just the first one. Sorry, I didn't understand what you wanted before. Any query will only match the whole word unless you use a wildcard. For example,

    index.search_each('users:3* contents:string')

will match both of the documents above, while

    index.search_each('users:3 contents:"string number"')

will only match the first one. Also note that double quotes (") are used to wrap phrases but are not necessary for single-word queries.
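The whole-word versus wildcard behaviour described above can be modelled in a few lines of plain Ruby. This is a sketch, not Ferret code: `DOCS`, `exact_ids` and `prefix_ids` are hypothetical names, and the terms are what whitespace tokenization would produce for the users field of the two example documents. It models only the users clause. How that clause combines with the contents clause depends on the query parser's default operator, which the sketch does not try to reproduce.

```ruby
# Plain-Ruby illustration only; none of these names are Ferret API.
DOCS = {
  1 => %w[1 2 3],  # terms of the users field in document 1
  2 => %w[33 45],  # terms of the users field in document 2
}.freeze

# Exact term match (like users:3): the query must equal a whole term.
def exact_ids(docs, query)
  docs.select { |_id, terms| terms.include?(query) }.keys
end

# Trailing-wildcard match (like users:3*): any term starting with the prefix.
def prefix_ids(docs, prefix)
  docs.select { |_id, terms| terms.any? { |t| t.start_with?(prefix) } }.keys
end

p exact_ids(DOCS, "3")   # [1]
p prefix_ids(DOCS, "3")  # [1, 2]
```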
Dave - isn't there a slick way to use a regex for an analyzer to split tokens with Ferret? If so, that would be an ideal solution for splitting at a comma.

Or you could split the string prior to indexing, iterate over the array from the split, and then index each user id as a unique untokenized but indexed field.

Erik

On Dec 14, 2005, at 12:54 AM, Carl Youngblood wrote:
> I have the userids separated by commas. What would be the right way
> to query for a certain user?
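Erik's second suggestion, splitting before indexing, might start out like the sketch below. The helper `split_user_ids` is a hypothetical name, and the step that adds each id to the index is left as a comment because the exact field-construction call depends on the Ferret version in use.

```ruby
# Hypothetical helper, not part of Ferret: turn a comma-separated id
# string into a clean array of individual ids.
def split_user_ids(raw)
  raw.split(",").map(&:strip).reject(&:empty?)
end

ids = split_user_ids("3,45,66,7779")

# Each id would then be added to the document as its own untokenized,
# indexed field value, so that a term query for "45" matches exactly.
ids.each { |id| puts "user_id => #{id}" }
```

Stripping whitespace and dropping empty entries is a design choice of this sketch. It guards against input such as "3, 45,,66", which would otherwise produce terms like " 45" or "".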
On 12/14/05, Erik Hatcher <erik at ehatchersolutions.com> wrote:
> Dave - isn't there a slick way to use a regex for an analyzer to
> split tokens with Ferret? If so, that would be an ideal solution
> for splitting at a comma. Or you could split the string prior to
> indexing, iterate over the array from the split, and then index each
> user id as a unique untokenized but indexed field.

Sure. Here is an analyzer that splits the field on commas.

    class CommaAnalyzer < Ferret::Analysis::Analyzer
      # Tokenizer whose tokens are the runs of characters between commas.
      class CommaTokenizer < Ferret::Analysis::RegExpTokenizer
        def token_re
          /[^,]+/
        end
      end

      def token_stream(field, string)
        return CommaTokenizer.new(string)
      end
    end

This makes me think it might be cool to have a RegExpAnalyzer like this:

    analyzer = RegExpAnalyzer.new(:default => STANDARD_RE,
                                  :user_id => /[^,]+/,
                                  :phone_num => /[-()0-9]+/)

Any thoughts, criticisms?

Dave
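The token patterns above can be tried out without Ferret by using Ruby's String#scan. The sketch below also mimics the per-field lookup of the proposed RegExpAnalyzer. `FIELD_PATTERNS`, `DEFAULT_RE` and `tokens_for` are hypothetical names, and `/\w+/` is only a stand-in for STANDARD_RE, whose actual definition is not given in this thread.

```ruby
# Plain-Ruby illustration only; none of these names are Ferret API.
FIELD_PATTERNS = {
  :user_id   => /[^,]+/,       # runs of characters between commas
  :phone_num => /[-()0-9]+/,   # digits, dashes and parentheses
}.freeze

# Assumption: a simple word pattern standing in for STANDARD_RE.
DEFAULT_RE = /\w+/

# Return the tokens of +string+ using the pattern registered for +field+,
# falling back to the default pattern for unknown fields.
def tokens_for(field, string)
  string.scan(FIELD_PATTERNS.fetch(field, DEFAULT_RE))
end

p tokens_for(:user_id, "3,45,66,7779")             # ["3", "45", "66", "7779"]
p tokens_for(:phone_num, "call (555)123-4567 now") # ["(555)123-4567"]
p tokens_for(:title, "hello world")                # ["hello", "world"]
```

With the comma pattern, "77" is not among the tokens of "3,45,66,7779", so an exact term query for 77 would have nothing to match.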