thr3ads.net - Ferret talk - [Ferret-talk] Whitespace Issues [Jul 2006]


BlueJay

2006-Jul-14 14:35 UTC

[Ferret-talk] Whitespace Issues

I am trying to build up a filtered search using the logic below.


    bq = Ferret::Search::BooleanQuery.new
    # downcase rather than downcase!: the bang form returns nil when the
    # string is already lowercase
    term = Ferret::Index::Term.new("section", section.downcase)
    bq.add_query(Ferret::Search::TermQuery.new(term),
                 Ferret::Search::BooleanClause::Occur::MUST)

    filter = Ferret::Search::QueryFilter.new(bq)
    @vobjects = VoObject.find_by_contents(search_input, :filter => filter,
                                          :sort => ["section", "sale_category"])


This works fine when the "section" is a single word like "book", but when
there is whitespace in the query, as in "paperback book", it does not
find the appropriate result and comes back with zero hits.

I changed this to use FuzzyQuery and it works but I sometimes get 
segmentation errors (this was reported in another topic).

Does anyone have a solution to this problem for me?

Thanks very much.

-- 
Posted via http://www.ruby-forum.com/.

Jeremy Bensley

2006-Jul-14 14:58 UTC


It's hard to know for sure without seeing how your index is built, but if
you are using TOKENIZED on that field, then whenever the index is built the
text is split on whitespace, and each element is added as a separate term.
It looks like when you are searching, you are trying to find the entire text
as a single term.
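
To make the mismatch concrete, here is a minimal plain-Ruby sketch. It does not use Ferret at all: the `tokenize` helper and the hash-based index are hypothetical stand-ins for an analyzer and a term index, and real analyzers may do more than lowercase and split. It only illustrates why a lookup for the single term "paperback book" can come back empty while the individual tokens are present.

```ruby
# Hypothetical stand-in for an analyzer: lowercase and split on whitespace.
def tokenize(text)
  text.downcase.split
end

# Build a toy term index: term => list of document ids containing that term.
def build_term_index(docs)
  index = Hash.new { |hash, term| hash[term] = [] }
  docs.each do |doc_id, text|
    tokenize(text).uniq.each { |term| index[term] << doc_id }
  end
  index
end

docs  = { 1 => "Paperback Book", 2 => "Book" }
index = build_term_index(docs)

# Looking up the whole string as one term finds nothing, because no such
# term was ever indexed.
single_term_hits = index.fetch("paperback book", [])

# Looking up each token and intersecting the results finds document 1.
per_token_hits = tokenize("paperback book").map { |t| index.fetch(t, []) }.reduce(:&)
```

The second lookup is roughly what a parser or a per-term query does for you: it applies the same splitting at search time that was applied at index time.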

In order to solve this, I believe you can either construct your query using
QueryParser, which will use the analyzer / tokenizer and split the terms out
for you, or you can simply split the 'section' string on whitespace and
build a Term and TermQuery for each resulting element and build a
PhraseQuery from that set.
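
As a rough illustration of what the second option relies on, the following plain-Ruby sketch shows phrase matching over a toy positional index. None of this is Ferret's API: `build_positional_index` and `phrase_match` are hypothetical names, and the index layout is an assumption made only for the example. The idea is that a phrase query matches when the split terms occur consecutively and in order.

```ruby
# Toy positional index: term => { doc_id => [positions of the term] }.
def build_positional_index(docs)
  index = Hash.new { |h, term| h[term] = Hash.new { |d, id| d[id] = [] } }
  docs.each do |doc_id, text|
    text.downcase.split.each_with_index do |term, pos|
      index[term][doc_id] << pos
    end
  end
  index
end

# Return ids of documents in which the terms occur consecutively, in order.
def phrase_match(index, phrase)
  terms = phrase.downcase.split
  return [] if terms.empty? || terms.any? { |t| !index.key?(t) }

  first, *rest = terms
  matching = index[first].select do |doc_id, positions|
    positions.any? do |start|
      rest.each_with_index.all? do |term, offset|
        index[term].key?(doc_id) &&
          index[term][doc_id].include?(start + offset + 1)
      end
    end
  end
  matching.keys
end

docs    = { 1 => "paperback book", 2 => "book paperback", 3 => "book" }
index   = build_positional_index(docs)
matches = phrase_match(index, "Paperback Book")
```

Document 2 contains both words but in the wrong order, so only document 1 matches. That ordering check is what distinguishes a phrase query from simply requiring every term.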

I hope this is some help,

Jeremy


BlueJay

2006-Jul-14 16:37 UTC


Jeremy Bensley wrote:
> It's hard to know for sure without seeing how your index is built, but
> if you are using TOKENIZED on that field, then whenever the index is
> built the text is split on whitespace, and each element is added as a
> separate term.

Jeremy

Thanks for the reply. I am building the index like this...

class VoObject < ActiveRecord::Base
  acts_as_ferret :fields => ['short_description', 'section', 'sale_category', 'sale_type', 'outcode']

> It looks like when you are searching, you are trying to find the entire
> text as a single term.
>
> In order to solve this, I believe you can either construct your query
> using QueryParser, which will use the analyzer / tokenizer and split the
> terms out for you, or you can simply split the 'section' string on
> whitespace and build a Term and TermQuery for each resulting element and
> build a PhraseQuery from that set.

Sorry for asking a silly question but how would I go about doing this?

> I hope this is some help,
>
> Jeremy




Jeremy Bensley

2006-Jul-14 18:16 UTC


Method #1 should be shorter / easier, and would look something like this:

# "section" is the default field used to build the query
qp = Ferret::QueryParser.new("section")

# the surrounding quotes make the parser treat the section name as a phrase
query = qp.parse("\"#{section}\"")

# modified boolean query
bq = Ferret::Search::BooleanQuery.new
bq.add_query(query, Ferret::Search::BooleanClause::Occur::MUST)

filter = Ferret::Search::QueryFilter.new(bq)
@vobjects = VoObject.find_by_contents(search_input, :filter => filter,
                                      :sort => ["section", "sale_category"])

Unless you have more than one query in the boolean query, you should probably
just skip it entirely and build your filter directly from the parsed
PhraseQuery.
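
One thing to watch with the `qp.parse("\"#{section}\"")` call is that the section name is interpolated straight into the query string, so an embedded double quote or stray whitespace would change what the parser sees. A minimal sketch of a cleanup step, in plain Ruby, might look like the following. The helper name `phrase_query_string` is hypothetical, and lowercasing assumes the field was indexed with a lowercasing analyzer, which may not be true of every setup.

```ruby
# Hypothetical helper: turn a raw section name into a quoted phrase string
# that can be handed to a query parser.
def phrase_query_string(section)
  # downcase (non-mutating) always returns a string, unlike downcase!,
  # which returns nil when nothing changed.
  # delete('"') drops embedded double quotes; split.join(" ") trims the
  # ends and collapses any run of whitespace to a single space.
  cleaned = section.to_s.downcase.delete('"').split.join(" ")
  "\"#{cleaned}\""
end

phrase = phrase_query_string("  Paperback   Book ")
```

The result could then be passed to the parser in place of the hand-built string, for example `qp.parse(phrase_query_string(section))`.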

