thr3ads.net - Ferret talk - [Ferret-talk] Questions about Searching [Jan 2006]

Tom Davies

2006-Jan-20 13:39 UTC

[Ferret-talk] Questions about Searching

Hi,

I have some questions about searching with Ferret.  I have a user
index with first_name, last_name and full_name (which is just first
plus last with a space).

Here are a couple of questions:

1) If I store the fields tokenized, it appears as though queries are
case-insensitive.  However, for untokenized, the query is
case-sensitive.  How can I make the untokenized searches
case-insensitive?

2) If I have a field with whitespace in it, how can I search for the
whitespace using wildcard searches.  For instance, if the full_name I
am searching for is "John Doe", how can I build a query for that.  I
have tried numerous combinations, here are a couple I tried:
  full_name:"#{query}"*  <-- This will match every field in the index
  full_name:"#{query}*" <-- This matches nothing

3) When I store the fields as untokenized, exact matches seem to not
work for me anymore.  For instance, this query worked for tokenized
first_name, but does not for untokenized first_name:
  first_name:John

But this query will return results:
  first_name:Joh?

4) Is there a better way to search for the first and last name
combination than storing another index with them concatenated?

Thanks,

Tom

Erik Hatcher

2006-Jan-20 15:34 UTC

On Jan 20, 2006, at 8:39 AM, Tom Davies wrote:

> Here are a couple of questions:
>
> 1) If I store the fields tokenized, it appears as though queries are
> case-insensitive.  However, for untokenized, the query is
> case-sensitive.  How can I make the untokenized searches
> case-insensitive?
By lowercasing the text you index and lowercasing the text in the  
query.  Search matches are case sensitive always, but generally  
tokenized fields get lowercased along the way, and the query parser  
lowercases terms also (generally by the same analyzer).
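
A minimal sketch of that idea in plain Ruby, without Ferret itself: the same normalization is applied to a value before it is indexed and to the user's input before it becomes a query. The `normalize_term` and `lookup` helpers, and the Hash standing in for an untokenized field, are hypothetical names used only for illustration.

```ruby
# Hypothetical helper: apply identical normalization at index and query time.
def normalize_term(text)
  text.strip.downcase
end

# A plain Hash stands in for an untokenized field: each value is kept as one
# whole term, and lookups are exact, case-sensitive string comparisons.
stored_terms = {}
["John", "JANE"].each_with_index do |first_name, doc_id|
  stored_terms[normalize_term(first_name)] = doc_id
end

# Hypothetical lookup: normalize the user's input, then match exactly.
def lookup(stored_terms, user_input)
  stored_terms[normalize_term(user_input)]
end
```

Because both sides pass through the same helper, the exact comparison itself can stay case-sensitive.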
> 2) If I have a field with whitespace in it, how can I search for the
> whitespace using wildcard searches.  For instance, if the full_name I
> am searching for is "John Doe", how can I build a query for that.  I
> have tried numerous combinations, here are a couple I tried:
>   full_name:"#{query}"*  <-- This will match every field in the index
>   full_name:"#{query}*" <-- This matches nothing
I strongly suspect the issue is the field being analyzed during query  
parsing.  I'm not sure what facilities Ferret has for doing this
exactly off the top of my head, but in Java Lucene there is a  
PerFieldAnalyzerWrapper that helps with this.  The space would be  
problematic, as well as the double quotes in how you have created  
it.  You may need to create a WildcardQuery via the API rather than  
using the parser.
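
To illustrate the suspected mismatch, here is a toy model in plain Ruby. It does not call Ferret; `toy_analyze` is a hypothetical stand-in for an analyzer, and real analyzers behave differently in detail. The assumption is only that analysis lowercases the text and splits it at whitespace, while the untokenized field keeps the original string as a single term.

```ruby
# Toy stand-in for an analyzer (an assumption, not Ferret's implementation):
# lowercase the text and split it on whitespace.
def toy_analyze(text)
  text.downcase.split
end

# Untokenized field: the value is stored as one term, case intact.
indexed_term = "John Doe"

# What an analyzing query parser might produce from the same input.
query_terms = toy_analyze("John Doe")

# None of the analyzed query terms equals the single stored term.
any_match = query_terms.any? { |term| term == indexed_term }
```

Under these assumptions the query side ends up with two lowercase terms and the index side with one mixed-case term, so an exact comparison cannot succeed.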
> 3) When I store the fields as untokenized, exact matches seem to not
> work for me anymore.  For instance, this query worked for tokenized
> first_name, but does not for untokenized first_name:
>   first_name:John
>
> But this query will return results:
>   first_name:Joh?
This again has to do with the case and analyzer issue.  You are   
using a parser that does analysis of the text.  Try using the parser  
to create a Query and see what it consists of (.to_s?).
> 4) Is there a better way to search for the first and last name
> combination than storing another index with them concatenated?
It really all depends on what your searching needs are.  What does  
the user interface for searching demand?

	Erik

Tom Davies

2006-Jan-20 15:56 UTC

Thanks Erik.  Very informative.  I suspect the QueryParser either has
some bugs or is not designed to handle this scenario.  I will try
manually building the specific types of queries via the API.
> It really all depends on what your searching needs are.  What does
> the user interface for searching demand?
For the full name searches, I just wanted wild card matches on the
right hand side of the query.  For instance, any of these should
result in john doe being found:
  J, Jo, Joh, John, John D, etc.

Tom

Erik Hatcher

2006-Jan-20 18:15 UTC

On Jan 20, 2006, at 10:56 AM, Tom Davies wrote:

> Thanks Erik.  Very informative.  I suspect the QueryParser either has
> some bugs or is not designed to handle this scenario.  I will try
> manually building the specific types of queries via the API.
There are many tricky scenarios because of the necessity for  
whitespace and special characters to be handled as separators and  
operators and the analyzer (and when it is used) with the query parser.

So no bugs, per se, I don't think in this case.

My article at java.net covers this (in the context of Java) in some  
of its glory and frustration I think:

	<http://today.java.net/pub/a/today/2003/11/07/QueryParserRules.html>
>> It really all depends on what your searching needs are.  What does
>> the user interface for searching demand?
>
> For the full name searches, I just wanted wild card matches on the
> right hand side of the query.  For instance, any of these should
> result in john doe being found:
>   J, Jo, Joh, John, John D, etc.
The simplest thing to do in this case is what you're doing for
indexing... combine a field with "firstname lastname" as untokenized,
though lowercased.  Then build a WildcardQuery for "piece*" - though
this isn't going to be possible with the whitespace involved when
using the parser, I don't think (unless you can escape it somehow).
Be sure to lowercase the query also.
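
As a rough sketch of the intended behaviour in plain Ruby (no Ferret calls): a trailing-wildcard query such as "piece*" against a lowercased, untokenized field acts like a prefix test on the whole stored string. The `prefix_matches` helper and the sample names are hypothetical.

```ruby
# Lowercased, untokenized full names: one whole term per user.
full_names = ["john doe", "john smith", "jane doe"]

# Hypothetical helper: emulate a trailing-wildcard query ("piece*") with a
# plain prefix test, lowercasing the user's input first. Whitespace is kept,
# since it is part of the stored term.
def prefix_matches(full_names, partial)
  prefix = partial.downcase
  full_names.select { |name| name.start_with?(prefix) }
end
```

With this model, inputs such as "J", "Jo", "John" and "John D" all find "john doe", and the space in "John D" needs no special treatment because the term was never split.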

	Erik

Tom Davies

2006-Jan-24 13:05 UTC

Thanks Erik.  Nice article.  I was able to get the wildcard search to
work including whitespace by manually creating the query as follows:

    qp = Ferret::QueryParser.new
    query = qp.get_wild_query('full_name', "#{partial}*")
    INDEX.search_each(query) do |doc, score|
      # process each matching document here
    end

where #{partial} is the partial portion of the full name.

Thanks for your responses.

Tom
