Hi,

I have some questions about searching with Ferret. I have a user index with first_name, last_name and full_name (which is just first plus last with a space). Here are a couple of questions:

1) If I store the fields tokenized, it appears as though queries are case-insensitive. However, for untokenized, the query is case-sensitive. How can I make the untokenized searches case-insensitive?

2) If I have a field with whitespace in it, how can I search for the whitespace using wildcard searches? For instance, if the full_name I am searching for is "John Doe", how can I build a query for that? I have tried numerous combinations; here are a couple I tried:

    full_name:"#{query}"*   <-- This will match every field in the index
    full_name:"#{query}*"   <-- This matches nothing

3) When I store the fields as untokenized, exact matches seem to not work for me anymore. For instance, this query worked for tokenized first_name, but does not for untokenized first_name:

    first_name:John

But this query will return results:

    first_name:Joh?

4) Is there a better way to search for the first and last name combination than storing another index with them concatenated?

Thanks,
Tom
On Jan 20, 2006, at 8:39 AM, Tom Davies wrote:
> Here are a couple of questions:
>
> 1) If I store the fields tokenized, it appears as though queries are
> case-insensitive. However, for untokenized, the query is
> case-sensitive. How can I make the untokenized searches
> case-insensitive?

By lowercasing the text you index and lowercasing the text in the query. Search matches are always case-sensitive, but generally tokenized fields get lowercased along the way, and the query parser lowercases terms also (generally by the same analyzer).

> 2) If I have a field with whitespace in it, how can I search for the
> whitespace using wildcard searches. For instance, if the full_name I
> am searching for is "John Doe", how can I build a query for that. I
> have tried numerous combinations, here are a couple I tried:
> full_name:"#{query}"* <-- This will match every field in the index
> full_name:"#{query}*" <-- This matches nothing

I strongly suspect the issue is the field being analyzed during query parsing. I'm not sure off the top of my head exactly what facilities Ferret has for this, but in Java Lucene there is a PerFieldAnalyzerWrapper that helps with this. The space would be problematic, as well as the double quotes in how you have created it. You may need to create a WildcardQuery via the API rather than using the parser.

> 3) When I store the fields as untokenized, exact matches seem to not
> work for me anymore. For instance, this query worked for tokenized
> first_name, but does not for untokenized first_name:
> first_name:John
>
> But this query will return results:
> first_name:Joh?

This again has to do with the case and analyzer issue. You are using a parser that does analysis of the text. Try using the parser to create a Query and see what it consists of (.to_s?).

> 4) Is there a better way to search for the first and last name
> combination than storing another index with them concatenated?

It really all depends on what your searching needs are. What does the user interface for searching demand?

Erik
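To make the case point from answers 1 and 3 concrete, here is a minimal plain-Ruby sketch. It is a toy model, not Ferret's API: the helper names build_term_index and exact_lookup and the normalize: option are made up for illustration. It assumes an untokenized field behaves like a single term that is compared exactly and case-sensitively, so a lookup only succeeds when the indexed text and the query text were lowercased the same way.

```ruby
# Toy model of an untokenized field (hypothetical helpers, not Ferret's API):
# each value is stored whole as a single term and matched exactly.
def build_term_index(values, normalize: false)
  index = Hash.new { |hash, term| hash[term] = [] }
  values.each do |doc_id, value|
    # Lowercase at index time only when asked to.
    term = normalize ? value.downcase : value
    index[term] << doc_id
  end
  index
end

# Exact, case-sensitive term lookup; optionally lowercase the query text first.
def exact_lookup(index, text, normalize: false)
  term = normalize ? text.downcase : text
  index.fetch(term, [])
end

first_names = { 1 => "John", 2 => "jane" }

# Indexed as-is: a lowercased query term no longer equals the stored term.
raw_index = build_term_index(first_names)
exact_lookup(raw_index, "john")                        # => []

# Lowercased on both sides: the lookup becomes case-insensitive in effect.
lowered_index = build_term_index(first_names, normalize: true)
exact_lookup(lowered_index, "JOHN", normalize: true)   # => [1]
```

The only design choice here is that the same normalization is applied at index time and at query time; which component does the lowercasing in a real Ferret setup depends on the analyzer configuration.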
Thanks Erik. Very informative. I suspect the QueryParser either has some bugs or is not designed to handle this scenario. I will try manually building the specific types of queries via the API.

> It really all depends on what your searching needs are. What does
> the user interface for searching demand?

For the full name searches, I just wanted wildcard matches on the right-hand side of the query. For instance, any of these should result in john doe being found:

J, Jo, Joh, John, John D, etc.

Tom
On Jan 20, 2006, at 10:56 AM, Tom Davies wrote:
> Thanks Erik. Very informative. I suspect the QueryParser either has
> some bugs or is not designed to handle this scenario. I will try
> manually building the specific types of queries via the API.

There are many tricky scenarios because of the necessity for whitespace and special characters to be handled as separators and operators and the analyzer (and when it is used) with the query parser.

So no bugs, per se, I don't think in this case.

My article at java.net covers this (in the context of Java) in some of its glory and frustration I think:

<http://today.java.net/pub/a/today/2003/11/07/QueryParserRules.html>

>> It really all depends on what your searching needs are. What does
>> the user interface for searching demand?
>
> For the full name searches, I just wanted wild card matches on the
> right hand side of the query. For instance, any of these should
> result in john doe being found:
> J, Jo, Joh, John, John D, etc.

The simplest thing to do in this case is what you're doing for indexing... combine a field with "firstname lastname" as untokenized, though lowercased. Then build a WildcardQuery for "piece*" - though this isn't going to be possible with the whitespace involved when using the parser, I don't think (unless you can escape it somehow). Be sure to lowercase the query also.

Erik
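The "piece*" suggestion above can be sketched in plain Ruby. This is only a toy model of an untokenized, lowercased full_name field, not Ferret's API: build_full_name_index, wildcard_to_regexp and search_full_name are hypothetical helper names. The sketch assumes each document contributes exactly one term (spaces included) and that the wildcard pattern is built directly from the user's text rather than being split on whitespace by a parser.

```ruby
# Toy model (hypothetical helpers, not Ferret's API): one lowercased,
# untokenized full_name term per document, with the space kept in the term.
def build_full_name_index(names)
  names.transform_values { |name| name.downcase }
end

# Translate a wildcard pattern into an anchored regular expression:
# "*" matches any run of characters, "?" matches exactly one character,
# and everything else (including spaces) is matched literally.
def wildcard_to_regexp(pattern)
  body = pattern.each_char.map do |ch|
    case ch
    when "*" then ".*"
    when "?" then "."
    else Regexp.escape(ch)
    end
  end.join
  Regexp.new("\\A#{body}\\z")
end

# Right-hand wildcard search: lowercase the partial name, append "*",
# and return the ids of all documents whose whole term matches.
def search_full_name(index, partial)
  pattern = wildcard_to_regexp("#{partial.downcase}*")
  index.select { |_doc_id, term| term.match?(pattern) }.keys
end

names = { 1 => "John Doe", 2 => "Mary Major" }
index = build_full_name_index(names)

["J", "Jo", "Joh", "John", "John D"].each do |partial|
  puts "#{partial.inspect} -> #{search_full_name(index, partial).inspect}"
end
```

With these two sample names, every partial in the loop finds only document 1. Note that a "*" or "?" typed by the user is treated as a wildcard in this sketch; a real search box might want to escape them.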
Thanks Erik. Nice article. I was able to get the wildcard search to work including whitespace by manually creating the query as follows:

    qp = Ferret::QueryParser.new
    query = qp.get_wild_query('full_name', "#{partial}*")
    INDEX.search_each(query) do |doc, score|

where #{partial} is the partial portion of the full name.

Thanks for your responses.

Tom