Hi all,

I'm trying to figure out how to add a filter to a search. I've created the filter, basically copying the location filter from http://blog.tourb.us/archives/ferret-and-location-based-searches. But when I call Index.search and pass the filter in a hash under the key :filter, I get an error saying that it is expecting type Data, and I'm at a loss as to what to check next.

Any help would be greatly appreciated. I'm sure I have a lot to learn, but some nudges in the right direction would be wonderful.

--
Cheers,
Jordan Frank
jordan.w.frank at gmail.com
On 7/15/06, Jordan Frank <jordan.w.frank at gmail.com> wrote:
> I'm trying to figure out how to add a filter into a search. [...]
> when I try to call Index.search and pass the filter in a hash with the
> key :filter, I get back that it is expecting type Data, and so I'm at
> a loss to figure out what to check next.

Hi Jordan,

This is a bug which needs to be fixed. Please wait for the next version of Ferret. Or you could use the pure Ruby version.

Cheers,
Dave
On 7/15/06, David Balmain <dbalmain.ml at gmail.com> wrote:
> Hi Jordan,
> This is a bug which needs to be fixed. Please wait for the next
> version of Ferret. Or you could use the pure ruby version.
>
> Cheers,
> Dave
> _______________________________________________
> Ferret-talk mailing list
> Ferret-talk at rubyforge.org
> http://rubyforge.org/mailman/listinfo/ferret-talk

Oh, really... darn, it was kind of important. How do I force it to use the pure Ruby version? How long until the next version? Is it a complicated fix, or is it fixed in a version that I could access (SVN or something)?

--
Cheers,
Jordan Frank
jordan.w.frank at gmail.com
On 7/16/06, Jordan Frank <jordan.w.frank at gmail.com> wrote:
> Oh, really...darn, it was kind of important. How do I force it to use
> the pure ruby version? How long until the next version? Is it a
> complicated fix or is it fixed in a version that I could access (SVN
> or something)?

To force it to use the pure Ruby version, require 'rferret' instead of 'ferret'. Alternatively (I should have mentioned this the first time) you can use a QueryFilter. For example:

    filter = QueryFilter.new(TermQuery.new(Term.new("subject", "sport")))

You should be able to build pretty much any filter you need just like that. Hope that helps.

Cheers,
Dave

PS: The fix can't be checked out of svn yet. I still have a lot of work to do. Sorry.
On 7/15/06, David Balmain <dbalmain.ml at gmail.com> wrote:
> To force it to use the pure ruby version require 'rferret' instead of
> 'ferret'. Alternatively (I should have mentioned this the first time)
> you can use a QueryFilter.
> <snip>
> PS: The fix can't be checked out of svn yet. I still have a lot of
> work to do. Sorry.

Don't apologize man, you've done an exceptional job with it so far. The filter I was trying to add would filter based on location, so I'm not sure that it could be done easily using a QueryFilter. It takes a latitude, longitude, and radius, then filters for records that are within the radius... think that's doable with the built-in filters? I guess I could do it with a bounding box instead, but I'd prefer to keep it accurate.

Anyway, I'll try the rferret route for now, and hopefully by the time this application goes to production, the C version will be fixed up. Thanks for your help.

--
Cheers,
Jordan Frank
jordan.w.frank at gmail.com
On 7/16/06, Jordan Frank <jordan.w.frank at gmail.com> wrote:
> <snip>
> The filter I was trying to add would filter based on location, so I'm
> not sure that It could be done easily using a query-filter. It takes a
> latitude, longitude, and radius, then filters for records that are
> with the radius...think that's doable with the builtin filters?
> <snip>

That is a perfect example of what you can't use the QueryFilter for. I may even use it as an example in the documentation. Thanks and good luck with the pure Ruby version.

Cheers,
Dave
On 7/16/06, David Balmain <dbalmain.ml at gmail.com> wrote:
> That is a perfect example of what you can't use the QueryFilter for. I
> may even use it as an example in the documentation. Thanks and good
> luck with the pure Ruby version.

I tried out the pure Ruby version, but I'm having a little bit of trouble wrapping my head around how to write the filter. It seems that some stuff has changed internally since the sample code that I found at tourb.us was written. I tried looking at the RangeFilter code, but it seems to be solving too different a problem to really be useful as a guide.

Do you know of any other filters, or have any pointers on how I would go about writing this one? It seems really simple: it just does a calculation on two of the fields. But because it's not iterating through terms, the RangeFilter code doesn't offer me much help. If you offer some pointers and I manage to get it working, I'd be happy to send you a copy to use as a sample, though it seems like the kind of thing you'd probably be able to write in a few minutes...

--
Cheers,
Jordan Frank
jordan.w.frank at gmail.com
On 7/17/06, Jordan Frank <jordan.w.frank at gmail.com> wrote:
> <snip>
> I tried out the pure ruby version, but I'm having a little bit of
> trouble wrapping my head around how to write the filter.
> <snip>
> It seems really simple, just does a calculation on two of the fields,
> but because it's not iterating through terms, the RangeFilter code
> doesn't offer me much help.

I think I slightly misunderstood your problem the first time around. To create this filter, you actually have to iterate through every document in the index. This will take some time, but it would be worth it if the filter gets used many times, since it gets cached. However, I don't think this would work for you, because I'm guessing the longitude, latitude and radius change on a query-by-query basis.

This is not really what the current filters are designed for. Filters should be common query restrictions that are run over and over again. For example, a blog may have a month filter for retrieving documents from a particular month. This is likely to be used over and over again, and RangeFilters are pretty cheap to build.
So the current solution to your problem is to post-filter your query results yourself (i.e. filter the results once you have them back). Let's say you need ten results. You'd do a search for maybe 50 and run through each result, checking the distance and discarding the ones you don't need. You'd repeat the search until you found enough documents. Here is a quick and dirty solution (where num_docs is the number of documents you want in your result set):

    def search(index, query, num_docs, latitude, longitude, radius)
      batch_size = num_docs * 5
      first_doc = 0
      results = []
      loop do
        seen = 0  # hits yielded in this batch
        index.search_each(query,
                          :first_doc => first_doc,
                          :num_docs => batch_size) do |doc_id, score|
          seen += 1
          next if results.size >= num_docs  # have enough docs
          doc = index[doc_id]
          # test distance and add to result set if ok
          # (to_f in case the stored values come back as strings)
          if ((doc[:latitude].to_f - latitude) ** 2 +
              (doc[:longitude].to_f - longitude) ** 2) < radius ** 2
            results << doc
          end
        end
        break if results.size >= num_docs  # have enough docs
        break if seen < batch_size         # already scanned all results
        first_doc += batch_size
      end
      return results
    end

This gets even messier when you need to page through the results. A much nicer solution than this would be to add a :filter_proc to the search methods. Something like this:

    within_radius = lambda do |doc|
      return ((doc[:latitude] - latitude) ** 2 +
              (doc[:longitude] - longitude) ** 2) < (radius ** 2)
    end

    index.search_each(query, :filter_proc => within_radius) {|d, s| ...}

Does this sound like a good idea? If so, I could add it to a future version of Ferret. Please let me know if you can think of a better way to do this.

Cheers,
Dave
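The squared-difference test above compares raw degrees, which is only a rough stand-in for distance on the ground. As a minimal sketch of a more accurate post-filter, assuming the radius is given in kilometres and using plain hashes in place of stored Ferret documents (the helper names `haversine_km` and `within_radius` are hypothetical, not part of Ferret):

```ruby
# Mean Earth radius in kilometres (an approximation).
EARTH_RADIUS_KM = 6371.0

# Great-circle distance between two points in kilometres (haversine formula).
def haversine_km(lat1, lon1, lat2, lon2)
  to_rad = Math::PI / 180.0
  dlat = (lat2 - lat1) * to_rad
  dlon = (lon2 - lon1) * to_rad
  a = Math.sin(dlat / 2)**2 +
      Math.cos(lat1 * to_rad) * Math.cos(lat2 * to_rad) * Math.sin(dlon / 2)**2
  # Clamp to 1.0 to guard against floating-point overshoot before asin.
  2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt([a, 1.0].min))
end

# Keep only the documents within radius_km of (lat, lon).
# Field values are converted with to_f because they may be stored as strings.
def within_radius(docs, lat, lon, radius_km)
  docs.select do |doc|
    haversine_km(lat, lon, doc[:latitude].to_f, doc[:longitude].to_f) <= radius_km
  end
end

# Illustrative data only; hashes stand in for documents fetched from the index.
docs = [
  { :name => "Vancouver", :latitude => "49.2827", :longitude => "-123.1207" },
  { :name => "Seattle",   :latitude => "47.6062", :longitude => "-122.3321" },
  { :name => "Toronto",   :latitude => "43.6532", :longitude => "-79.3832" }
]
nearby = within_radius(docs, 49.1501, -123.0, 250)
puts nearby.map { |d| d[:name] }.inspect
```

The same predicate could be dropped into the batching loop in place of the squared-degree comparison.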
I for one think this custom filter would be an awesome addition. Geospatial and local search is a hot area, and it would be cool if Ferret facilitated this type of query easily.

Would it be a significant performance hit if Ferret has to cycle through every document for this search? It may be fine over a couple of hundred or a thousand documents, but what about hundreds of thousands?

Just tossing around the idea, but this particular search (distance) can be done quite efficiently with SQL. Is it at all feasible to 'outsource' the query to SQL? Obviously SQL could return the ids simply enough, but I guess then you'd need to go through each document anyway... To return a bitset, would the database need to know about the Ferret document order?

Or how about the reverse: use Ferret to create a list of ids to pass into a SQL IN query? I'm afraid I have no idea how efficient that would be either...

Anyone in here have a best practice?

> ---------- Forwarded message ----------
> From: "David Balmain" <dbalmain.ml at gmail.com>
> To: ferret-talk at rubyforge.org
> Date: Mon, 17 Jul 2006 10:54:45 +0900
> Subject: Re: [Ferret-talk] adding a custom filter to the query
> <snip>
Comments inline...

On 7/17/06, Sam Giffney <samuelgiffney at gmail.com> wrote:
> I for one think this custom filter would be an awesome addition.
> Geospatial and local search is a hot area and it would be cool if
> ferret facilitated this type of query easily.

Agreed, though it does seem like an abuse of the search engine. As far as I can tell, the search engine's goal is to retrieve as few documents as possible to satisfy the query. David is right, and performing a calculation on every document makes less and less sense the more I think about it.

> [...]
> Just tossing around the idea but...
> This particular search (distance) can be done quite efficiently with
> sql. Is it at all feasible that you could 'outsource' the query to
> sql? Obviously sql could return the id's simply enough, but i guess
> then you'd need to go through each document anyway... To return a
> bitset, would the database need to know about the ferret document
> order?
>
> Or how about the reverse, use ferret to create a list of ids to pass
> into a sql IN query? Afraid I have no idea how efficient that would be
> either...

This is exactly how I'm doing it now, but the problem is that the data I'm using is so spread out location-wise that sometimes I only get 40-50 good hits for every 1,000 entries returned from Ferret. So I find myself going back to Ferret to retrieve more results a few times per query when I need to return 100 results that are within a certain distance, which is inefficient. I could just pull more results out of Ferret in the first place, but most of the time 1,000 is more than enough to get 100 good results. Testing will let me find the optimal number to pull from Ferret, but I figured that if I could put the distance calculation into Ferret itself, then I could ask for 100 results and get 100 results every time.

> Anyone in here have a best practice?

I would like to know if anyone else has tackled this and has some tips as well.

> > ---------- Forwarded message ----------
> > From: "David Balmain" <dbalmain.ml at gmail.com>
> > <snip>
> > This gets even messier when you need to page through the results. A
> > much nicer solution that this would be to add a :filter_proc to the
> > search methods. Something like this;
> >
> > within_radius = lambda do |doc|
> >   return ((doc[:latitude] - latitude) ** 2 +
> >           (doc[:longitude] - longitude) ** 2) < (radius ** 2)
> > end
> >
> > index.search_each(query, :filter_proc => within_radius) {|d, s| ...}
> >
> > Does this sound like a good idea? If so I could add it to a future
> > version of Ferret. Please let me know if you can think of a better way
> > to do this.
> > <snip>

This is how I'm doing it now. I guess adding the filter_proc would clean up my code a bit and simplify the paging etc. My question would be how you'd handle the problem that I mentioned earlier: how to determine how many documents to retrieve before the filter_proc is evaluated in order to eventually return the desired number of documents.

I don't know enough about the internals of Ferret to know if I'm bringing up a valid point, but I'm guessing that if I only request the top 5 documents for a query, it doesn't retrieve every single document that satisfies the query and then take the top 5 from that list. Maybe it does, though. As I said, I don't know enough about the internals of Ferret, though I'd like to...

If the problem that I bring up is legitimate, then the difficulty would be in coming up with some sort of heuristic based on how many documents are expected to satisfy the filter_proc. If only 10% of the documents satisfy the filter_proc, then to get the top 5 documents matching a query, we'd want to retrieve the top 50 documents internally, then pass them through the filter_proc, and hopefully we'd be left with at least 5 to return. For my specific application, I'm in a better position to determine this hit percentage, and so I'm in a better position to do the filtering. I don't know whether doing this in Ferret would be efficient or even feasible.

Anyway, let me know what your thoughts are on this. The filter_proc idea is a good one, as long as it can be implemented efficiently. Otherwise I'll just keep using my two-phase method: retrieve the documents from Ferret, and then do the location filtering in SQL.

--
Cheers,
Jordan Frank
jordan.w.frank at gmail.com
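The over-fetch heuristic described above is simple arithmetic, sketched here under the assumption that the hit rate has been measured as "good hits per sampled results". The function name `fetch_size` is hypothetical and only illustrates the calculation:

```ruby
# Estimate how many results to request from the search engine so that,
# after post-filtering, about `wanted` results remain.
# good_hits / sampled is the observed pass rate of the post-filter.
# Integer arithmetic is used so the ceiling is exact.
def fetch_size(wanted, good_hits, sampled)
  raise ArgumentError, "need at least one good hit to estimate" if good_hits <= 0
  (wanted * sampled + good_hits - 1) / good_hits  # ceiling division
end

puts fetch_size(5, 1, 10)       # 10% pass rate, 5 results wanted
puts fetch_size(100, 45, 1000)  # about 45 good hits per 1,000, 100 wanted
```

Because the pass rate varies from query to query, an estimate like this only reduces the number of round trips; a retry loop is still needed when a batch comes up short.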
On 7/17/06, Jordan Frank <jordan.w.frank at gmail.com> wrote:
> <snip>
> I don't know enough about the internals of ferret to know
> if I'm bringing up a valid point, but I'm guessing that if I only
> request the top 5 documents for a query, it doesn't retrieve every
> single document that satisfies the query and then take the top 5 from
> that list. Maybe it does though, as I said, I don't know enough about
> the internals of ferret, though I'd like to...

Ferret actually has to check the score of every single document in the index that matches the query. It keeps a priority queue of as many documents as it needs to return the result set. So if :num_docs is 50 and :first_doc is 200, Ferret will need to keep a priority queue of 250 documents.

> So if the problem that I bring up is legitimate, then the problem
> would be in coming up with some sort of heuristic based on how many
> documents are expected to satisfy the filter_proc. If only 10% of the
> documents satisfy the filter_proc, then to get the top 5 documents
> matching a query, we'd want to retrieve the top 50 documents
> internally, then pass them through the filter_proc, and hopefully we'd
> be left with at least 5 to return. For my specific application, I'm in
> a better position to determine this hit percentage, and so I'm in a
> better position to do the filtering. I don't know whether doing this
> in ferret would be efficient or even feasible.

You wouldn't need to request more documents than you need using the :filter_proc idea. You'd just specify :num_docs as usual and you'd get :num_docs back. So if you want 50 documents you'd get 50 documents (or fewer if fewer documents matched the query and distance constraint).

> Anyways, let me know what your thoughts are on this. The filter_proc
> idea is a good one, as long as it can be implemented efficiently.
> Otherwise I'll just keep using my two phase method, retrieve the
> documents from ferret, and then do the location filtering in SQL.

The proc would just be called once for every document that matches the query, not for every document in the index. It shouldn't be too expensive at all, and it would probably be a lot more efficient than filtering using the SQL method.

Cheers,
Dave
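To see why a filter applied during collection returns a full page without over-fetching, here is a rough pure-Ruby sketch of the idea. It is not Ferret's implementation: `top_docs` is a hypothetical function, the hits are plain [doc_id, score] pairs, and a sorted array stands in for the priority queue.

```ruby
# hits: [doc_id, score] pairs for every document that matches the query.
# Keeps only the best (first_doc + num_docs) entries that pass filter_proc,
# then returns the requested page of document ids.
def top_docs(hits, first_doc, num_docs, filter_proc = nil)
  keep = first_doc + num_docs
  queue = []
  hits.each do |doc_id, score|
    # The filter runs once per matching document, before it can enter the queue.
    next if filter_proc && !filter_proc.call(doc_id)
    queue << [doc_id, score]
    # Re-sort and trim; a real implementation would use a heap instead.
    queue = queue.sort_by { |_id, s| -s }.first(keep)
  end
  (queue[first_doc, num_docs] || []).map { |doc_id, _score| doc_id }
end

# Illustrative data: 20 matching documents with distinct, decreasing scores.
hits = (1..20).map { |i| [i, 1.0 / i] }
even_only = lambda { |doc_id| doc_id.even? }

page1 = top_docs(hits, 0, 5, even_only)
page2 = top_docs(hits, 5, 5, even_only)
unfiltered = top_docs(hits, 0, 5)
puts page1.inspect
puts page2.inspect
puts unfiltered.inspect
```

Since rejected documents never occupy a slot in the queue, the page size stays at num_docs regardless of how selective the filter is.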
On 7/17/06, David Balmain <dbalmain.ml at gmail.com> wrote:
> Ferret actually has to check the score of every singly document in the
> index that matches the query. It keeps a priority queue of as many
> documents as it needs to return the result set. So if :num_docs is 50,
> and :first_doc is 200 Ferret will need to keep a priority queue of 250
> documents.
> <snip>
> The proc would just be called once for every matching document in the
> result set, not every document. It shouldn't be too expensive at all
> and probably a lot more efficient than filtering using the SQL method.
> <snip>

If that's the case, then I think the filter_proc idea would be fantastic, and I'd love to see it make its way into a future version.

--
Cheers,
Jordan Frank
jordan.w.frank at gmail.com
This is a "me too" post. I would love to replace the query filter we use on tourb.us with this.

gary

On 7/17/06, Jordan Frank <jordan.w.frank at gmail.com> wrote:
> <snip>
> If that's the case, then I think the filter_proc idea would be
> fantastic, and I'd love to see it make it's way into a future version.
On 7/17/06, Gary Elliott <garypelliott at gmail.com> wrote:
> This is a "me too" post. I would love to replace the query filter we
> use on tourb.us with this.
>
> gary

Maybe Gary or someone else can help me. I've put the query filter problem aside, and I'm trying to do this by finding locations within a bounding box using range queries on the longitude and latitude. Unfortunately I'm running into some problems, since I'm comparing numeric values that can be positive or negative, and as far as I can tell, Ferret (actually I've only been able to find information about Lucene, but I'm assuming it's the same) does the comparisons lexicographically, not numerically.

So I've tried to replicate the encoding described in http://wiki.apache.org/jakarta-lucene/SearchNumericalFields, but I'm encountering some strange behaviour that is throwing me off.

I index a bunch of documents and see the following line of output:

    Adding field latitude_string with value '004915010' to index

That is the encoded version of 49.1501. Now if I do the following query, I should get this record back:

    >> Person.ferret_index.search_each("latitude_string:['000000000' '099999999']")
    => 0

But I don't, even though I can verify that, lexicographically, Ruby sees '004915010' as lying between '000000000' and '099999999':

    >> '000000000' <= '004915010' and '004915010' <= '099999999'
    => true

The query returns no results. I've tried a few more, as follows:

    >> Person.ferret_index.search_each("latitude_string:(> '000000000')") do end
    => 7
    >> Person.ferret_index.search_each("latitude_string:(< '099900000')") do end
    => 0

So clearly it is not seeing that '004915010' < '099999999'. If I remove the quotes, it works properly, but then the problem is with the negative values:

    >> Person.ferret_index.search_each("latitude_string:(> -00000000)") do end
    => 0
    >> Person.ferret_index.search_each("latitude_string:(> '-00000000')") do end
    => 7

So the quotes affect things, but then what if I need to search between a negative value and a positive value?

    >> Person.ferret_index.search_each("latitude_string:(< 099999999)") do end
    => 7
    >> Person.ferret_index.search_each("longitude_string:(> '-00000000')") do end
    => 7
    >> Person.ferret_index.search_each("latitude_string:['-00000000' 099999999]") do end
    => 0

For now, should I just not be using range queries at all, and just quote negative values? I'd have to do more testing to see if it's accurate, but it seems to be the only way that works... maybe I could make all values positive by adding a constant to them all?

Any ideas why this is occurring? Am I doing this completely backwards? Is there an easier way to do the numeric comparisons? I'm very sorry if this is an issue that has been discussed before, but I did look through the archives and didn't find anything...

--
Cheers,
Jordan Frank
jordan.w.frank at gmail.com
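For the bounding-box approach mentioned above, the box itself can be derived from the centre point and radius. This is a minimal sketch, assuming a radius in kilometres and a spherical Earth; `bounding_box` and `KM_PER_DEGREE` are hypothetical names, and the sketch does not handle boxes that cross the poles or the 180-degree meridian.

```ruby
# Kilometres per degree of latitude on a sphere of radius 6371 km (approximate).
KM_PER_DEGREE = Math::PI * 6371.0 / 180.0

# Returns the latitude/longitude limits of a box that encloses a circle of
# radius_km around (lat, lon). Longitude degrees shrink with cos(latitude),
# so the longitude span is widened accordingly.
def bounding_box(lat, lon, radius_km)
  dlat = radius_km / KM_PER_DEGREE
  dlon = dlat / Math.cos(lat * Math::PI / 180.0)
  { :min_lat => lat - dlat, :max_lat => lat + dlat,
    :min_lon => lon - dlon, :max_lon => lon + dlon }
end

box = bounding_box(49.1501, -123.0, 100)
puts box.inspect
```

The box over-selects at the corners, so an exact distance check on the returned documents is still needed if accuracy matters.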
Jean-Etienne Durand
2006-Jul-18 19:09 UTC
[Ferret-talk] adding a custom filter to the query
Jordan,

Why not use NumberTools::long_to_s to convert your numeric values (for both indexing and search)?

Jean-Etienne

Jordan Frank wrote:
> <snip>
> Unfortunately I'm running into some problems, since I'm comparing
> numeric values that can be positive or negative, and as far as I can
> tell, Ferret (actually I've only been able to find information about
> Lucene, but I'm assuming it's the same) does the comparisons
> lexicographically, and not numerically.
> <snip>
> Any ideas why this is occuring? Am I doing this completely backwards,
> is there an easier way to do the numeric comparisons?
> <snip>
On 7/18/06, Jean-Etienne Durand <etienne.durand at mail.com> wrote:
> Jordan,
>
> Why not using NumberTools::long_to_s to convert your numeric values
> (indexing & search) ?
>
> Jean-Etienne

Well, because I am a fool, and did not notice this class that seems to be exactly what I need.

--
Cheers,
Jordan Frank
jordan.w.frank at gmail.com
On 7/18/06, Jordan Frank <jordan.w.frank at gmail.com> wrote:
> Well, because I am a fool, and did not notice this class that seems to
> be exactly what I need.

Actually, I spoke too soon. It appears that this class has the same problem with negative numbers. For example:

    >> Person.search("latitude_string:[00000000000000 0000000000nesr]").length
    => 7
    >> Person.search("latitude_string:[-1y2p0ij321x6p 0000000000nesr]").length
    => 0

I've expanded my range, so shouldn't the number of results be at least what it was with all 0's? I've tried with quotes too, and it doesn't help. Again though, if I do the following (note the quotes):

    >> Person.search("latitude_string:(> '-1y2p0ij321x6p' AND < 0000000000nesr").length
    => 7

It works...

So here is what I've done: because I'm only working with longitudes and latitudes, which are guaranteed to lie between -500 and 500, I'm just adding 500 to them to make them all positive, and then I can use the range queries. I also wrote my own little number-to-string conversion, since I'm working with small values. But nevertheless I thank you for your help.

--
Cheers,
Jordan Frank
jordan.w.frank at gmail.com
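The offset-and-pad workaround described above can be sketched as follows. This is not the code used in the thread; the constants and the name `encode_coord` are assumptions chosen for illustration (four decimal places, an offset of 500, fixed width of eight digits).

```ruby
OFFSET = 500     # shifts every latitude/longitude into the positive range
SCALE  = 10_000  # keeps four decimal places
WIDTH  = 8       # (500 + 500) * 10_000 = 10_000_000 fits in 8 digits

# Encode a coordinate as a fixed-width, zero-padded string so that
# lexicographic order of the strings matches numeric order of the values.
def encode_coord(value)
  n = ((value + OFFSET) * SCALE).round
  raise ArgumentError, "value out of range: #{value}" if n < 0 || n >= 10**WIDTH
  format("%0#{WIDTH}d", n)
end

values  = [-180.0, -123.1207, -0.5, 0.0, 49.1501, 90.0]
encoded = values.map { |v| encode_coord(v) }
puts encoded.inspect
puts encoded == encoded.sort  # string order follows numeric order
```

Without the offset, a leading minus sign breaks the ordering: as strings, "-1" sorts before "-2" even though -1 is the larger number, which is one reason plain negative encodings misbehave in term range queries.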
On 7/19/06, Jordan Frank <jordan.w.frank at gmail.com> wrote:
> <snip/>
> So I index a bunch of documents, and see the following line of output:
> Adding field latitude_string with value '004915010' to index
> So that is the encoded version of 49.1501.
> Now if I do the following query, I should get this record back:
> >> Person.ferret_index.search_each("latitude_string:['000000000' '099999999']")
> => 0

    irb(main):008:0> index.search("latitude:[000000000 099999999]").size
    => 1
    irb(main):009:0> index.search("latitude:['000000000' '099999999']").size
    => 0

The quotes are getting tokenized with the terms, so the problem is that

    "'099999999'" <= '004915010'

Perhaps you already worked that out.

Dave
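The effect of the quote character can be reproduced with plain Ruby string comparison, independent of Ferret. In this small sketch, `in_range?` is a hypothetical helper that mimics an inclusive term range check:

```ruby
# Inclusive lexicographic range check, as a term range query would perform it.
def in_range?(term, lower, upper)
  lower <= term && term <= upper
end

stored = "004915010"

# Bounds without quotes: the stored term falls inside the range.
plain = in_range?(stored, "000000000", "099999999")

# Bounds with the quote characters kept as part of each term.
# The apostrophe (byte 39) sorts before "0" (byte 48), so the quoted upper
# bound sorts before the stored term and the range comes back empty.
quoted = in_range?(stored, "'000000000'", "'099999999'")

puts plain
puts quoted
puts "'".ord < "0".ord
```

This also matches the earlier observations in the thread: a quoted lower bound admits every digit-led term, while a quoted upper bound excludes them all.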
On 7/19/06, Jordan Frank <jordan.w.frank at gmail.com> wrote:
> Actually, I spoke too soon. It appears that this class has the same
> problem with negative numbers.
> <snip>
> So what i've done, is because I'm only working with longitudes and
> latitudes, which are guaranteed to lie between -500 and 500, I'm just
> adding 500 to them, to make them all positive, and then I can use the
> range queries...and I wrote my own little number to string thing,
> since I'm working with small values. But nevertheless I thank you for
> your help.

This seems like the best solution at the moment. I'd forgotten about NumberTools. It's probably one of the first modules I ever wrote in Ruby. Anyway, it looks like it might need an upgrade. I'll try to fix it so that it can handle negative numbers. In C this would be a no-brainer, but Ruby's Bignums make it a little difficult. I might put the challenge to the Ruby mailing list.

Cheers,
Dave