Clare
2006-Sep-20 10:36 UTC
[Ferret-talk] Range searches some times they work, some times not...
Hi,

I'm using Ferret to enable geographical postcode searches. I take a postcode and a distance in miles from the user, strip off the outcode, and then retrieve the associated x/y coordinates in metres from the db. Then I compute two temporary x's and y's and search for all results that fall within the box (see code below).

Problems start to occur when I search on big distances. For example:

40 miles from "G1"

    VoObject.ferret_index.search(" x:[206826 335573] AND y:[590526 719273]").total_hits
    => 165

300 miles

    VoObject.ferret_index.search("y:[172098 1137702]").total_hits
    Ferret::QueryParser::QueryParseException: Error occured in q_range.c:121 - range_new
            Upper bound must be greater than lower bound. "1137702" < "172098"

            from /usr/lib/ruby/gems/1.8/gems/ferret-0.10.1/lib/ferret/index.rb:572:in `parse'
            from /usr/lib/ruby/gems/1.8/gems/ferret-0.10.1/lib/ferret/index.rb:572:in `process_query'
            from /usr/lib/ruby/gems/1.8/gems/ferret-0.10.1/lib/ferret/index.rb:560:in `do_search'
            from /usr/lib/ruby/gems/1.8/gems/ferret-0.10.1/lib/ferret/index.rb:233:in `search'
            from /usr/lib/ruby/1.8/monitor.rb:229:in `synchronize'
            from /usr/lib/ruby/gems/1.8/gems/ferret-0.10.1/lib/ferret/index.rb:232:in `search'
            from (irb):16

So what am I doing wrong? How have other people used Ferret for geographical searches? Is there another way that I can define the range so that it works properly?

I ask because I'm also getting other crazy and just plain wrong results:

    VoObject.ferret_index.search("y:[0 9]").total_hits
    => 167

That's telling me that all the test data is within 9 metres of the origin...

Thanks in advance.
clare

    if their_outcode && their_outcode.size > 0
      temp_hwz = HwzPostcode.find(:first, :conditions => ['outcode = ?', their_outcode])
      # miles -> km -> metres
      range_x_left   = temp_hwz.x - (postcode_distance.to_f * 1.60934 * 1000)
      range_x_right  = temp_hwz.x + (postcode_distance.to_f * 1.60934 * 1000)
      range_y_top    = temp_hwz.y + (postcode_distance.to_f * 1.60934 * 1000)
      range_y_bottom = temp_hwz.y - (postcode_distance.to_f * 1.60934 * 1000)

      query += " AND x:[#{range_x_left.to_i} #{range_x_right.to_i}] AND y:[#{range_y_bottom.to_i} #{range_y_top.to_i}]"
    end

-- Posted via http://www.ruby-forum.com/.
David Balmain
2006-Sep-20 12:21 UTC
[Ferret-talk] Range searches some times they work, some times not...
On 9/20/06, Clare <clare.cav at arogent.co.uk> wrote:
<snip>

Hi Clare,

Ranges are calculated according to lexical ordering, not numerical ordering. Try this:

    puts ["0", "9", "167"].sort

You'll see that "167" does indeed fall between "0" and "9". Now try this:

    puts ["000", "009", "167"].sort

So that should explain what you have to do. You need to pad all numbers to a fixed width. Alternatively you could build a custom IntegerRangeFilter and combine it with a ConstantScoreQuery. Here is an example for Floats:

    require 'rubygems'
    require 'ferret'

    class FloatRangeFilter
      attr_accessor :field, :upper, :lower, :upper_op, :lower_op

      # options takes bounds keyed by operator, e.g. :>= => 10.0, :< => 20.0
      def initialize(field, options)
        @field = field
        @upper = options[:<] || options[:<=]
        @lower = options[:>] || options[:>=]
        if @upper.nil? and @lower.nil?
          raise ArgumentError, "Must specify a bound"
        end
        # Default to the inclusive operator unless the exclusive one was given
        @upper_op = options[:<].nil? ? :<= : :<
        @lower_op = options[:>].nil? ? :>= : :>
      end

      # Return a bit vector with a bit set for every matching document
      def bits(index_reader)
        bit_vector = Ferret::Utils::BitVector.new
        term_doc_enum = index_reader.term_docs
        index_reader.terms(@field).each do |term, freq|
          # Compare each term numerically rather than lexically
          float = term.to_f
          next if @upper and not float.send(@upper_op, @upper)
          next if @lower and not float.send(@lower_op, @lower)
          term_doc_enum.seek(@field, term)
          term_doc_enum.each {|doc_id, freq| bit_vector.set(doc_id)}
        end
        return bit_vector
      end

      def hash
        return @field.hash ^ @upper.hash ^ @lower.hash ^
               @upper_op.hash ^ @lower_op.hash
      end

      def eql?(o)
        return (o.instance_of?(FloatRangeFilter) and
                @field == o.field and
                @upper == o.upper and @lower == o.lower and
                @upper_op == o.upper_op and @lower_op == o.lower_op)
      end
    end

You'll have to work out what is going on here yourself though. I have no time for explanation. Note that this won't perform very well compared to the padded field version because so much is going on in the Ruby code. I could possibly be persuaded to implement this in C.

Cheers,
Dave
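For anyone following along, here is a minimal sketch of the padded-field approach described in this reply. The helper names (`pad_coord`, `range_clause`) and the width of 7 digits are assumptions made for illustration, not part of Ferret. It only builds the query string, and it assumes the `x` and `y` values were indexed with the same padding.

```ruby
# Assumed width: it must be at least as wide as the largest coordinate indexed.
PAD_WIDTH = 7

# Hypothetical helper: zero-pad a non-negative integer to a fixed width so that
# lexical order matches numerical order.
def pad_coord(value, width = PAD_WIDTH)
  n = value.to_i
  raise ArgumentError, "negative values need an offset first" if n < 0
  s = n.to_s
  raise ArgumentError, "#{n} does not fit in #{width} digits" if s.length > width
  s.rjust(width, "0")
end

# Hypothetical helper: build one range clause with both bounds padded.
def range_clause(field, lower, upper)
  "#{field}:[#{pad_coord(lower)} #{pad_coord(upper)}]"
end

# Unpadded, the upper bound sorts before the lower bound as a string,
# which is what the range_new error in the original post complains about.
puts "1137702" < "172098"                            # true
puts pad_coord(172098) < pad_coord(1137702)          # true
puts range_clause("y", 172098, 1137702)              # y:[0172098 1137702]
```

The same `pad_coord` call would need to be applied when the documents are added to the index, otherwise the padded bounds are compared against unpadded terms.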
Sam Giffney
2006-Sep-21 02:28 UTC
[Ferret-talk] Range searches some times they work, some times not...
David Balmain wrote:
> On 9/20/06, Clare <clare.cav at arogent.co.uk> wrote:
<snip>
> You'll have to work out what is going on here yourself though. I have
> no time for explanation. Note that this won't perform very well
> compared to the padded field version because so much is going on in
> the Ruby code. I could possibly be persuaded to implement this in C.
>
> Cheers,
> Dave

I've also implemented a geographic search using Lucene/Ferret. There are a couple of key points that helped me 'get it':

1 - Lucene does lexicographic, not numeric, search, so to search on numbers you need to convert them to a string which works for lexicographic sort (usually by adding leading zeros or a fixed number of decimal places after the decimal point) [as pointed out by Dave above]

2 - a range search is actually converted into a boolean search internally (someone please correct me if I got that wrong) so doing a range search over massive ranges may be problematic by exceeding accepted query lengths. Then you start a trade off between accuracy (more decimal places) and speed. The way I got round it was to assume that for my purposes search only needed to be accurate to about 100m so formatting longitude/latitude to 3 decimal places would work fine (I live in a small country :)

Sam
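A rough sketch of the fixed-decimal formatting Sam describes, for illustration only. Shifting the values by 90 and 180 degrees so they are never negative is an assumption added here (a leading minus sign would break lexical ordering); the helper names are hypothetical. Three decimal places of latitude correspond to roughly 111 m.

```ruby
LAT_OFFSET = 90   # latitude lies in -90..90
LON_OFFSET = 180  # longitude lies in -180..180

# Hypothetical helper: shift a coordinate so it is non-negative, then format it
# with 3 decimal places, zero-padded to 7 characters in total.
def encode_coord(degrees, offset)
  format("%07.3f", degrees.to_f + offset)
end

def lat_term(lat)
  encode_coord(lat, LAT_OFFSET)
end

def lon_term(lon)
  encode_coord(lon, LON_OFFSET)
end

# Every encoded value has the same width, so sorting the strings gives the
# same order as sorting the numbers.
lats = [-41.2866, 9.5, -3.25]
puts lats.map { |v| lat_term(v) }.sort
```

Range bounds would have to be encoded with the same helpers as the indexed values for the comparison to be meaningful.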
David Balmain
2006-Sep-21 06:07 UTC
[Ferret-talk] Range searches some times they work, some times not...
On 9/21/06, Sam Giffney <samuelgiffney at gmail.com> wrote:
<snip>
> 2 - a range search is actually converted into a boolean search
> internally (someone please correct me if I got that wrong) so doing a
> range search over massive ranges may be problematic by exceeding
> accepted query lengths. Then you start a trade off between accuracy
> (more decimal places) and speed. The way I got round it was to assume
> that for my purposes search only needed to be accurate to about 100m so
> formatting longitude/latitude to 3 decimal places would work fine (I
> live in a small country :)

This used to be correct, but it is no longer the case in either Ferret or Lucene (version 2.0). RangeQueries get reduced to ConstantScoreQueries which use a Filter. So Sam, you can now feel free to use RangeQueries with as large a Range as you like :-).

WildcardQueries, FuzzyQueries and PrefixQueries do, however, get rewritten as BooleanQueries in Lucene and MultiTermQueries in Ferret, so you do need to be careful when using these queries. Ferret's MultiTermQuery is a lot more efficient than a BooleanQuery for this task, so it allows a lot more clauses than you could probably use efficiently in Lucene. Also, the query "*" gets rewritten as a MatchAllQuery so it is safe to use.

Cheers,
Dave