Hi,

I discovered ferret and aaf this evening. I've just done some tests and it seems perfect for my needs. I'm indexing text data (title, description, etc.) and also Ethernet hardware addresses (MAC). Sorry if this sounds trivial, but I can't find a way to index MAC addresses correctly and get correct search results on them.

If I do something like this:

    index = Index::Index.new()
    index << {:hwaddr => '00:11:22:33:44:55'}
    index.search_each('"11:11"') do |id, score|
      puts "Document #{id} found with a score of #{score}"
    end

it matches. If I search for '11\:11' it also matches. If the search is '00*11*' or '*11*22*' it does not match. If hwaddr = '00z11z22z33z44z55' it works as expected.

I tried with an untokenized index, but that didn't help.

Should I escape : before indexing? (That's not convenient.) Should I use another Analyzer?

Any help would be appreciated. Thanks in advance.
Hey,

What you should do is write your own analyzer that splits the hardware address at the : and therefore stores each part of the MAC address as a separate token. This can be done using the RegExpAnalyzer, maybe like this:

    RegExpAnalyzer.new(/[^:]+/, true)   [1]

I would then use SpanNearQueries [2] to search for certain MAC parts in a specific order, like this:

    query = SpanNearQuery.new(:slop => 5, :in_order => true)
    query << SpanTermQuery.new(:hwaddr, "11")
    query << SpanTermQuery.new(:hwaddr, "22")

This should find all items with 11<something>22.

Hope that helps.

Ben

[1] http://ferret.davebalmain.com/api/classes/Ferret/Analysis/RegExpAnalyzer.html
[2] http://ferret.davebalmain.com/api/classes/Ferret/Search/Spans/SpanNearQuery.html
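As a rough illustration of the split described above, here is a plain-Ruby sketch of the tokens that the pattern /[^:]+/ would pull out of a MAC address, with the lowercasing that the second argument (true) asks for. It does not call Ferret at all; mac_tokens is a made-up helper name, and the token positions are only meant to show what an ordered span search over the parts would be working with.

```ruby
# Hypothetical helper (not part of Ferret): mimics the token split that
# RegExpAnalyzer.new(/[^:]+/, true) is described as producing.
def mac_tokens(hwaddr, lower = true)
  # Every run of non-colon characters becomes one token.
  tokens = hwaddr.scan(/[^:]+/)
  lower ? tokens.map(&:downcase) : tokens
end

# Print each token with its position in the address.
mac_tokens('00:11:22:33:44:5A').each_with_index do |token, position|
  puts "#{position}: #{token}"
end
```

With the address stored as six separate tokens, a search for "11" followed by "22" becomes a question about token order rather than about substrings of one long term.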
Benjamin Krause <bk at benjaminkrause.com> writes:

> what you should do is to write your own analyzer.. that splits
> the HWAddress at the : and therefore stores each part of
> the MAC address as a separate token..
> [...]
> Hope that helps ..

Hey, it does. Thanks.

I first thought it was a bug, and I would have liked an easier solution (for example, stopping the Analyzer from considering ':' as a stop word).

I don't think I need to use the RegExpAnalyzer for hwaddr, since the Standard one also splits on ':'. I'm going to use :slop => 1, :in_order => true.

I'll also try to detect hwaddr search queries and feed SpanNearQuery accordingly, by looking for ':' in the query and checking whether the word before the ':' matches a field name. (If it doesn't, and the query looks like a hwaddr, I'll feed SpanNearQuery.) I'm pretty sure that could be done in a nicer way, so don't hesitate to make suggestions. :)

Also, if there are other ways to index MAC addresses without splitting on ':', I would be interested to read about them (especially if I can use the query without too much processing).

Anyway, thanks again for the quick answer.
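The detection step described above (look for ':' in the query, check whether the word before it is a field name, and otherwise see if the query looks like a hwaddr) might be sketched like this in plain Ruby. Everything here is an assumption for illustration: KNOWN_FIELDS, MAC_FRAGMENT and mac_parts are made-up names, the field list is a guess based on the fields mentioned in this thread, and "looks like a hwaddr" is taken to mean two to six colon-separated pairs of hex digits.

```ruby
# Assumed field names; replace with the fields of the real index.
KNOWN_FIELDS = %w[title description hwaddr].freeze

# A full or partial MAC address: 2 to 6 colon-separated hex pairs.
MAC_FRAGMENT = /\A\h{2}(?::\h{2}){1,5}\z/

# Returns the lowercased address parts if the query looks like a MAC
# fragment, or nil if it should go through the normal query parser.
def mac_parts(query)
  prefix = query[/\A([^:]+):/, 1]
  # "title:foo" is a field-scoped query, not a hardware address.
  return nil if prefix && KNOWN_FIELDS.include?(prefix)
  return nil unless query.match?(MAC_FRAGMENT)

  query.downcase.split(':')
end

p mac_parts('11:22')
p mac_parts('title:ferret')
```

When mac_parts returns an array, each part could be added as a SpanTermQuery on :hwaddr to a SpanNearQuery, as in the reply quoted above; when it returns nil, the query string would be passed to the regular search unchanged.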
syrius.ml at no-log.org writes:

> Also if there's other ways to index mac addresses without splitting on
> : I would be interested to read about them. (especially if I can use
> the query without too much processing)

Oh, in fact what I want to use is the WhiteSpaceAnalyzer for the field 'hwaddr' (it seems I missed this one before). :)