Kelly Jones
2009-Jul-06 14:15 UTC
[Ferret-talk] Using ferret as a base64-encoded numerical db
I'm using Ferret to store random base64 strings of length 72 (courtesy of "dd if=/dev/random ... | mmencode"). My long-term goal is to store floating-point and integer numbers converted to base64. Problems:

% Ferret treats the base64 characters "+" and "/" as word separators, so a search for "content:[xji xjj]" returns strings like "FqWu9uXM99HXZEJMl0Ux/jdOSP0+XJiL9v1ZDK24D0LMp60PUMPdhkbnFQykVMfilxecQFU6", where "XJi" follows a plus sign. How can I avoid this? I could change "+" to "_", but I'm not sure whether changing "/" to ".", ":", "-", or "!" would work.

% Ferret's default search is case-insensitive, so I also get strings like "xJiQf0PEagWJME9Tf5pFu6dk4UGGFw5Lc0PIfa9N70Mb2IG2IWO36VCsC0y7Q1zOrLjk2Lz4", which start with "xJi", not "xji". How do I fix this?

% When I run a range query, does Ferret return *all* documents matching the query, or only the 10 highest-scoring ones? I need *all* matching documents, not just the first few.

Is anyone else using Ferret as a database? Since it's hash-based, it's much faster than sqlite3 at indexing large numbers of strings. I realize I could just zero-pad my numbers (e.g., "000005" for 5), but I have a LOT of data (400M pairs of floating-point numbers), so I prefer a compact encoding.

--
We're just a Bunch Of Regular Guys, a collective group that's trying to understand and assimilate technology. We feel that resistance to new ideas and technology is unwise and ultimately futile.
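Editor's sketch: standard base64 has two problems for range queries beyond the separator issue. Its alphabet ("A-Z a-z 0-9 + /") is not in ASCII order, so string order does not match numeric order, and variable-length or non-padded keys do not sort by value either. Below is a minimal Ruby sketch, assuming you control the encoding, of a fixed-width base-64 key whose 64 characters are listed in ascending ASCII order and exclude "+" and "/". Floats are first mapped to unsigned 64-bit integers with the usual sign-flip trick so that byte order matches numeric order. This gives 11 characters per double, the base-64 analogue of zero-padding. All names here (`ALPHABET`, `encode_float`, `decode_float`, and so on) are hypothetical, not part of Ferret. It also assumes the field is indexed without tokenizing or lowercasing: the alphabet needs both cases, and whether Ferret's default analyzer keeps "-" and "_" inside a term is an assumption worth checking.

```ruby
# Hypothetical helpers: order-preserving, fixed-width base-64 keys for doubles.
# NOT standard base64: the 64 characters below are in ascending ASCII order
# ("-" < "0".."9" < "A".."Z" < "_" < "a".."z") and exclude "+" and "/".
ALPHABET = "-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz".freeze
DIGIT_VALUE = ALPHABET.chars.each_with_index.to_h.freeze # char => 0..63

MASK64    = (1 << 64) - 1
SIGN_BIT  = 1 << 63
KEY_WIDTH = 11 # ceil(64 / 6) base-64 digits hold any 64-bit value

# Fixed-width big-endian base-64 digits, so string order == numeric order.
def encode_uint(n, width = KEY_WIDTH)
  raise ArgumentError, "negative value" if n < 0
  digits = Array.new(width)
  (width - 1).downto(0) do |i|
    digits[i] = ALPHABET[n & 63]
    n >>= 6
  end
  raise ArgumentError, "value does not fit in #{width} digits" unless n.zero?
  digits.join
end

def decode_uint(str)
  str.each_char.inject(0) { |acc, ch| (acc << 6) | DIGIT_VALUE.fetch(ch) }
end

# Map an IEEE 754 double to an unsigned 64-bit key with the same ordering.
def float_to_key(f)
  f = Float(f)
  raise ArgumentError, "NaN has no place in the ordering" if f.nan?
  bits = [f].pack("G").unpack("Q>").first # raw big-endian bit pattern
  if (bits & SIGN_BIT) != 0
    ~bits & MASK64   # negative: flip all bits so larger magnitudes sort lower
  else
    bits | SIGN_BIT  # non-negative: set the sign bit so it sorts above negatives
  end
end

def key_to_float(key)
  bits = (key & SIGN_BIT).zero? ? (~key & MASK64) : (key ^ SIGN_BIT)
  [bits].pack("Q>").unpack("G").first
end

def encode_float(f)
  encode_uint(float_to_key(f))
end

def decode_float(str)
  key_to_float(decode_uint(str))
end

# Usage: the bounds of a range query become plain string bounds.
lo = encode_float(-2.5)
hi = encode_float(1.0)
puts "#{lo} .. #{hi}"
puts encode_float(0.5).between?(lo, hi) # true: -2.5 <= 0.5 <= 1.0
```

One design note: -0.0 and 0.0 get distinct, adjacent keys. If you want them to compare equal, normalize -0.0 to 0.0 before encoding.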