thr3ads.net - Ferret talk - [Ferret-talk] Using ferret as a base64-encoded numerical db [Jul 2009]

Kelly Jones

2009-Jul-06 14:15 UTC

[Ferret-talk] Using ferret as a base64-encoded numerical db

I'm using ferret to store random base64 strings of length 72 (courtesy
"dd if=/dev/random ... | mmencode"), with the long-term goal of
storing floating point/integral numbers (converted to
base64). Problems:

 % Ferret regards the base64 characters "+" and "/" as word
 separators, so a search for "content:[xji xjj]" yields things like
 "FqWu9uXM99HXZEJMl0Ux/jdOSP0+XJiL9v1ZDK24D0LMp60PUMPdhkbnFQykVMfilxecQFU6"
 where "xji" appears after a plus sign. How to avoid this? I could
 change "+" to "_", but I'm not sure changing "/" to "." or ":" or "-"
 or "!" would work.

 % Ferret's default search is case-insensitive, so I get things like
 "xJiQf0PEagWJME9Tf5pFu6dk4UGGFw5Lc0PIfa9N70Mb2IG2IWO36VCsC0y7Q1zOrLjk2Lz4",
 which contain "xJi" but not "xji". How to fix?

 % When I do a range query, does ferret return *all* documents
 matching the query or only the highest scoring 10? For my purposes, I
 need *all* documents matching a query, not just the first few.
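
For the first problem, one option is to swap "+" and "/" for "-" and "_",
the two substitutes used by the URL-safe base64 alphabet, before indexing.
The sketch below uses only core Ruby; the helper names index_safe_base64
and from_index_safe are made up for illustration. It is an assumption,
not something verified here, that ferret's analyzer keeps "-" and "_"
inside a single token, so that would need checking against the analyzer
actually in use.

```ruby
# Encode raw bytes as base64 with "+" -> "-" and "/" -> "_".
# pack('m0') produces base64 on a single line, with no trailing newline.
def index_safe_base64(bytes)
  [bytes].pack('m0').tr('+/', '-_')
end

# Reverse the substitution and decode back to the raw bytes.
def from_index_safe(str)
  str.tr('-_', '+/').unpack1('m0')
end

# 54 raw bytes encode to exactly 72 base64 characters.
raw = (0...54).map { |i| (i * 37) % 256 }.pack('C*')
key = index_safe_base64(raw)
puts key.length
puts from_index_safe(key) == raw
```

The substitution does nothing about case-insensitive matching; that still
needs a case-sensitive analyzer or field setting on the index side.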

Is anyone else using ferret as a db? Since it's hash-based, it's much
faster at indexing large numbers of strings than sqlite3.

I realize I could just 0-pad my numbers (e.g., "000005" for 5), but I've
got a LOT of data (400M pairs of floating point numbers), so I prefer
compactness.
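
One caveat for range queries on encoded numbers: the standard base64
alphabet (A-Z, a-z, 0-9, "+", "/") is not in ASCII order, so string order
does not match numeric order. A minimal sketch of a compact,
order-preserving alternative follows. It assumes the index compares terms
byte-wise and case-sensitively; the alphabet, the 11-character width and
the name sortable_key are illustrative choices, not anything ferret
provides. NaN is not handled.

```ruby
# 64 characters in ascending ASCII order: "-", 0-9, A-Z, "_", a-z.
ALPHABET = ('-' + ('0'..'9').to_a.join + ('A'..'Z').to_a.join +
            '_' + ('a'..'z').to_a.join).freeze
WIDTH = 11 # 64**11 >= 2**64, so 11 digits hold any 64-bit value

# Map a float to an 11-character string whose byte-wise order matches
# the numeric order of the input.
def sortable_key(x)
  # Reinterpret the IEEE 754 double as a big-endian unsigned 64-bit integer.
  bits = [x.to_f].pack('G').unpack1('Q>')
  # Negative numbers: invert every bit. Non-negative: set the sign bit.
  # Afterwards, unsigned integer order equals floating-point order.
  bits = (bits >> 63) == 1 ? bits ^ 0xFFFF_FFFF_FFFF_FFFF : bits | (1 << 63)
  # Emit base-64 digits, least significant first, then reverse.
  digits = Array.new(WIDTH) do
    bits, d = bits.divmod(64)
    ALPHABET[d]
  end
  digits.reverse.join
end

values = [-1e10, -2.5, -0.001, 0.0, 0.001, 1.0, 5.0, 1e300]
keys = values.map { |v| sortable_key(v) }
puts keys == keys.sort
```

Each double takes 11 characters regardless of magnitude, compared with 8
raw bytes, and integers that fit exactly in a double can go through the
same function.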

-- 
We're just a Bunch Of Regular Guys, a collective group that's trying
to understand and assimilate technology. We feel that resistance to
new ideas and technology is unwise and ultimately futile.
