Karl Fife
2008-Aug-15 05:56 UTC
[asterisk-users] AstDB/Berkely DB - Hash function? Balanced-Tree? b-Tree? Linked List?
Does anyone know enough about the implementation of AstDB to know whether the data structure is a Hash function, a Balanced-Tree, a b-Tree, or a Linked List? I'm trying to estimate the lookup 'cost' of a AstDB with around 160,000 keys? Obviously I already know that it WILL WORK, but the question is whether the data structure is optimal in the Berkeley DB AS IMPLEMENTED in Asterisk. AstDB just like CURL is missing some of its features as implemented, so the generic Berkeley Doc doesn't help much. The key-space is ideal. It's just npa/nxx lookups so it's UNIQUE and EVENLY DISTRIBUTED--a perfect for a hash function (or even a balanced tree). What I do NOT want is a 150k member linked list, or even a standard b-tree that ends up being 160k entries tall because the values were inserted in order etc. In terms of Databases, I know that 160K keys is very small potatoes, but I want to make sure I understand what's going on under the hood so that my lookup costs are as low as possible. Lookups on my Oracle database with tens of millions of records are instantaneous and inexpensive when implemented properly, and about ten thousand times slower (actually) when done improperly. If a database has only 160K keys, it can be tough to tell whether the data structures are being used efficiently, because even WILDLY INEFFICIENT lookups seem fast. Can anyone speak to this? What is the default data structure? How many records have you stuck into the AstDB? Thank you! -Karl Fife
Russell Bryant
2008-Aug-15 12:12 UTC
[asterisk-users] AstDB/Berkely DB - Hash function? Balanced-Tree? b-Tree? Linked List?
Karl Fife wrote:> Does anyone know enough about the implementation of AstDB to know > whether the data structure is a Hash function, a Balanced-Tree, a > b-Tree, or a Linked List?I've never looked at the internals of db1. However, by simply looking at what code is included, it looks like it is based on a b-tree. You may have to add some debugging within db1 to see how nodes actually get laid out when you add your 160k entries. The code is in main/db1-ast/. If you're doing something this large, I would encourage you to consider just using a different database, and using func_odbc to access it as opposed to the astdb functions. -- Russell Bryant Senior Software Engineer Open Source Team Lead Digium, Inc.
Atis Lezdins
2008-Aug-15 12:35 UTC
[asterisk-users] AstDB/Berkely DB - Hash function? Balanced-Tree? b-Tree? Linked List?
On Fri, Aug 15, 2008 at 8:56 AM, Karl Fife <asterisk-users at kfife.mailworks.org> wrote:> Does anyone know enough about the implementation of AstDB to know > whether the data structure is a Hash function, a Balanced-Tree, a > b-Tree, or a Linked List? > > I'm trying to estimate the lookup 'cost' of a AstDB with around 160,000 > keys? Obviously I already know that it WILL WORK, but the question is > whether the data structure is optimal in the Berkeley DB AS IMPLEMENTED > in Asterisk. AstDB just like CURL is missing some of its features as > implemented, so the generic Berkeley Doc doesn't help much. > > The key-space is ideal. It's just npa/nxx lookups so it's UNIQUE and > EVENLY DISTRIBUTED--a perfect for a hash function (or even a balanced > tree). What I do NOT want is a 150k member linked list, or even a > standard b-tree that ends up being 160k entries tall because the values > were inserted in order etc. > > In terms of Databases, I know that 160K keys is very small potatoes, but > I want to make sure I understand what's going on under the hood so that > my lookup costs are as low as possible. > > Lookups on my Oracle database with tens of millions of records are > instantaneous and inexpensive when implemented properly, and about ten > thousand times slower (actually) when done improperly. If a database > has only 160K keys, it can be tough to tell whether the data structures > are being used efficiently, because even WILDLY INEFFICIENT lookups seem > fast. > > Can anyone speak to this? > What is the default data structure? > How many records have you stuck into the AstDB? >Hi, I can share my experience only. I'm storing call variables in astdb, daily ~2000 calls, ~20 variables per call. Stored in way like this: call_variables/${uniqueid}/var=value Previously i did cleanup of this only nightly, but sometimes i noticed that lookup in evenings get extremely slow - ~5-10 seconds per variable. So, i would guess that it's not very optimal, as every lookup from 2000 keys takes several seconds (sub-keys shouldn't affect this). Anyway, my solution was to keep it small by deleting everything upon hangup of call. A little clutter, but it works more or less. Regards, Atis -- Atis Lezdins, VoIP Project Manager / Developer, atis at iq-labs.net Skype: atis.lezdins Cell Phone: +371 28806004 Cell Phone: +1 800 7300689 Work phone: +1 800 7502835
Jay R. Ashworth
2008-Aug-15 16:27 UTC
[asterisk-users] AstDB/Berkely DB - Hash function? Balanced-Tree? b-Tree? Linked List?
On Fri, Aug 15, 2008 at 12:56:49AM -0500, Karl Fife wrote:> The key-space is ideal. It's just npa/nxx lookups so it's UNIQUE and > EVENLY DISTRIBUTEDBased on my knowledge of the NPA/NXX space, I wouldn't expect that either a) A given batch of random DNs would have either or both NPA/NXX components evenly distributed over all the valid NPA/NXXs in the NANPA, or b) that the assigned NPA/NXXs in the NANPA are themselves evenly distributed over all the valid NPA/NXXs. Could you clarify the background that brings you to that assumption? Do you have empirical data? Cheers, -- jra -- Jay R. Ashworth Baylink jra at baylink.com Designer The Things I Think RFC 2100 Ashworth & Associates http://baylink.pitas.com '87 e24 St Petersburg FL USA http://photo.imageinc.us +1 727 647 1274 Those who cast the vote decide nothing. Those who count the vote decide everything. -- (Josef Stalin)