Postlists are split up into chunks, so that skip_to can avoid reading
all the postlist.
Currently the chunk threshold is 2048, but this is checked before adding
an entry, so the postlist chunk can actually grow a little larger.
Something like 2060 at most. Unfortunately this isn't a good threshold
with the default blocksize (8192 bytes).
Internally the B-tree splits up items with a large tag. It will always
make sure it can fit at least 4 items in a block. There's some overhead
for the block header (11 + 2 per item) and there's 2 + 1 + 2 + 2 = 7
bytes of overhead in each item. The upshot is that the maximum size of
key + tag is (8192 - 11 - 4 * 2) / 4 - 7 = 2036 bytes.
So I guessed at a bound for typical key length and allowed some for
overrunning the threshold and decided 2000 bytes would be a better
threshold. I tried building a sample database with both versions:
2048 bytes:
real 115m44.698s
user 98m14.880s
sys 3m13.560s
-rw-r--r-- 1 olly olly 322043904 Aug 23 02:50 postlist_DB
2000 bytes:
real 112m5.970s
user 96m35.130s
sys 3m47.350s
-rw-r--r-- 1 olly olly 315490304 Aug 23 11:08 postlist_DB
The machine wasn't totally idle, but it has 2 CPUs so hopefully the
timings are reasonable comparable. But the DB is 2% smaller, which
isn't bad for a 2 character source change!
I've not got a harness for testing query speed, but I would expect
that will be improved by this change as well.
So far, so good.
But how about calculating the max tag size taking into account the true
key length and imposing the threshold exactly?
Doing this actually appears to be noticeably worse than 2000, though a
bit better than 2048. Not quite sure why - maybe random factors from
packing shorter tags make as much difference, or maybe I was off by one
(or a few) in my size calculations (I've double checked them, but
perhaps I misunderstood the structure).
Exact
real 114m3.236s
user 96m33.610s
sys 3m43.580s
-rw-r--r-- 1 olly olly 321634304 Aug 23 14:27 postlist_DB
The user and system times are both down, so perhaps the increased wall
clock time is because the box was just busier. But the DB size is
definitely up from 2000.
Do we have anything which can inspect the B-tree structure at this low
level? Quartzdump dumps out key/tag pairs, but split up tags have
already been reassembled.
Cheers,
Olly