Hey all, I'm also curious if crit-bit trees can be suited as an on-disk structure for Xapian: https://cr.yp.to/critbit.html One of the most appealing properties to me is it doesn't require balancing operations of typical trees, nor resize operations of hash tables. The other appealing property is it is very little code (at least the in-memory versions). I recently introduced an in-memory implementation into git.git (now 'next' branch) for doing SHA abbreviation lookups on loose objects. It'll likely be in git.git 'master' and a release, soon. Good properties of crit-bit trees: + O(key_length) lookups/insert/delete (one branch per-differing bit) + ability to filter and iterate on prefixes + no balancing operations, expected to reduce writes + low per-node overhead, compatible with "container_of" idiom + very little code (even I can understand it :) + the prefixes could also be suitable for managing free space; I think dlmalloc 2.8.x does something similar. + everything else mentioned in https://cr.yp.to/critbit.html Downsides: - iteration requires recursion, which is bound by key length; so arbitrarily long keys will hurt. - relatively new and AFAIK unproven as an on-disk format. CPU cache lines are ~64-bytes; FS and SSD blocks can be some orders of magnitude larger so there'll still be more write amplification. - affected by locality and fragmentation (but AFAIK all large data/memory structures do, and we have xapian-compact). I haven't thought much about how transactions would work with on-disk formats, though. With glass (xapian 1.4.11 on Debian stable); I seemed to be hitting a fair amount of write traffic doing mass updates with 3 big shards, ~80GB each. It seems to be the same problem that affected recoll: https://lists.xapian.org/pipermail/xapian-discuss/2019-February/009727.html <23640.32170.703368.841021 at y.dockes.com> (I'd have to setup on a different machine to profile) I encountered my own problems having too many shards: https://lists.xapian.org/pipermail/xapian-discuss/2020-August/009823.html <20200826064728.GA32239 at dcvr> Which is why I have 3 big shards (4-cores) atm... I don't have a lot of RAM so I commit fairly frequently; and I'm also on SATA-2 which is a bottleneck. Anyways, thanks for reading.