Just a note to say that I'm working on a performance test framework for Xapian, mainly aimed at analysing query speed. I'm hoping to make it reasonably easy to set up, so that we can get performance results on lots of architectures.

For sample data, I'm using an XML dump of Wikipedia. I've got a simple Python script which converts this into a scriptindex input file (though it doesn't yet interpret the wiki markup). This produces a 25 GB database containing 2,657,375 documents with an average length of 492 terms, which should give a good basis for benchmarking searches in an I/O-bound situation. I should probably also use some smaller corpora to cover the CPU-bound and memory-bound cases.

I've opened a bug in the bug tracker to track progress on this (#107: http://www.xapian.org/cgi-bin/bugzilla/show_bug.cgi?id=107). I'm hoping eventually to get this running regularly on at least one machine, so that we can track how revisions to the code affect performance. In particular, I want this in place before I work on bug #100.

For now, I'm just posting to let people know that I have some code which parses Wikipedia XML dumps, so nobody wastes time writing their own; ask me instead. I'll publish the code once it's tidied up.

-- Richard
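The converter described above hasn't been published, so the following is not Richard's script. It is a minimal, hypothetical sketch of the same idea, assuming the standard MediaWiki export layout (`<page>` elements containing `<title>` and `<revision><text>`). It also assumes scriptindex's input format as we understand it: each record is a set of `field=value` lines, records are separated by a blank line, and a line starting with `=` continues the previous field's value. The field names `title` and `text` are illustrative choices. Like the original script, it makes no attempt to interpret wiki markup.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def format_record(fields):
    """Format (field, value) pairs as one scriptindex record (assumed format).

    Each field becomes a 'name=value' line; any further lines of a
    multi-line value are written as continuation lines starting with '=',
    so blank lines inside the text can't be mistaken for a record break.
    """
    lines = []
    for name, value in fields:
        first, *rest = value.split("\n")
        lines.append(f"{name}={first}")
        lines.extend(f"={line}" for line in rest)
    return "\n".join(lines) + "\n"


def convert(xml_source, out):
    """Stream <page> elements from a MediaWiki XML dump and write records.

    xml_source is a filename or a binary file object; out is any object
    with a write(str) method.  Pages are processed one at a time so the
    whole dump never has to fit in memory.
    """
    first = True
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, text = "", ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text or ""
            elif name == "text":
                # In a full-history dump this keeps the last revision's text.
                text = child.text or ""
        if not first:
            out.write("\n")  # blank line separates records
        out.write(format_record([("title", title), ("text", text)]))
        first = False
        elem.clear()  # release the page's subtree once it's been written
```

For example, `convert(open(dump_path, "rb"), sys.stdout)` (with `dump_path` naming a hypothetical local dump file) writes records suitable for piping into scriptindex, given a matching index script. Using `iterparse` and clearing each page keeps memory use roughly constant, which matters for a dump producing a database of this size.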