Arjen van der Meijden
2005-Jun-29 12:19 UTC
[Xapian-discuss] Initial benchmark results quartz and flint
Hi List,

I've done some benchmarking and have the first set of results here. The databases (their size and parameters) can be found earlier this month on the list if you're interested.

It appears from these results that flint is significantly faster to search in, both with phrase-queries and normal queries. Another, somewhat surprising result is that the non-compacted quartz-databases are *much* faster with phrase-queries.

database                  non-phrase     phrase
flint normal              155,574 s      2 841,680 s
flint compact             96,569 s       3 026,939 s
flint compact -F          94,227 s       2 623,404 s
quartz normal             169,853 s      7 037,056 s
quartz compact -F gz      108,783 s      9 249,504 s
quartz compact -n-F gz    109,650 s      8 090,707 s
quartz compact            103,863 s      9 410,721 s
quartz compact 0.8.4 gz   108,299 s      8 100,171 s

The benchmark was done by creating a separate directory on a pretty fast hard drive (WD Raptor 36GB 10k rpm SATA) that is solely handling the current database. The machine has only 1GB of memory, so it was pretty much I/O-bound with the phrase queries.

The script would first remove the previous database and then copy the current database to that same disk. This is not included in the timings. Then I took the current time in seconds, took all queries from a file that would parse to not have a PHRASE-part and executed those, and after that the queries that did do PHRASE-searches. This yielded 65 phrase-queries and 1035 other queries. If they were "morelike", boolean-only queries etc., they were executed as empty queries, since I was too lazy to implement that correctly.

I cannot explain from the hardware or benchmark setup why the compacted quartz databases are so much slower with phrase. First I thought it may have been the way they were laid out on disk during their creation; copy database may have a tendency to stick the specific database records for a document closer to each other, while quartzcompact copies the database table by table. But since I copied them using the standard unix copy command, that should not be the case with the benchmarks I did now.

I haven't verified whether all results were the same over the databases. I'll have to do that to see whether the flint-results were actually correct, but I don't have reasons to believe otherwise yet.

To be sure these are not one-time-only numbers, I'm running the benchmarks twice more, but since that'll take almost a day per run I have sent these numbers to the list already.

Best regards,
Arjen
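The timing procedure described above (split the query file into non-phrase and phrase queries, then time each batch as a whole) could be sketched roughly as follows. This is only an illustration, not the script that was actually used: `has_phrase` and `run_query` are hypothetical callables standing in for "does this query parse to contain a PHRASE part" and "execute this query against the database", and the database removal and copy step is left out because it was not part of the timings.

```python
import time


def time_batches(queries, has_phrase, run_query, clock=time.time):
    """Time non-phrase queries first, then phrase queries, as two batches.

    has_phrase and run_query are hypothetical stand-ins supplied by the
    caller; clock returns the current time in seconds.
    """
    non_phrase = [q for q in queries if not has_phrase(q)]
    phrase = [q for q in queries if has_phrase(q)]

    timings = {}
    for label, batch in (("non-phrase", non_phrase), ("phrase", phrase)):
        start = clock()            # wall-clock time before the batch
        for query in batch:
            run_query(query)
        timings[label] = clock() - start  # elapsed seconds for the batch

    return timings, len(non_phrase), len(phrase)
```

Passing the clock in as a parameter makes it easy to check the bookkeeping with a fake clock before committing to a run that takes most of a day.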
Olly Betts
2005-Jun-29 15:33 UTC
[Xapian-discuss] Initial benchmark results quartz and flint
On Wed, Jun 29, 2005 at 01:19:54PM +0200, Arjen van der Meijden wrote:
> It appears from these results that flint is significantly faster to
> search in, both with phrase-queries and normal queries.

Well, that's good!

> I cannot explain from the hardware or benchmark setup why the compacted
> quartz databases are so much slower with phrase.

This is indeed most puzzling (although ultimately flint is faster than either, which is what really matters). Your test does seem to be carefully designed to eliminate various possible hand-waving explanations I can think of.

> To be sure it are not one-time-only numbers, I'm running the benchmarks
> twice more but since that'll take almost a day per run I sent these
> numbers to the list already.

This would be useful - my only current thought is that something else was happening on the machine at the time (the "update the locate database" cronjob which usually runs once a day is a prime candidate for this sort of thing - it does a pretty effective job of flushing the kernel's cache of disk blocks and swapping out any process which is currently inactive).

Cheers,
Olly
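To put a number on "flint is faster than either", the reported timings can be turned into ratios. The sketch below assumes the figures use a decimal comma and a space as thousands separator (so "2 841,680 s" means 2841.680 seconds); that reading is an assumption, as are the helper names. Only the three "normal" and "compact" rows needed for the comparison are copied in.

```python
def parse_seconds(text):
    """Parse a figure such as '2 841,680 s' into seconds.

    Assumes a decimal comma and a space as thousands separator.
    """
    return float(text.strip().rstrip("s").replace(" ", "").replace(",", "."))


# (non-phrase, phrase) timings as reported in the first message
reported = {
    "flint normal": ("155,574 s", "2 841,680 s"),
    "quartz normal": ("169,853 s", "7 037,056 s"),
    "quartz compact": ("103,863 s", "9 410,721 s"),
}

seconds = {
    name: tuple(parse_seconds(value) for value in pair)
    for name, pair in reported.items()
}


def ratio(slower, faster, column):
    """How many times longer `slower` took than `faster` (0 = non-phrase, 1 = phrase)."""
    return seconds[slower][column] / seconds[faster][column]


if __name__ == "__main__":
    print(f"quartz normal vs flint normal, phrase: {ratio('quartz normal', 'flint normal', 1):.2f}x")
    print(f"quartz compact vs quartz normal, phrase: {ratio('quartz compact', 'quartz normal', 1):.2f}x")
```

Since these come from a single run, the ratios are only as reliable as the timings themselves.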