thr3ads.net - Xapian discuss - [Xapian-discuss] Initial benchmark results quartz and flint [Jun 2005]

Arjen van der Meijden

2005-Jun-29 12:19 UTC

[Xapian-discuss] Initial benchmark results quartz and flint

Hi List,

I've done some benchmarking and have the first set of results here. The 
databases (their size and parameters) can be found earlier this month on 
the list if you're interested.

It appears from these results that flint is significantly faster to 
search in, both with phrase-queries and normal queries. Another, 
somewhat surprising result is that the non-compacted quartz databases 
are *much* faster with phrase queries than the compacted ones.

flint normal non-phrase:             155.574 s
flint normal phrase:                2841.680 s
flint compact non-phrase:             96.569 s
flint compact phrase:               3026.939 s
flint compact -F non-phrase:          94.227 s
flint compact -F phrase:            2623.404 s

quartz normal non-phrase:            169.853 s
quartz normal phrase:               7037.056 s
quartz compact -F gz non-phrase:     108.783 s
quartz compact -F gz phrase:        9249.504 s
quartz compact -n-F gz non-phrase:   109.650 s
quartz compact -n-F gz phrase:      8090.707 s
quartz compact non-phrase:           103.863 s
quartz compact phrase:              9410.721 s
quartz compact 0.8.4 gz non-phrase:  108.299 s
quartz compact 0.8.4 gz phrase:     8100.171 s
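
As a rough illustration of the proportions in these figures, here is a
small Python sketch (not part of the original benchmark) that turns the
totals into ratios and per-query averages. It assumes the comma in the
posted numbers is a decimal separator and the space groups thousands,
so "2 841,680 s" is read as 2841.680 seconds. The names TIMES, ratio
and per_query are made up for this illustration.

```python
# Total run times in seconds as (non-phrase, phrase), copied from the
# results above.
# Assumption: "," is the decimal separator and " " groups thousands.
TIMES = {
    "flint normal": (155.574, 2841.680),
    "flint compact": (96.569, 3026.939),
    "flint compact -F": (94.227, 2623.404),
    "quartz normal": (169.853, 7037.056),
    "quartz compact -F gz": (108.783, 9249.504),
    "quartz compact -n-F gz": (109.650, 8090.707),
    "quartz compact": (103.863, 9410.721),
    "quartz compact 0.8.4 gz": (108.299, 8100.171),
}

# Query counts as stated later in this message.
N_PLAIN, N_PHRASE = 1035, 65


def ratio(slower: str, faster: str, phrase: bool) -> float:
    """How many times longer `slower` took than `faster`."""
    i = 1 if phrase else 0
    return TIMES[slower][i] / TIMES[faster][i]


def per_query(db: str, phrase: bool) -> float:
    """Average seconds per query for one database and query class."""
    total = TIMES[db][1 if phrase else 0]
    return total / (N_PHRASE if phrase else N_PLAIN)


if __name__ == "__main__":
    for phrase in (False, True):
        kind = "phrase" if phrase else "non-phrase"
        print(f"quartz normal vs flint normal ({kind}): "
              f"{ratio('quartz normal', 'flint normal', phrase):.2f}x")
    print(f"quartz compact vs quartz normal (phrase): "
          f"{ratio('quartz compact', 'quartz normal', True):.2f}x")
    print(f"flint normal, seconds per phrase query: "
          f"{per_query('flint normal', True):.1f}")
```

The ratios only compare single runs, so they say nothing about
run-to-run variation.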

The benchmark was done by creating a separate directory on a pretty fast 
hard drive (WD Raptor 36GB 10k rpm sata) that is solely handling the 
current database. The machine has only 1GB of memory, so it was pretty 
much I/O-bound with the phrase queries.
The script would first remove the previous database and then copy the 
current database to that same disk. This is not included in the timings.

Then I recorded the current time in seconds, executed all queries from 
a file that parse without a PHRASE part, and after that executed the 
queries that do involve PHRASE searches.
This yielded 65 phrase queries and 1035 other queries. "morelike" 
queries, boolean-only queries and the like were executed as empty 
queries, since I was too lazy to implement them correctly.
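
The procedure described above can be sketched roughly as follows. This
is a minimal Python outline under stated assumptions, not the script
that was actually used: prepare_database, time_batch and benchmark are
hypothetical names, run_query stands in for whatever executes a search
against the database, and is_phrase stands in for parsing a query and
checking whether it contains a PHRASE part.

```python
import shutil
import time
from pathlib import Path
from typing import Callable, Dict, Iterable, List


def prepare_database(source: Path, workdir: Path) -> Path:
    """Remove the previous copy and copy in the current database.

    This step is deliberately kept outside the timed section.
    """
    target = workdir / source.name
    if target.exists():
        shutil.rmtree(target)
    shutil.copytree(source, target)
    return target


def time_batch(queries: Iterable[str],
               run_query: Callable[[str], object]) -> float:
    """Run every query in order and return the elapsed seconds."""
    start = time.perf_counter()
    for query in queries:
        run_query(query)
    return time.perf_counter() - start


def benchmark(queries: List[str],
              is_phrase: Callable[[str], bool],
              run_query: Callable[[str], object]) -> Dict[str, float]:
    """Time the non-phrase queries first, then the phrase queries."""
    plain = [q for q in queries if not is_phrase(q)]
    phrase = [q for q in queries if is_phrase(q)]
    return {
        "non-phrase": time_batch(plain, run_query),
        "phrase": time_batch(phrase, run_query),
    }


if __name__ == "__main__":
    # Stand-ins for illustration only: a quoted string is treated as a
    # phrase query, and "running" a query does nothing.
    sample = ["quartz flint", '"initial benchmark"', "compact"]
    print(benchmark(sample, lambda q: '"' in q, lambda q: None))
```

time.perf_counter() is used here instead of the wall-clock time in
seconds because it cannot jump backwards; for runs lasting minutes or
hours either would do.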

I cannot explain from the hardware or benchmark setup why the compacted 
quartz databases are so much slower with phrase. At first I thought it 
might be the way they were laid out on disk during their creation; copy 
database may have a tendency to stick the specific database records for 
a document closer to each other, while quartzcompact copies the database 
table by table. But since I copied them using the standard unix copy 
command, that should not be the case with the benchmarks I did now.

I haven't yet verified that all databases returned the same results. 
I'll have to do that to see whether the flint results were actually 
correct, but I have no reason to believe otherwise yet.

To be sure these are not one-off numbers, I'm running the benchmarks 
twice more, but since that'll take almost a day per run I sent these 
numbers to the list already.

Best regards,

Arjen

Olly Betts

2005-Jun-29 15:33 UTC

On Wed, Jun 29, 2005 at 01:19:54PM +0200, Arjen van der Meijden wrote:
> It appears from these results that flint is significantly faster to 
> search in, both with phrase-queries and normal queries.
Well, that's good!
> I cannot explain from the hardware or benchmark setup why the compacted 
> quartz databases are so much slower with phrase.
This is indeed most puzzling (although ultimately flint is faster than
either, which is what really matters).

Your test does seem to be carefully designed to eliminate various possible
hand-waving explanations I can think of.
> To be sure these are not one-off numbers, I'm running the benchmarks 
> twice more, but since that'll take almost a day per run I sent these 
> numbers to the list already.
This would be useful - my only current thought is that something else was
happening on the machine at the time (the "update the locate database"
cronjob which usually runs once a day is a prime candidate for this sort
of thing - it does a pretty effective job of flushing the kernel's cache
of disk blocks and swapping out any process which is currently
inactive).

Cheers,
    Olly
