Hi Georges,
On 24-5-2005 18:04, Georges Dupret wrote:
> Hi!
>
> While promoting xapian, I have been asked to answer the following
> question: How many queries per second is Xapian able to answer?
> The database size would be around 15 GB of uncompressed text, on a
> regular desktop machine with one CPU and 1G of RAM.
I don't think there is a single answer to that question. We have a
similar-sized database (about 18 GB) on a much heavier machine (dual
Xeon 2.8 GHz, 4 GB of memory, two SCSI discs in RAID 0 dedicated to the
Xapian database).
It can handle quite a few queries, but searching is not its only task,
although it is its most expensive one in terms of performance.
The load of the machine varies a lot, but averages out to about 3. In
the past six hours, the peak hour saw 2770 real-user queries, but I
doubt that is the maximum it can achieve, especially in the absence of
positional searches (string matches, NEAR searches, etc.).
These are the numbers of queries the machine has had to process since
May 22nd, around midnight:

86098 normalsearches.log
 3406 slowsearches.log
   31 error.log
89535 total
A query is logged in slowsearches.log if it took more than 2 seconds of
pure search time (a query can take longer overall once result
processing is accounted for as well).
I have never really benchmarked Omega's capacity; it is basically "fast
enough", apart from some corner cases with positional searches.
We have had the database (a while back, so it was smaller) on a
lower-end server (dual Xeon 2.4 GHz, 2 GB of RAM, one IDE disk) that was
significantly more heavily loaded at the time, but it managed to process
the load of our site. It could get badly starved on I/O, though, when
several positional searches ran at the same time.
Since normal searches take an average of roughly 0.2-0.4 seconds (to be
on the safe side) when the machine is not loaded, I estimate it can
easily reach 5-10 queries/second on our database.
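As a back-of-the-envelope check on that estimate (my own arithmetic, not a benchmark): if the dual-CPU box runs two searches at a time, throughput is roughly the number of concurrent searches divided by the average latency:

```python
def estimated_qps(avg_latency_s, concurrent_searches):
    # Each concurrent search stream completes 1/latency queries per
    # second; independent streams add up (ignoring I/O contention).
    return concurrent_searches / avg_latency_s


# Two concurrent searches at the measured 0.2-0.4 s averages give
# roughly the 5-10 queries/second mentioned above.
low = estimated_qps(0.4, 2)   # ~5 queries/second
high = estimated_qps(0.2, 2)  # ~10 queries/second
```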
Of course, it all depends on your documents, the number of distinct
terms, the queries you run, etc.
If all documents are well constructed (no weird terms, so the number of
distinct terms is much lower) and well filtered for stop words, you
might be able to achieve twice our performance or more.
I hope this was of some help. Please note that I don't think
"queries/second" is a very useful metric in this context. The amount of
data it can search "fast enough to satisfy the users under your normal
load" is much more interesting (and harder to define).
Best regards,
Arjen