Patrick Oliver Glauner
2012-Jun-25 08:30 UTC
[Xapian-discuss] Xapian performance and xapian-python
Hi.
I added 400K full-texts of bibliographic records (theses, papers etc.) to a
Xapian database with a total size of about 30 GB. My source code is written in
Python and I use xapian-python and Xapian 1.2.5.
The test system is a Dell PowerEdge M600 0MY736 server. It has two Intel Xeon
E5410 CPUs @ 2.33GHz with eight cores in total, 16 GB of RAM and two 146 GB
SCSI hard disks. It runs Scientific Linux CERN 5 (SLC5).
My source code is:
-------------------------------
import xapian

QUERY = '"phys rev"'
RANKED_RESULT_AMOUNT = 10

# The database path is elided in this post.
database = xapian.Database([...])
enquire = xapian.Enquire(database)
query_string = QUERY

# Parse the query with English stemming; FLAG_PHRASE enables quoted phrases.
qp = xapian.QueryParser()
stemmer = xapian.Stem("english")
qp.set_stemmer(stemmer)
qp.set_database(database)
qp.set_stemming_strategy(xapian.QueryParser.STEM_SOME)
pattern = qp.parse_query(query_string, xapian.QueryParser.FLAG_PHRASE)
enquire.set_query(pattern)

# %time is an IPython magic, so this line only runs in an IPython session.
%time matches = enquire.get_mset(0, RANKED_RESULT_AMOUNT)
-------------------------------
The output is:
CPU times: user 1.82 s, sys: 2.16 s, total: 3.99 s
Wall time: 1.99 s
Querying an equivalent Solr instance is much faster:
CPU times: user 0.34 s, sys: 0.00 s, total: 0.34 s
Wall time: 0.21 s
Question 1
How do you evaluate the Xapian wall time?
Question 2
Is there anything wrong with my source code to explain this?
Question 3
Why is the Xapian query time almost independent of RANKED_RESULT_AMOUNT? If I
increase it to 10000, the wall time is still nearly the same.
Question 4
How can I improve Xapian performance? Are there any configuration parameters I
can use?
Thanks
Patrick
--
Patrick GLAUNER [patrick.oliver.glauner at cern.ch]
CERN
Information Technology Department
CH-1211 Geneva 23
On Mon, Jun 25, 2012 at 08:30:10AM +0000, Patrick Oliver Glauner wrote:
> The output is:
> CPU times: user 1.82 s, sys: 2.16 s, total: 3.99 s
> Wall time: 1.99 s
>
> Querying an equivalent Solr instances is much faster:
> CPU times: user 0.34 s, sys: 0.00 s, total: 0.34 s
> Wall time: 0.21 s

Are these warm cache or cold cache times?

> Question 4
> How can I improve Xapian performance? Are there any configuration
> parameters I can use?

Does the patch here help:
http://trac.xapian.org/ticket/394

Cheers,
Olly
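To make the warm/cold cache question concrete, here is a minimal sketch of one way to compare the first run of a query with later repeats. It is not from the thread: the helper names `time_repeated` and `summarize` are hypothetical, and `run_query` is assumed to be any zero-argument callable wrapping the query. Note that the first run in a fresh process is only truly "cold" if the operating system's page cache does not already hold the database files.

```python
import time


def time_repeated(run_query, repeats=5):
    """Return the wall-clock duration in seconds of each consecutive call."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Separate the first run from the fastest of the later runs."""
    first = durations[0]
    best_repeat = min(durations[1:]) if len(durations) > 1 else first
    return {"first_run": first, "best_repeat": best_repeat}
```

With the code from the original post, this could be called as `summarize(time_repeated(lambda: enquire.get_mset(0, RANKED_RESULT_AMOUNT)))`. A large gap between `first_run` and `best_repeat` would suggest that caching dominates the measurement.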
Patrick Oliver Glauner
2012-Jun-28 11:50 UTC
[Xapian-discuss] Xapian performance and xapian-python
> Are these warm cache or cold cache times?

Warm cache times.

> Does the patch here help: http://trac.xapian.org/ticket/394

We installed it, but there is no significant change. Next, we are profiling it
to get a better understanding of the issue and then we will get back to you.

But I would like to refer to my original third question which is really
important for us:

> Question 3
> How come that the Xapian time consumption is almost independent from RANKED_RESULT_AMOUNT?
> If I increase it to 10000, the wall time is still nearly the same.

%time matches = enquire.get_mset(0, RANKED_RESULT_AMOUNT)

Is this the way Xapian is supposed to behave? In most of our use cases
RANKED_RESULT_AMOUNT is quite small (<100) and we need to get these ranked
results as fast as possible.

Thanks
Patrick

________________________________________
From: Olly Betts [olly at survex.com]
Sent: Monday, June 25, 2012 1:12 PM
To: Patrick Oliver Glauner
Cc: xapian-discuss at lists.xapian.org
Subject: Re: [Xapian-discuss] Xapian performance and xapian-python

On Mon, Jun 25, 2012 at 08:30:10AM +0000, Patrick Oliver Glauner wrote:
> The output is:
> CPU times: user 1.82 s, sys: 2.16 s, total: 3.99 s
> Wall time: 1.99 s
>
> Querying an equivalent Solr instances is much faster:
> CPU times: user 0.34 s, sys: 0.00 s, total: 0.34 s
> Wall time: 0.21 s

Are these warm cache or cold cache times?

> Question 4
> How can I improve Xapian performance? Are there any configuration
> parameters I can use?

Does the patch here help:
http://trac.xapian.org/ticket/394

Cheers,
Olly
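The observation in Question 3 can be measured more systematically. The following is a minimal sketch, not code from the thread: the helper name `time_result_sizes` and the chosen sizes are assumptions for illustration. It expects an object such as the `enquire` from the original post, that is, anything with a `get_mset(first, maxitems)` method whose result has a `size()` method.

```python
import time


def time_result_sizes(enquire, sizes=(10, 100, 1000, 10000)):
    """Time get_mset(0, n) for each n and record how many matches came back.

    Returns a dict mapping maxitems -> (elapsed_seconds, returned_matches).
    """
    timings = {}
    for maxitems in sizes:
        start = time.perf_counter()
        mset = enquire.get_mset(0, maxitems)
        elapsed = time.perf_counter() - start
        timings[maxitems] = (elapsed, mset.size())
    return timings
```

If the elapsed times stay nearly flat while the number of returned matches grows, the cost is presumably dominated by evaluating the query itself rather than by collecting the ranked results. Profiling, as planned above, would be needed to confirm where the time actually goes.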