Patrick Oliver Glauner
2012-Jun-25 08:30 UTC
[Xapian-discuss] Xapian performance and xapian-python
Hi.

I added 400K full-texts of bibliographic records (theses, papers etc.) to a Xapian database with a total size of about 30 GB. My source code is written in Python and I use xapian-python and Xapian 1.2.5.

The test system is a Dell PowerEdge M600 0MY736 server. It has two Intel Xeon E5410 CPUs @ 2.33GHz and eight cores in total. Furthermore, it contains 16 GB RAM and two SCSI hard disks with 146 GB each. It uses Scientific Linux CERN 5 (SLC5) as its operating system.

My source code is:
-------------------------------
import xapian

QUERY = '"phys rev"'
RANKED_RESULT_AMOUNT = 10

database = xapian.Database([...])
enquire = xapian.Enquire(database)
query_string = QUERY
qp = xapian.QueryParser()
stemmer = xapian.Stem("english")
qp.set_stemmer(stemmer)
qp.set_database(database)
qp.set_stemming_strategy(xapian.QueryParser.STEM_SOME)
pattern = qp.parse_query(query_string, xapian.QueryParser.FLAG_PHRASE)
enquire.set_query(pattern)
%time matches = enquire.get_mset(0, RANKED_RESULT_AMOUNT)
-------------------------------

The output is:
CPU times: user 1.82 s, sys: 2.16 s, total: 3.99 s
Wall time: 1.99 s

Querying an equivalent Solr instance is much faster:
CPU times: user 0.34 s, sys: 0.00 s, total: 0.34 s
Wall time: 0.21 s

Question 1
How do you evaluate the Xapian wall time?

Question 2
Is there anything wrong with my source code that would explain this?

Question 3
Why is the Xapian time consumption almost independent of RANKED_RESULT_AMOUNT? If I increase it to 10000, the wall time is still nearly the same.

Question 4
How can I improve Xapian performance? Are there any configuration parameters I can use?

Thanks
Patrick

--
Patrick GLAUNER [patrick.oliver.glauner at cern.ch]
CERN Information Technology Department
CH-1211 Geneva 23
On Mon, Jun 25, 2012 at 08:30:10AM +0000, Patrick Oliver Glauner wrote:
> The output is:
> CPU times: user 1.82 s, sys: 2.16 s, total: 3.99 s
> Wall time: 1.99 s
>
> Querying an equivalent Solr instance is much faster:
> CPU times: user 0.34 s, sys: 0.00 s, total: 0.34 s
> Wall time: 0.21 s

Are these warm cache or cold cache times?

> Question 4
> How can I improve Xapian performance? Are there any configuration
> parameters I can use?

Does the patch here help:

http://trac.xapian.org/ticket/394

Cheers,
    Olly
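One rough way to tell warm-cache from cold-cache timings apart is to time the same call several times in one process and compare the first run with the later ones. The sketch below is only an illustration: `time_repeated`, `summarize` and `run_query` are hypothetical names, not part of xapian-python, and `run_query` is assumed to wrap something like `enquire.get_mset(0, RANKED_RESULT_AMOUNT)` from the script in the original post. The first run is only truly cold if the operating system's page cache does not already hold the database files. The code assumes Python 3.3 or later for `time.perf_counter`.

```python
import time


def time_repeated(run_query, repeats=5):
    """Return the wall-clock seconds taken by each of `repeats` calls."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Separate the first run from the fastest of the later runs."""
    first, rest = timings[0], timings[1:]
    return {"first": first, "best_later": min(rest) if rest else first}


if __name__ == "__main__":
    # Stand-in workload; replace the lambda with the real query call.
    timings = time_repeated(lambda: sum(range(100000)))
    print(summarize(timings))
```

If `first` is much larger than `best_later`, disk reads probably dominate the first run; if the two are close, the timings are likely already warm-cache numbers.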
Patrick Oliver Glauner
2012-Jun-28 11:50 UTC
[Xapian-discuss] Xapian performance and xapian-python
> Are these warm cache or cold cache times?

Warm cache times.

> Does the patch here help: http://trac.xapian.org/ticket/394

We installed it, but there is no significant change. Next, we are profiling it to get a better understanding of the issue and then we will get back to you.

But I would like to come back to my original third question, which is really important for us:

> Question 3
> How come that the Xapian time consumption is almost independent from RANKED_RESULT_AMOUNT?
> If I increase it to 10000, the wall time is still nearly the same.

%time matches = enquire.get_mset(0, RANKED_RESULT_AMOUNT)

Is this the way Xapian is supposed to behave? In most of our use cases RANKED_RESULT_AMOUNT is quite small (<100) and we need to get these ranked results as fast as possible.

Thanks
Patrick

________________________________________
From: Olly Betts [olly at survex.com]
Sent: Monday, June 25, 2012 1:12 PM
To: Patrick Oliver Glauner
Cc: xapian-discuss at lists.xapian.org
Subject: Re: [Xapian-discuss] Xapian performance and xapian-python

On Mon, Jun 25, 2012 at 08:30:10AM +0000, Patrick Oliver Glauner wrote:
> The output is:
> CPU times: user 1.82 s, sys: 2.16 s, total: 3.99 s
> Wall time: 1.99 s
>
> Querying an equivalent Solr instance is much faster:
> CPU times: user 0.34 s, sys: 0.00 s, total: 0.34 s
> Wall time: 0.21 s

Are these warm cache or cold cache times?

> Question 4
> How can I improve Xapian performance? Are there any configuration
> parameters I can use?

Does the patch here help:

http://trac.xapian.org/ticket/394

Cheers,
    Olly
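For the profiling step mentioned in this reply, a first pass from the Python side could look like the sketch below. It uses only the standard-library `cProfile` and `pstats` modules; `profile_call` is a hypothetical helper name, and the stand-in `work` function takes the place of the real query call. Note that `cProfile` only sees Python-level calls, so time spent inside Xapian's C++ code would probably show up as a single opaque entry, and a native profiler would be needed to look further.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, text report).

    The report lists the ten entries with the largest cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(10)
    return result, stream.getvalue()


def work(n):
    """Stand-in workload; replace with the real query call."""
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    result, report = profile_call(work, 100000)
    print(report)
```

With the script from the original post, the call would be something like `profile_call(enquire.get_mset, 0, RANKED_RESULT_AMOUNT)`, assuming `enquire` has been set up as shown there.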