Hi
I am using Xapian 1.0.12 here, trying to optimise search performance.
My test set has 10M docs of forum threads.
DB indexes only the thread title, author name, category name, and 1 optional
serialised value (slot 0), which is the Unix dateline.
DB_full indexes everything in DB plus the thread contents.
After a couple of tests, I decided to remove ALL anchor terms such as
SHOW_PUBLIC and MORE_IMPORTANT.
Before, I used AND_MAYBE (MORE_IMPORTANT) in the query to add weight to more
important docs.
Before, I used AND (SHOW_PUBLIC) to search only public threads.
I removed these switches because the CPU usage sometimes jumps to 20% for one
query (especially when the result set is big).
I also decided to separate the DB into 2 sets, 1 with contents and 1
without contents.
Now I have removed all the switches from the docs.
I also manually sort all the documents in a MySQL InnoDB table from lower -> higher
importance, older -> newer date.
After sorting them into the proper order, I add them to Xapian
one by one, docid=1, docid=2, docid=3....
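The pre-sorting step could look roughly like this in Python. It is only a sketch: the field names `importance` and `dateline` and the helper `assign_docids` are made up for illustration, and the real rows would come from the MySQL table.

```python
from operator import itemgetter

def assign_docids(threads):
    """Order threads from lower to higher importance, older to newer,
    and pair each one with a sequential docid starting at 1."""
    ordered = sorted(threads, key=itemgetter("importance", "dateline"))
    # enumerate(..., start=1) yields docid=1, 2, 3, ... in sorted order.
    return [(docid, t) for docid, t in enumerate(ordered, start=1)]

# Hypothetical rows; "importance" and "dateline" are assumed column names.
threads = [
    {"title": "b", "importance": 1, "dateline": 1262304000},
    {"title": "c", "importance": 2, "dateline": 1230768000},
    {"title": "a", "importance": 1, "dateline": 1230768000},
]
for docid, thread in assign_docids(threads):
    # Each pair could then be stored under that id, e.g. with
    # WritableDatabase.replace_document(docid, doc) in Xapian.
    print(docid, thread["title"])
```

With this ordering the highest docid is always the most important, newest thread, which is what a docid DESC result order relies on.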
I do it this way because I don't want to use any sorting by value in
Xapian, just plain sorting by docid DESC during my BoolWeight query.
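For reference, the query side of this setup might be sketched as below, using the Python binding rather than the PHP one I actually use. The function name `newest_first` and the database path are hypothetical, and the code assumes the Xapian Python bindings are installed.

```python
def newest_first(db, term, limit=10):
    """Return up to `limit` docids matching `term`, highest docid first."""
    import xapian  # imported here so the sketch loads without the bindings

    enquire = xapian.Enquire(db)
    enquire.set_query(xapian.Query(term))
    # BoolWeight gives every match weight 0, so no ranking is computed.
    enquire.set_weighting_scheme(xapian.BoolWeight())
    # With equal weights, results come back in docid order; ask for DESC.
    enquire.set_docid_order(xapian.Enquire.DESCENDING)
    return [match.docid for match in enquire.get_mset(0, limit)]

# Usage (the path is hypothetical):
# import xapian
# db = xapian.Database("/path/to/DB")
# print(newest_first(db, "movie"))
```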
OK, my question is: after this setup, most (90%) of my queries now use 0.3-0.7%
CPU per request (using the PHP binding).
But once in a while, for some terms, I still see 6% CPU for a very simple
query (using the PHP binding).
e.g. Xapian::Query(movie:(pos=1,wqf=12))
in a 10M-doc DB with only a few terms indexed (8.6G size)
Matches Estimated 421,057 Time: 0.1850
This one uses 6.3% CPU
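To tell whether a query like this is really burning CPU or mostly waiting on disk, one option is to record CPU time next to wall-clock time for the call. A small sketch in Python (not the PHP binding used here); `timed` is a hypothetical helper, and a stand-in workload takes the place of the real query.

```python
import time

def timed(fn, *args):
    """Run fn(*args) and return (result, wall_seconds, cpu_seconds)."""
    wall0 = time.perf_counter()
    cpu0 = time.process_time()  # CPU time of this process (user + system)
    result = fn(*args)
    cpu = time.process_time() - cpu0
    wall = time.perf_counter() - wall0
    return result, wall, cpu

# Stand-in workload; in practice fn would run the Xapian query.
result, wall, cpu = timed(sum, range(1_000_000))
print(f"wall={wall:.4f}s cpu={cpu:.4f}s")
```

If cpu is close to wall, the time is going to computation; if it is much smaller, the process is mostly waiting, e.g. on disk I/O.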
I wonder, what is the cause of this CPU usage? Is it the ranker?
I have already done all I can to minimise costs; what else can I do to prevent
this or load-balance the situation?
Would I be better off using another binding, e.g. Python?
Would I be better off using distributed search?
My goal is to optimise search while the number of docs grows very
large, e.g. 100M+.
My test machine:
Quad CPU Q6600 @ 2.40GHz
8G RAM
1x 10krpm WD HD
My live servers:
Dell R710
2x E5530 2.4G
24G RAM 1333MHz
8x 73G 15K RPM SAS RAID 0
Cheers
Andrey