Kevin Duraj
2007-Jun-17 05:46 UTC
[Xapian-discuss] Many problems with Xapian 1.x let's roll back to Xapian to 0.9.x
There are so many problems with Xapian 1.0.x that I must propose rolling back to the last good working version, Xapian 0.9.x. What used to take 50 minutes to index now does not finish indexing in several hours. We used to have one of the biggest advantages over all other search engines: Xapian could index millions of documents in a fraction of the time our competitor Lucene needed. With the new Xapian 1.0.x, that competitive advantage is gone.

-- Cheers, Kevin
Olly Betts
2007-Jun-18 04:39 UTC
[Xapian-discuss] Many problems with Xapian 1.x let's roll back to Xapian to 0.9.x
On Sat, Jun 16, 2007 at 09:46:06PM -0700, Kevin Duraj wrote:
> There are so many problems with Xapian 1.0.x that I must propose to
> roll back to the last good working version of Xapian 0.9.x

Please be realistic. You're not going to get us to throw away many, many fixes and improvements because you're having some teething trouble with the new release. You're welcome to continue to use 0.9.x if it makes you happier, but the more sensible approach would be to help us work out what is causing the slowdowns you're seeing so we can address them.

> What used to take 50 minutes to index now will not index in several
> hours.

So you keep saying, but you've yet to offer us any insight as to why!

Over the weekend, I've been trying out reindexing gmane using 1.0.x (xapian-core-1.0.1_svn8931 to be precise, but that's essentially just 1.0.1 plus a new lazy table creation feature which avoids creating the value and/or position tables if they aren't used). It's indexed 6.5 million so far, and the indexing rate is a little less than half what it was on the last rebuild (which used 0.9.9 flint).

However, before I was indexing only unstemmed forms, whereas now I'm indexing both stemmed and unstemmed. This means that the number of term postings will have almost doubled, and so I'd expect the rate to almost halve just because of that. Before I started the full reindex, I did try running the start of the reindex with the old indexing strategy but the new Xapian. This showed it was a little slower than before, but a few percent slower, not several times slower.

In short, I can't reproduce what you describe with the little information you've provided. Also, nobody else has reported such issues, and I've heard reports that 1.0.1 is faster at indexing for some people (Jean-Francois Dockes reports that it makes Recoll index nearly twice as fast compared to 0.9 quartz). So if you want this to be addressed, you're going to have to analyse what's going on in your case.
I suggested before that you should try increasing COMPRESS_MIN in backends/flint/flint_table.cc to see what effect different values have. Have you tried that? It's currently 4, but what happens if it's 100? If that makes no difference, go higher; if that makes a huge difference, try a value in between.

Cheers, Olly
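
The tuning procedure suggested above (try 100, go higher if nothing changes, try a value in between if it changes a lot) amounts to a search for the smallest COMPRESS_MIN at which indexing becomes fast again. The following is only a sketch of that search, not anything from the Xapian sources. The names `find_threshold` and `is_fast` are hypothetical; in practice `is_fast` would mean "rebuild with this COMPRESS_MIN, rerun the indexer, and compare the time against the 0.9.x run". The sketch also assumes that raising the value never makes indexing slower, which has not been established here.

```cpp
#include <cstddef>
#include <functional>

// Hypothetical helper, not part of Xapian.
// Returns the smallest threshold for which is_fast(threshold) is true,
// or 0 if no value up to max_threshold is fast.
// Assumes: `lo` (the current value, 4) is known to be slow, and is_fast is
// monotone (once a threshold is fast, every larger threshold is fast too).
std::size_t find_threshold(const std::function<bool(std::size_t)>& is_fast,
                           std::size_t lo = 4,
                           std::size_t first_try = 100,
                           std::size_t max_threshold = 1u << 20) {
    std::size_t hi = first_try;

    // "If that makes no difference, go higher": double until a fast value
    // is found or the upper limit is reached.
    while (!is_fast(hi)) {
        if (hi >= max_threshold) return 0;
        lo = hi;
        hi = (hi > max_threshold / 2) ? max_threshold : hi * 2;
    }

    // "If that makes a huge difference, try a value in between": bisect,
    // keeping lo slow and hi fast, until they are adjacent.
    while (hi - lo > 1) {
        std::size_t mid = lo + (hi - lo) / 2;
        if (is_fast(mid)) {
            hi = mid;
        } else {
            lo = mid;
        }
    }
    return hi;
}
```

Each call to `is_fast` stands for one full rebuild and timing run, so the doubling and bisection steps keep the number of runs small. If the search returns 0, compression of small items is probably not the cause of the slowdown.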