Kevin Duraj
2007-Jun-17 07:51 UTC
[Xapian-discuss] Flint failed to deliver indexing performance to Quartz.
Flint has failed to match the indexing performance of Quartz. I am proposing to remove Flint as the default database and make Quartz the default again.

The point is not that Flint is smaller and faster than Quartz for searching, which is what the developers were concerned with when measuring; they neglected to measure performance when creating large indexes. The truth is that Flint cannot index more than 5 million documents in a reasonable time. High disk activity has been reported when indexing with Flint: the server seizes up, unable to keep up with writes to the hard disk, which does not happen when Quartz is used for indexing. Flint has been shown to be 10-16 times slower when indexing 10 million documents on servers with 4 CPUs and 16GB of memory. So far Flint has utterly failed to deliver even a fraction of the performance that Quartz achieves when indexing a large number of documents in a short time using plenty of memory.

Example of my benchmarks:

Quartz indexed 10 million unique documents with XAPIAN_FLUSH_THRESHOLD=10000000 in less than 1 hour.

Flint indexed 10 million unique documents with XAPIAN_FLUSH_THRESHOLD=10000000 in less than 16 hours.

Please provide settings to remove Flint and make Quartz the default database, unless Flint's unacceptable indexing performance is resolved. Do not even think about removing support for the Quartz database from Xapian.

Thank you,

--
Cheers,
Kevin Duraj
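As a rough check on the benchmark figures in the message, the sketch below converts the two reported times into implied indexing rates. It is a minimal Python illustration that assumes only the stated numbers (10 million documents, under 1 hour for Quartz, under 16 hours for Flint). Because both times are given as upper bounds, the results are lower bounds on throughput rather than measured rates.

```python
# Implied minimum indexing throughput from the quoted benchmark:
# 10 million documents, XAPIAN_FLUSH_THRESHOLD=10000000 in both runs.
# The reported times are upper bounds ("less than"), so the rates are lower bounds.
DOCS = 10_000_000
reported_hours = {"quartz": 1, "flint": 16}


def min_docs_per_second(docs: int, hours: float) -> float:
    """Lower bound on throughput given an upper bound on wall-clock time."""
    return docs / (hours * 3600)


rates = {backend: min_docs_per_second(DOCS, h) for backend, h in reported_hours.items()}
for backend, rate in rates.items():
    print(f"{backend}: >= {rate:,.0f} docs/s")
# quartz: >= 2,778 docs/s
# flint: >= 174 docs/s
```

Two "less than" times do not by themselves establish a 16x ratio between the backends; the 10-16x figure is a separate claim in the message.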
Hi there! My company has more than 15 billion web documents. We're archiving the whole web (like the Internet Archive). Last spring we used Lucene for full-text search, but we hit a limit at 500 million documents. We're now considering Xapian. Could anybody share their experience with a dataset of this size?

Cheers,
Y.
Olly Betts
2007-Jun-18 05:09 UTC
[Xapian-discuss] Flint failed to deliver indexing performance to Quartz.
On Sat, Jun 16, 2007 at 11:50:58PM -0700, Kevin Duraj wrote:
> I am proposing to remove Flint as default database and place Quartz
> database back as default.

Again, please be realistic.

> The catch is not that Flint database is
> smaller and faster during searches then Quartz database as developers
> were concerning when were measuring and neglecting to measure
> performance when creating the large indexes.

Please stop spreading FUD. This simply isn't true. I looked at indexing performance, search performance, and index size during development.

> The truth is that Flint database can not scale beyon 5 million
> documents to index in reasonable time. High disk activities has been
> reported when indexing using Flint, server is seizing not able to
> write to Hard Disk compare when Quartz database is used to index.

You are the only person to report this, but rather than help us to address it by investigating why, you just keep telling us about it and then suggesting we "fix" it by throwing away months of useful work.

> Flint show to be 10-16 times slower during indexing 10 million of
> documents on 4 CPU 16GB memory servers.

I don't have such a server to test on, and I don't have your data sets to test with, so you're going to need to do some detective work as to why this might be, or show me how to demonstrate similar problems with data sets I have access to on machines I have access to.

> Flint so far absolutely failed to deliver nearly fractionally the
> performance that Quartz database has been achieving during high
> quantity documents indexing in short time using plenty of memory.

... in your application. It seems to work very well for others.

> Example of my benchmarks:
>
> Quartz database index 10 million of unique documents with set
> XAPIAN_FLUSH_THRESHOLD=10000000 in less then 1 hour.
>
> Flint database index 10 million of unique documents with set
> XAPIAN_FLUSH_THRESHOLD=10000000 in less then 16 hours.

A useful benchmark needs to include sufficient information that it can be reproduced. This isn't a useful benchmark, since there's no way I can reproduce it for myself.

> Please provide settings to remove Flint and add Quartz as default
> database.

If you really must, that already exists:

    ./configure --disable-backend-flint

But it's tantamount to burying your head in the sand.

> Unless the unacceptable indexing performance using Flint
> database will be resolved.

Since only you can see it, it will only be resolved if you help us to resolve it!

> Do not even think about to removing support for Quartz database from
> Xapian.

Quartz is scheduled for removal in Xapian 1.1.0. We don't have the resources to maintain multiple generations of backends in parallel, but if you really want it to stay, you could offer to maintain the code. However, it would almost certainly be easier to help us work out why flint isn't working as well for you.

Cheers,
    Olly
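Olly's point that a benchmark must carry enough information to be reproduced can be made concrete. The following is a hypothetical Python harness, not part of Xapian or its bindings: `run_benchmark` and `index_fn` are illustrative names, and `index_fn` stands in for whatever code actually adds documents to the database. It times one run and records, next to the result, the context someone else would need to repeat it: backend, document count, corpus description, the XAPIAN_FLUSH_THRESHOLD value in effect, CPU count, and platform.

```python
import os
import platform
import time
from typing import Any, Callable, Dict


def run_benchmark(
    index_fn: Callable[[int], None],
    n_docs: int,
    backend: str,
    corpus: str,
) -> Dict[str, Any]:
    """Time one indexing run and record the context needed to repeat it.

    index_fn is a hypothetical stand-in for the real indexing code; it is
    called once with the number of documents to index.
    """
    start = time.perf_counter()
    index_fn(n_docs)
    elapsed = time.perf_counter() - start
    return {
        "backend": backend,  # e.g. "flint" or "quartz"
        "n_docs": n_docs,
        "corpus": corpus,  # how someone else can obtain or regenerate the data
        "elapsed_s": elapsed,
        "docs_per_s": n_docs / elapsed if elapsed > 0 else float("inf"),
        # Flush threshold in effect for this process; None if unset.
        "XAPIAN_FLUSH_THRESHOLD": os.environ.get("XAPIAN_FLUSH_THRESHOLD"),
        "cpu_count": os.cpu_count(),
        "platform": platform.platform(),
    }


if __name__ == "__main__":
    def dummy_index(n: int) -> None:
        # Placeholder workload only; replace with code that adds n documents.
        for _ in range(n):
            pass

    print(run_benchmark(dummy_index, 100_000, "flint", "synthetic placeholder"))
```

Details such as installed memory and disk type are not portably available from the Python standard library, so in this sketch they would have to be noted by hand in the corpus or run description.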
Maybe Matching Threads
- Empty results OMEGA with XAPIAN 1.0.1
- BUG IN XAPIAN_FLUSH_THRESHOLD
- My new record: Indexing 20 millions docs = 79m9.378s
- Re: [Xapian-commits] 7603: trunk/xapian-core/ trunk/xapian-core/backends/flint/ trunk/xapian-core/backends/quartz/
- Re: [Xapian-commits] 8157: trunk/xapian-core/ trunk/xapian-core/backends/flint/ trunk/xapian-core/backends/quartz/