I am having a problem with flushing a database. I am adding N records to the DB per run (N is between 1 and 2000). At the end of the run, I issue a flush() call. The problem is that the flush call never seems to do anything: every 10000 additions to the database, the library performs its own flush (which can take up to 3 hours on a 560,000-document database), as if my flush call had never been made.

Two questions:
1) This seems entirely too long; is it?
2) Why would my flush be ignored? (No transactions are being used, just straight adds using the TermGenerator.)

This is my flush code:

    try {
        // The same pooled WritableDatabase object that the adds went through.
        Xapian::WritableDatabase* database = getDatabase();
        database->flush();
    } catch (const Xapian::Error& err) {
        std::string s = "ERROR:" + err.get_msg();
        log_it(s.c_str());
        write(c_id, "ERROR:-3", 8);  // report the failure to the client
    }
    return;

The getDatabase() function returns a pointer to the open database (from a pool of databases). The same code is used to fetch a pointer to the WritableDatabase object for the inserts (because of my earlier problem with opening and closing the DB for each add). There are no errors logged.

Any insight or ideas would be appreciated.

Thanks,
-Michael Lewis
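One way to see whether the explicit flush() is doing any work is to log how long it takes. If it returns almost instantly while an occasional add stalls for a long time (the automatic flush described above appears to run during the add that crosses the 10000 mark), the batched changes are still being written by the automatic flush rather than by this call. A minimal sketch using only std::chrono follows; the helper name time_ms is hypothetical and not part of Xapian.

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Runs op() and returns the wall-clock time it took, in milliseconds.
// steady_clock is used so that system clock adjustments cannot skew it.
// Any exception thrown by op() (e.g. Xapian::Error) propagates unchanged.
template <typename Op>
double time_ms(Op&& op) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<Op>(op)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

In the flush code above, this would be used inside the existing try block as `double ms = time_ms([&] { database->flush(); });`, so a Xapian::Error is still caught, with `ms` then passed to log_it() alongside the other messages.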
Michael A. Lewis wrote:
> I am having a problem with flushing a database. I am adding N records
> to the DB (which amounts to 1 - 2000). At the end of the run, I
> issue a flush() call. The problem is that the flush call never seems
> to do anything. Every 10000 additions to the database and the library
> performs a flush (which can take up to 3 hours on a 560,000 document
> database) as if my flush call was never performed.

I don't have a solution, but I have a similar problem with my Xapian database (doc count: 8 million). The flush time is fairly long (over 10 minutes on a 16-disk SAS array for 1000 documents added). Monitoring with vmstat and top, I can see that it neither saturates one CPU nor comes anywhere near the block I/O that the disks can deliver (around 5 MB/s block in and out), and top shows only around 8-12% I/O wait. All of the above was measured while Xapian was flushing.

I'm still running Xapian 1.0.4 (with the Perl bindings).

-- 
Jesper
I am seeing the following from top:

    top - 14:53:37 up 17 days,  4:24,  1 user,  load average: 2.89, 3.09, 3.08
    Tasks:  77 total,   1 running,  76 sleeping,   0 stopped,   0 zombie
    Cpu(s):  5.3%us,  1.0%sy,  0.0%ni, 17.5%id, 75.9%wa,  0.2%hi,  0.2%si,  0.0%st
    Mem:   3761652k total,  3635644k used,   126008k free,     3004k buffers
    Swap:  9213268k total,  1311104k used,  7902164k free,   551384k cached

      PID USER  PR NI  VIRT  RES SHR S %CPU %MEM    TIME+  COMMAND
     3159 root  18  0 2907m 2.8g 636 D   12 78.5 866:00.14 ftinsert

ftinsert is the process running my insert task. Strangely, it is not using a great deal of CPU; I usually have to wait a little while before it shows up as the top task in the list (sorted by CPU usage). This is a 4 GB dual-core system running a fast RAID array. Again, this is with only 560,000 documents or so.

--Michael

________________________________
From: xapian-discuss-bounces@lists.xapian.org on behalf of Jesper Krogh
Sent: Sun 1/20/2008 2:46 PM
To: xapian-discuss@lists.xapian.org
Subject: Re: [Xapian-discuss] flush problem
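On Linux, the us/sy/id/wa figures that top prints are derived from the cumulative counters on the "cpu" line of /proc/stat. To capture the user/system/iowait split for just the flush window, rather than watching top, one option is to sample that line immediately before and after the flush and compare. This is a minimal sketch, assuming Linux and the documented field order (user, nice, system, idle, iowait, irq, softirq, steal); the names CpuTimes, parse_cpu_line, read_cpu_times and breakdown are illustrative only.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <sstream>
#include <string>

// Cumulative CPU time counters (clock ticks) from the aggregate "cpu"
// line of Linux /proc/stat.
struct CpuTimes {
    std::uint64_t user = 0, nice = 0, system = 0, idle = 0;
    std::uint64_t iowait = 0, irq = 0, softirq = 0, steal = 0;

    std::uint64_t total() const {
        return user + nice + system + idle + iowait + irq + softirq + steal;
    }
};

// Parses a line such as "cpu  4705 356 584 3699 23 23 0 0 0 0".
// Fields that older kernels do not report are left at zero.
CpuTimes parse_cpu_line(const std::string& line) {
    std::istringstream in(line);
    std::string label;  // the leading "cpu"
    CpuTimes t;
    in >> label >> t.user >> t.nice >> t.system >> t.idle >> t.iowait >>
        t.irq >> t.softirq >> t.steal;
    return t;
}

// Reads the first line of /proc/stat (the all-CPU "cpu" line).
// Returns all zeros if the file cannot be read (e.g. not on Linux).
CpuTimes read_cpu_times(const std::string& path = "/proc/stat") {
    std::ifstream f(path);
    std::string line;
    std::getline(f, line);
    return parse_cpu_line(line);
}

// Percentages of elapsed CPU time, comparable to top's us/sy/id/wa.
struct Breakdown {
    double user = 0, system = 0, idle = 0, iowait = 0;
};

// Counter difference clamped at zero; proc(5) notes that iowait may
// decrease in certain conditions.
static std::uint64_t delta(std::uint64_t before, std::uint64_t after) {
    return after > before ? after - before : 0;
}

Breakdown breakdown(const CpuTimes& before, const CpuTimes& after) {
    Breakdown b;
    const double elapsed = double(delta(before.total(), after.total()));
    if (elapsed == 0) return b;  // no ticks elapsed between samples
    b.user = 100.0 * double(delta(before.user, after.user)) / elapsed;
    b.system = 100.0 * double(delta(before.system, after.system)) / elapsed;
    b.idle = 100.0 * double(delta(before.idle, after.idle)) / elapsed;
    b.iowait = 100.0 * double(delta(before.iowait, after.iowait)) / elapsed;
    return b;
}
```

For example, call read_cpu_times() just before and just after database->flush() and log breakdown(before, after).iowait; a high value there means the CPUs spent most of the flush waiting on I/O rather than computing.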
On Sun, Jan 20, 2008 at 01:32:19PM -0500, Michael A. Lewis wrote:
> I am having a problem with flushing a database. I am adding N
> records to the DB (which amounts to 1 - 2000). At the end of the
> run, I issue a flush() call. The problem is that the flush call
> never seems to do anything. Every 10000 additions to the database
> and the library performs a flush (which can take up to 3 hours on a
> 560,000 document database) as if my flush call was never performed.
>
> 1) This seems entirely too long, is it?

It sounds high to me, but it depends on so many factors: number of terms, size of document data, available memory, how much memory Xapian uses to hold the 10k documents before flushing, logical-to-physical volume layout, the file systems involved... What are you seeing as the main activity during the flush? If you're on a Unix machine it'll probably be one of system, user or iowait.

> 2) Why would my flush be ignored (no transactions being used, just
> straight add using the term generator).
>
> This is my flush code:
>
>     try {
>         Xapian::WritableDatabase* database = getDatabase();
>         database->flush();
>     } catch (const Xapian::Error & err) {
>         s="ERROR:"+err.get_msg();
>         log_it(s.c_str());
>         write(c_id,"ERROR:-3",8);
>     }
>     return;

Assuming that getDatabase() implements the Singleton pattern correctly, that you aren't clearing its instance, and that you aren't using threading (or, if you are, that you know what you're doing with Singleton), this is odd. I've had a quick look over the flint code, and I can't see how it could fail to work for you.

If you build Xapian with --enable-log and then run with XAPIAN_DEBUG_LOG set to a file and XAPIAN_DEBUG_FLAGS set to -1, you'll get lots of messages. In particular, you should see an apply call from flint after your flush; if you don't, it isn't working for some reason. Everything will be slower with debugging on, of course.
J

-- 
James Aylett
xapian.org
james@tartarus.org
uncertaintydivision.org
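The caveats about getDatabase() come down to one property: every call for a given database must hand back the same WritableDatabase object, so that flush() reaches the writer that buffered the adds. Below is a minimal sketch of such a pool, assuming one writer per database path. DatabasePool and its factory are hypothetical, and the object type is a template parameter only so the sketch compiles without Xapian; in Michael's program it would be Xapian::WritableDatabase.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical pool holding one long-lived writer per database path.
// Db stands in for Xapian::WritableDatabase so this compiles on its own.
template <typename Db>
class DatabasePool {
public:
    using Factory = std::function<std::unique_ptr<Db>(const std::string&)>;

    explicit DatabasePool(Factory make) : make_(std::move(make)) {
        assert(make_ != nullptr);
    }

    // Returns the same object for the same path on every call, creating it
    // on first use. The pool keeps ownership; callers must not delete it.
    // If the factory throws, nothing is stored and the exception propagates.
    Db* get(const std::string& path) {
        std::lock_guard<std::mutex> lock(mutex_);  // guards the map only
        auto it = dbs_.find(path);
        if (it == dbs_.end()) {
            it = dbs_.emplace(path, make_(path)).first;
        }
        return it->second.get();
    }

private:
    Factory make_;
    std::mutex mutex_;
    std::map<std::string, std::unique_ptr<Db>> dbs_;
};
```

With Xapian, the factory could be `[](const std::string& p) { return std::unique_ptr<Xapian::WritableDatabase>(new Xapian::WritableDatabase(p, Xapian::DB_CREATE_OR_OPEN)); }`. The mutex protects only the lookup, not the returned object, so threaded callers still need their own locking around adds and flush().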
On Sun, Jan 20, 2008 at 01:32:19PM -0500, Michael A. Lewis wrote:
> I am having a problem with flushing a database. I am adding N records
> to the DB (which amounts to 1 - 2000). At the end of the run, I issue
> a flush() call. The problem is that the flush call never seems to do
> anything. Every 10000 additions to the database and the library
> performs a flush (which can take up to 3 hours on a 560,000 document
> database) as if my flush call was never performed.

What result do you get if you open a separate Xapian::Database on the database you are writing to just after calling flush() explicitly, and call Database::get_doccount()? If the count doesn't change until 10000 documents have been indexed, something is wrong with WritableDatabase::flush() (though the code looks correct to me too, and I'm sure we have testcases for this). If it is changing, something odd is still happening, but it isn't that flush() is simply being ignored.

An explicit flush() should also reset the counter for the automatic flush.

Cheers,
    Olly
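The point about the counter reset predicts a clear pattern. The toy model below is not Xapian's code; it only mirrors the behaviour described in this thread: changes are counted, the library flushes by itself when the count reaches a threshold (10000 here, matching what Michael observed; Xapian's documentation describes an XAPIAN_FLUSH_THRESHOLD environment variable for this setting), and an explicit flush() resets the count. Under those assumptions, runs of at most 2000 adds that each end in flush() on the same object should never trigger an automatic flush, while skipping or misdirecting the explicit flush gives one automatic flush per 10000 adds, which is the pattern Michael reports. FlushModel and simulate are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Toy model (NOT Xapian's implementation) of threshold-based auto-flush.
class FlushModel {
public:
    explicit FlushModel(std::size_t threshold) : threshold_(threshold) {
        assert(threshold_ > 0);
    }

    // Counts one pending change; flushes automatically at the threshold.
    void add_document() {
        if (++pending_ >= threshold_) {
            ++auto_flushes_;
            pending_ = 0;
        }
    }

    // Explicit flush: writes pending changes and restarts the count.
    void flush() { pending_ = 0; }

    std::size_t pending() const { return pending_; }
    std::size_t auto_flushes() const { return auto_flushes_; }

private:
    std::size_t threshold_;
    std::size_t pending_ = 0;
    std::size_t auto_flushes_ = 0;
};

// Simulates `runs` runs of `run_size` adds, optionally ending each run
// with an explicit flush() on the same object; returns auto-flush count.
std::size_t simulate(std::size_t runs, std::size_t run_size,
                     bool explicit_flush, std::size_t threshold = 10000) {
    FlushModel db(threshold);
    for (std::size_t r = 0; r < runs; ++r) {
        for (std::size_t i = 0; i < run_size; ++i) db.add_document();
        if (explicit_flush) db.flush();
    }
    return db.auto_flushes();
}
```

For instance, simulate(20, 2000, true) yields no automatic flushes, whereas simulate(20, 2000, false) yields four (40000 adds at a threshold of 10000).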