Hi,

(I am using Xapian 1.0.10 with the Perl bindings.)

Because of the issue that Xapian B-trees thin out in the long run, I decided to add support to my indexer for auto-compacting an index from time to time using the xapian-compact binary. It does so by

* flushing the open index
* undef-ing the database handle to do an implicit close (there is no way to do an explicit close, right?)
* running "xapian-compact -n --no-renumber ./index ./index-compact 2>&1 >/dev/null"
* moving ./index -> ./index-old
  (* copying some arbitrary statistics files from ./index-old to ./index-compact, but this won't affect anything)
* moving ./index-compact -> ./index
* deleting ./index-old
* reopening ./index by calling the Search::Xapian::WritableDatabase->new() constructor

Now my problem is that the disk space ./index-old consumes doesn't get freed. So I used lsof and found out that a "cat" process is holding open filehandles on the .DB files:

    cat 2072 root 36u REG 8,1  997842944 5529607 /var/lib/wtf/db/profile-old/record.DB (deleted)
    cat 2072 root 38u REG 8,1  121257984 5529616 /var/lib/wtf/db/profile-old/value.DB (deleted)
    cat 2072 root 39u REG 8,1  717463552 5529610 /var/lib/wtf/db/profile-old/termlist.DB (deleted)
    cat 2072 root 40u REG 8,1 1703305216 5529613 /var/lib/wtf/db/profile-old/position.DB (deleted)
    cat 2072 root 41u REG 8,1 1943666688 5529604 /var/lib/wtf/db/profile-old/postlist.DB (deleted)

This cat process corresponds to my index daemon:

    root 1656  0.0 0.1 234732  54592 ? S 16:35 0:00 indexd overlord
    root 1657 18.6 0.3 342596 164416 ? R 16:35 5:13 \_ indexd
    root 1658  0.0 0.0   3868    468 ? S 16:35 0:00 \_ /bin/cat
    root 1659  0.0 0.0   3868    468 ? S 16:35 0:00 \_ /bin/cat
    root 1660  0.0 0.0   3868    468 ? S 16:35 0:00 \_ /bin/cat
    root 1661  0.0 0.0   3868    472 ? S 16:35 0:00 \_ /bin/cat
    root 1662  0.0 0.0   3868    468 ? S 16:35 0:00 \_ /bin/cat
    root 2072  0.0 0.0   3868    472 ? S 16:43 0:00 \_ /bin/cat
    root 3693  0.0 0.0   3868    472 ? S 17:01 0:00 \_ /bin/cat

I read earlier that "cat" is used for locking, and I saw that it's opening the flintlock files. But why does it hold these .DB files open? Is there a way to get these files closed properly without actually quitting and restarting the process (which would probably be my workaround)?

Regards,
mrks
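The compact-and-swap sequence from the list above can be sketched roughly as follows. This is only an illustration in Python, not the poster's Perl indexer: the function name `compact_and_swap` and the `compact_cmd` parameter are made up for the example, the optional statistics-file copy is left out, and the caller is assumed to have flushed and released its own database handle beforehand. Only stdout is discarded, which is also what the original `2>&1 >/dev/null` does, since in that order stderr is pointed at the old stdout before stdout is redirected.

```python
import shutil
import subprocess
from pathlib import Path


def compact_and_swap(index, compact_cmd=("xapian-compact", "-n", "--no-renumber")):
    """Compact `index` into a sibling directory, then swap it into place.

    Assumes the caller has already flushed and closed its handle on `index`.
    `compact_cmd` is a hypothetical parameter so the tool can be substituted.
    """
    index = Path(index)
    compacted = index.with_name(index.name + "-compact")
    old = index.with_name(index.name + "-old")

    # Run the compactor; discard stdout and raise if it exits non-zero.
    subprocess.run([*compact_cmd, str(index), str(compacted)],
                   stdout=subprocess.DEVNULL, check=True)

    index.rename(old)        # ./index         -> ./index-old
    compacted.rename(index)  # ./index-compact -> ./index
    # Removes the names; the blocks are reclaimed only once no process
    # still holds the old files open.
    shutil.rmtree(old)
    return index
```

After this returns, the indexer would reopen the path it passed in. Whether the old space is really released depends on every process, including child processes, having closed the old files.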
You must use copydatabase; then your Xapian index will free the unused disk space from deleted documents.

    /usr/local/bin/copydatabase directory/old directory/new

Kevin Duraj
http://myhealthcare.com/

On Tue, Feb 3, 2009 at 8:56 AM, Markus Wörle <mrks at mrks.de> wrote:
> Now my problem is, that the diskspace ./index-old consumes doesn't get
> freed. So I used lsof and found out that a "cat" process is holding
> open filehandles on the .DB files.
On 10.02.2009 at 03:12, Kevin Duraj wrote:
> You must use copydatabase; then your Xapian index will free the unused
> disk space from deleted documents.

Thank you for the hint, but that does not help with my problem. My problem is actually that I am not able to _close_ writable indices properly. By closing I mean "not having any open filehandle left in my process or in any child process (like /bin/cat)", because the disk space of deleted _files_ (on disk) is not freed until every single handle to them is closed.

My issue is that /bin/cat randomly (?, or so it seems) is left with open files after closing.

Regards,
mrks
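The point about deleted files can be reproduced in isolation. The following is a minimal sketch, assuming a Linux or other POSIX-style local filesystem; the helper name `unlink_while_open` is invented for the example. It removes a file's name while a descriptor is still open and shows that the data is still there behind that descriptor.

```python
import os
import tempfile


def unlink_while_open(payload: bytes):
    """Delete a file while a descriptor is still open on it.

    Returns (link_count, size_in_bytes) as seen through the open
    descriptor after the path has been removed.
    """
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        os.unlink(path)                  # the name is gone ...
        assert not os.path.exists(path)
        st = os.fstat(fd)                # ... but the inode and its data remain
        return st.st_nlink, st.st_size
    finally:
        os.close(fd)                     # only now can the space be reclaimed
```

This is the state lsof reports as "(deleted)": no directory entry is left, yet the file keeps occupying disk space until the last descriptor, in whichever process holds it, is closed.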
On Tue, Feb 03, 2009 at 05:56:15PM +0100, Markus Wörle wrote:
> because of the issue that Xapian B-trees thin out in the long run

I'm not sure I follow. Do you just mean that if you delete a lot of documents, you don't immediately get the space back? That's certainly true, but if you index more documents, that space should get reused. If the B-trees really end up becoming less efficiently used over time, I suspect that means there's a bug somewhere.

> I read earlier that "cat" is used for locking, and I saw that it's
> opening the flintlock files. But why does it hold these .DB files open?

Thanks for asking this question - I think there is a bug here.

If I open WritableDatabase A, then open WritableDatabase B, and then close and delete WritableDatabase A, the locking process for B will still have open filehandles for the .DB files in A, because it inherited them when it was forked. So that disk space won't be released right away.

After we fork() the locking process but before we exec() /bin/cat, we should close all the fds apart from the one for the pipe to our parent. I'll fix that, and backport the fix for 1.0.11.

> Is there a way to get these files closed properly without actually
> quitting and restarting the process (which would probably be my
> workaround)?

In C++, you'd just make sure that you close the old db before opening the new one to work around this bug. But in Perl the only way to close the old db is $db=undef; which you are already using. I guess the C++ object destructor may not get called right away.

SVN trunk adds a Database::close() method, but that's probably not a great help to you currently.

Cheers,
Olly
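The fix described above (close every inherited descriptor between fork() and exec(), keeping only the pipe to the parent) can be sketched as follows. This is a Python illustration of the general technique, not Xapian's actual C++ locking code; the function name `spawn_with_only` is hypothetical, and listing /proc/self/fd to find open descriptors is a Linux-specific assumption.

```python
import os


def spawn_with_only(keep_fd, argv=("/bin/cat",)):
    """Fork and exec `argv` with `keep_fd` as stdin and nothing else open.

    Returns the child's pid in the parent; never returns in the child.
    """
    pid = os.fork()
    if pid:
        return pid
    try:
        os.dup2(keep_fd, 0)  # the one descriptor the child is allowed to keep
        # Linux-specific: /proc/self/fd has one entry per open descriptor.
        for name in os.listdir("/proc/self/fd"):
            fd = int(name)
            if fd != 0:
                try:
                    os.close(fd)
                except OSError:
                    pass  # already closed, e.g. the handle listdir() itself used
        os.execv(argv[0], list(argv))
    finally:
        os._exit(127)  # reached only if something above failed
```

With this in place, a child started while database A is still open holds no descriptors on A's .DB files, so deleting A later releases its space even though the child keeps running for database B's lock.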