john veter
2016-Feb-06 16:39 UTC
Xapian::WritableDatabase: commit changes depending on the buffer size
Hi. I have a lot of documents of widely varying file size. While indexing, I call commit() every 1000 documents (this is the default value; the user can change it). The problem is the following: the indexing process runs smoothly while indexing small files, and the indexer uses about half of the available RAM. But at some point it hits a bunch of bigger documents. As a result, RAM usage increases drastically and, finally, I just run out of memory. I think the solution is to call commit() based on the actual size of the buffer (in megabytes) rather than on the number of indexed documents. So, is there any way to estimate the size of the buffer of the Xapian::WritableDatabase object? P.S. Maybe somebody has other suggestions for how to solve the problem?
Jean-Francois Dockes
2016-Feb-06 17:29 UTC
Xapian::WritableDatabase: commit changes depending on the buffer size
john veter writes:
> Hi. I have a lot of documents with different filesize. While indexing, i call commit() every 1000 documents (this is the default value, user can change it). The problem is the following: the indexing process runs smoothly while indexing small files. The indexer uses about half of the available RAM. But one moment it hits a bunch of bigger documents. As a result, the RAM usage increases drastically. Finally, i just run out of memory.
>
> I think that the solution is to call commit() depending on the buffer actual size (in megabytes), but not based on the number of the indexed documents. So, is there any way to estimate the size of the buffer of the Xapian::WritableDatabase object?
>
> P.S. may be somebody have other suggestions how to solve the problem?

I had the same issue quite a long time ago. I changed the indexer so that, after adding, updating, or deleting a document, it decides whether to flush based on the total amount of input text processed since the last flush, independently of the number of documents.

I am not claiming that this is a rigorous solution, but it apparently solved the problem; at least nobody seems to complain about memory usage any more.

Cheers,
jf
Olly Betts
2016-Feb-14 11:23 UTC
Xapian::WritableDatabase: commit changes depending on the buffer size
On Sat, Feb 06, 2016 at 07:39:15PM +0300, john veter wrote:
> I think that the solution is to call commit() depending on the buffer
> actual size (in megabytes), but not based on the number of the indexed
> documents. So, is there any way to estimate the size of the buffer of
> the Xapian::WritableDatabase object?

The problem is that we don't have a good way to determine the size of the buffered data as it is currently stored. There's a plan to change how it is stored, and one benefit will be that we will then know how much RAM it is using. I thought we had a ticket for this, but a quick look only found a couple for related issues.

> P.S. may be somebody have other suggestions how to solve the problem?

The approach already suggested by Jean-Francois seems plausible, and is at least simple to calculate.

Cheers,
Olly
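
The approach Jean-Francois describes (commit based on the amount of input text rather than the document count) could be sketched roughly as below. This is only an illustration, not code from his indexer or from Xapian: the class name `TextBudgetCommitter`, its methods, and the byte budget are all hypothetical, and the commit action is passed in as a callback so that the counting logic stays independent of the database object. Note that the amount of input text is only a proxy for the memory Xapian actually buffers, so the budget has to be tuned by experiment.

```cpp
#include <cstddef>
#include <functional>
#include <utility>

// Hypothetical helper (not part of Xapian): runs a commit callback once the
// amount of input text indexed since the last commit reaches a byte budget.
class TextBudgetCommitter {
public:
    TextBudgetCommitter(std::size_t budget_bytes, std::function<void()> commit)
        : budget_bytes_(budget_bytes), commit_(std::move(commit)) {}

    // Call after each add/update/delete, passing the size of the input text
    // that was just indexed. Returns true if a commit was triggered.
    bool note_indexed(std::size_t text_bytes) {
        pending_bytes_ += text_bytes;
        if (pending_bytes_ < budget_bytes_) {
            return false;
        }
        flush();
        return true;
    }

    // Commit unconditionally and reset the counter (e.g. at end of indexing).
    void flush() {
        commit_();
        pending_bytes_ = 0;
    }

    // Bytes of input text accumulated since the last commit.
    std::size_t pending_bytes() const { return pending_bytes_; }

private:
    std::size_t budget_bytes_;
    std::function<void()> commit_;
    std::size_t pending_bytes_ = 0;
};
```

With a real index, the callback would presumably be something like `[&db] { db.commit(); }` for a `Xapian::WritableDatabase db`, with `note_indexed(text.size())` called after each document is added, and a final `flush()` when indexing ends.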