Jeff Rand
2013-Jul-03  19:59 UTC
[Xapian-discuss] Potential memory leak when assigning MSetItem values
I've traced a memory leak to a statement which assigns the values from an
MSetItem to a dictionary which is then appended to a list in Python. We're
running Python 2.7.3, xapian-core 1.2.15 and xapian-bindings 1.2.15. The
example below reproduces the behavior. It prints the PID and pauses for
input at a few points to make the behavior easier to observe.
Run the following code and monitor the PID's memory usage in top or a
similar program. I've observed the resident memory for this example go from
18m to 52m after deleting objects and running garbage collection.
I think the MSetItems are preserved in memory and are not being garbage
collected correctly, possibly from a lingering reference to the MSet or
MSetIterator.
import os
import simplejson as json
import xapian as x
import shutil
import gc
def make_db(path, num_docs=100000):
    try:
        shutil.rmtree(path)
    except OSError, e:
        if e.errno != 2:
            raise
    db = x.WritableDatabase(path, x.DB_CREATE)
    for i in xrange(1, num_docs):
        doc = x.Document()
        doc.set_data(json.dumps({ 'id': i, 'enabled': True }))
        doc.add_term('XTYPA')
        db.add_document(doc)
    return db
def run_query(db, num_docs=100000):
    e = x.Enquire(db)
    e.set_query(x.Query('XTYPA'))
    m = e.get_mset(0, num_docs, True, None)
    # Store the MSetItem's data, which causes a memory leak
    data = []
    for i in m:
        data.append({'data': i.document.get_data(), 'id': i.docid})
    # Make sure I'm not crazy
    del num_docs, db, i, e, m, data
    gc.collect()
def main():
    # print the PID to monitor
    print 'PID to monitor: {}'.format(os.getpid())
    db = make_db('/tmp/test.db')
    raw_input("database is done, ready?")
    run_query(db, 100000)
    raw_input('done?')
if __name__ == '__main__':
    main()
Olly Betts
2013-Jul-05  10:18 UTC
[Xapian-discuss] Potential memory leak when assigning MSetItem values
On Wed, Jul 03, 2013 at 03:59:21PM -0400, Jeff Rand wrote:
> Run the following code and monitor the PID's memory usage in top or a
> similar program. I've observed the resident memory for this example go from
> 18m to 52m after deleting objects and running garbage collection.

If I set it to repeat the call to run_query 10 times, the memory usage
doesn't keep growing, so it looks to me like the heap of the process
has just grown, and doesn't get returned to the OS again.

Certainly the number of objects Python knows about is constant (add a
call to print len(gc.get_objects()) after gc.collect() to see that).

I was using Python 2.6.6 and Xapian trunk, as I have those to hand.
This could be version dependent of course - can you try repeating
run_query() to see if the process size keeps growing for you?

Cheers,
    Olly
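The check described in the message above (repeat the workload, collect, then count the objects Python is tracking) could be sketched roughly as follows. The names count_objects_over_runs and stand_in_workload are hypothetical and do not come from the thread or from Xapian; the stand-in workload only mimics the shape of run_query() so that the sketch runs without a database.

```python
import gc


def count_objects_over_runs(workload, repeats=10):
    """Call workload() `repeats` times. After each call, force a garbage
    collection and record how many objects the collector is tracking."""
    counts = []
    for _ in range(repeats):
        workload()
        gc.collect()
        counts.append(len(gc.get_objects()))
    return counts


def stand_in_workload(n=100000):
    # Hypothetical stand-in for run_query(): build a list of dicts shaped
    # like the ones in the original script, then drop it again.
    data = [{'data': str(i), 'id': i} for i in range(n)]
    del data
```

With the original script, the workload would presumably be something like lambda: run_query(db, 100000). A count that stays roughly flat across calls suggests Python itself is not accumulating objects; a count that climbs on every call would point to references being kept alive somewhere.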
Jeff Rand
2013-Jul-10  15:51 UTC
[Xapian-discuss] Potential memory leak when assigning MSetItem values
Olly, the process size does stay constant with the results from one query
set, but running other queries will cause it to grow (once). Is it possible
that this is a bug with the SWIG python bindings?

On Fri, Jul 5, 2013 at 6:18 AM, Olly Betts <olly at survex.com> wrote:
> On Wed, Jul 03, 2013 at 03:59:21PM -0400, Jeff Rand wrote:
> > Run the following code and monitor the PID's memory usage in top or a
> > similar program. I've observed the resident memory for this example go from
> > 18m to 52m after deleting objects and running garbage collection.
>
> If I set it to repeat the call to run_query 10 times, the memory usage
> doesn't keep growing, so it looks to me like the heap of the process
> has just grown, and doesn't get returned to the OS again.
>
> Certainly the number of objects Python knows about is constant (add a
> call to print len(gc.get_objects()) after gc.collect() to see that).
>
> I was using Python 2.6.6 and Xapian trunk, as I have those to hand.
> This could be version dependent of course - can you try repeating
> run_query() to see if the process size keeps growing for you?
>
> Cheers,
>     Olly
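To tell one-off growth from growth on every query, the resident size can also be sampled from inside the process rather than read off top. The following is a minimal sketch that assumes a Linux-style /proc filesystem; resident_kb is a hypothetical helper, not part of Xapian or its bindings.

```python
import os


def resident_kb(pid=None):
    """Return the resident set size in kB as reported by the VmRSS field of
    /proc/<pid>/status, or None if it cannot be read (for example on
    systems without /proc)."""
    path = '/proc/%d/status' % (os.getpid() if pid is None else pid)
    try:
        with open(path) as status:
            for line in status:
                if line.startswith('VmRSS:'):
                    # The line looks like "VmRSS:     18432 kB".
                    return int(line.split()[1])
    except IOError:
        return None
    return None
```

Printing resident_kb() after each query would show whether a given query grows the process once and then levels off, or whether repeating the same query keeps adding to it.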