Jeff Rand
2013-Jul-03 19:59 UTC
[Xapian-discuss] Potential memory leak when assigning MSetItem values
I've traced a memory leak to a statement which assigns the values from an MSetItem to a dictionary which is then appended to a list in python. We're running python 2.7.3, xapian-core 1.2.15 and xapian-bindings 1.2.15.

I've provided an example which reproduces the behavior below. The example prints the PID and has a few statements waiting for input to make observing the behavior easier.

Run the following code and monitor the PID's memory usage in top or a similar program. I've observed the resident memory for this example go from 18m to 52m after deleting objects and running garbage collection.

I think the MSetItems are preserved in memory and are not being garbage collected correctly, possibly from a lingering reference to the MSet or MSetIterator.

import os
import simplejson as json
import xapian as x
import shutil
import gc

def make_db(path, num_docs=100000):
    try:
        shutil.rmtree(path)
    except OSError, e:
        if e.errno != 2:
            raise
    db = x.WritableDatabase(path, x.DB_CREATE)
    for i in xrange(1, num_docs):
        doc = x.Document()
        doc.set_data(json.dumps({
            'id': i,
            'enabled': True
        }))
        doc.add_term('XTYPA')
        db.add_document(doc)
    return db

def run_query(db, num_docs=100000):
    e = x.Enquire(db)
    e.set_query(x.Query('XTYPA'))
    m = e.get_mset(0, num_docs, True, None)

    # Store the MSetItem's data, which causes a memory leak
    data = []
    for i in m:
        data.append({
            'data': i.document.get_data(),
            'id': i.docid,
        })

    # Make sure I'm not crazy
    del num_docs, db, i, e, m, data
    gc.collect()

def main():
    # print the PID to monitor
    print 'PID to monitor: {}'.format(os.getpid())
    db = make_db('/tmp/test.db')
    raw_input("database is done, ready?")
    run_query(db, 100000)
    raw_input('done?')

if __name__ == '__main__':
    main()
Olly Betts
2013-Jul-05 10:18 UTC
[Xapian-discuss] Potential memory leak when assigning MSetItem values
On Wed, Jul 03, 2013 at 03:59:21PM -0400, Jeff Rand wrote:
> Run the following code and monitor the PID's memory usage in top or a
> similar program. I've observed the resident memory for this example go from
> 18m to 52m after deleting objects and running garbage collection.

If I set it to repeat the call to run_query 10 times, the memory usage doesn't keep growing, so it looks to me like the heap of the process has just grown, and doesn't get returned to the OS again.

Certainly the number of objects Python knows about is constant (add a call to print len(gc.get_objects()) after gc.collect() to see that).

I was using Python 2.6.6 and Xapian trunk, as I have those to hand. This could be version dependent of course - can you try repeating run_query() to see if the process size keeps growing for you?

Cheers,
Olly
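The repeat-and-count check described above can be sketched without Xapian at all. This is only an illustration, not code from the thread: tracked_counts and sample_workload are hypothetical names, and sample_workload merely stands in for run_query(db). Note that gc.get_objects() lists only the container objects the collector tracks, so the numbers are a rough indicator rather than a full inventory.

```python
import gc


def tracked_counts(workload, repeats=10):
    """Run workload() repeatedly; return the tracked-object count after each run."""
    counts = []
    for _ in range(repeats):
        workload()
        gc.collect()  # collect unreachable objects before counting
        counts.append(len(gc.get_objects()))
    return counts


def sample_workload(n=10000):
    # Hypothetical stand-in for run_query(db): build a large result list, then drop it.
    data = [[i, str(i)] for i in range(n)]
    del data


if __name__ == "__main__":
    print(tracked_counts(sample_workload, repeats=10))
```

If the counts level off after the first run or two, Python is not accumulating objects, which points at heap growth rather than a reference leak. A count that climbs by roughly the result-set size on every run would suggest something is still holding references.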
Jeff Rand
2013-Jul-10 15:51 UTC
[Xapian-discuss] Potential memory leak when assigning MSetItem values
Olly, the process size does stay constant with the results from one query set, but running other queries will cause it to grow (once). Is it possible that this is a bug with the SWIG python bindings?

On Fri, Jul 5, 2013 at 6:18 AM, Olly Betts <olly at survex.com> wrote:
> On Wed, Jul 03, 2013 at 03:59:21PM -0400, Jeff Rand wrote:
> > Run the following code and monitor the PID's memory usage in top or a
> > similar program. I've observed the resident memory for this example go from
> > 18m to 52m after deleting objects and running garbage collection.
>
> If I set it to repeat the call to run_query 10 times, the memory usage
> doesn't keep growing, so it looks to me like the heap of the process
> has just grown, and doesn't get returned to the OS again.
>
> Certainly the number of objects Python knows about is constant (add a
> call to print len(gc.get_objects()) after gc.collect() to see that).
>
> I was using Python 2.6.6 and Xapian trunk, as I have those to hand.
> This could be version dependent of course - can you try repeating
> run_query() to see if the process size keeps growing for you?
>
> Cheers,
> Olly
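The "grows once per query" pattern can also be watched from inside the process instead of in top. A minimal sketch, assuming a Unix system where the standard resource module is available; peak_rss_after_each is a hypothetical helper and the lambdas are stand-ins for different queries, not anything from the thread. ru_maxrss is a high-water mark (kilobytes on Linux, bytes on macOS), so it can show whether a new query raises the peak but not whether memory was later returned to the OS.

```python
import resource


def peak_rss_after_each(workloads):
    """Run each workload in turn; return the process's peak RSS after each one."""
    peaks = []
    for workload in workloads:
        workload()
        peaks.append(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
    return peaks


if __name__ == "__main__":
    # Hypothetical stand-ins for a small and a large query result set.
    small = lambda: [[i] for i in range(10000)]
    large = lambda: [[i] for i in range(1000000)]
    print(peak_rss_after_each([small, small, large, large]))
```

A peak that rises when a larger workload first runs and then stays flat on repeats would match the behaviour reported here.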