On 4 January 2011 13:51, Charlie Hull <charlie at juggler.net> wrote:
> We've had a customer report some issues recently which turned out to be
> memory related, which we thought might be of interest.
>
> Manually chopping the query roughly in half shows a peak memory requirement
> of 1.5GB, which was survivable in this case - and the memory requirement
> then goes down again indicating no actual memory leaks. We're going to
> implement a maximum query length to protect against future problems, but we
> thought people might be interested in this admittedly unlikely scenario. I
> wonder if it might be easy to graph query length against memory required?
I think the easiest way to do this would be to chop the query in half
a few more times, and measure the memory requirement manually each
time. It shouldn't take many points to see the shape. Getting the
memory requirement automatically is a bit messy: unless Windows
provides some mechanism for doing this, you'd probably have to hook
something into Xapian with a preload hack (if there's an equivalent to
this on Windows), or run a process at the same time as the searches to
keep track of how much memory is in use each second.
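As a rough illustration of the "sample memory once a second while the
search runs" idea, here is a minimal Python sketch. Everything in it is
an assumption for illustration, not part of Xapian: halved_sizes(),
peak_memory_during() and read_rss_kb() are hypothetical helper names,
and read_rss_kb() parses /proc, so it only works on Linux. On Windows
you would have to supply a different read_memory callable. Sampling can
also miss a short-lived peak that falls between two samples.

```python
import threading


def halved_sizes(n_terms, points=5):
    """Query lengths obtained by repeatedly halving n_terms."""
    sizes = []
    while n_terms > 0 and len(sizes) < points:
        sizes.append(n_terms)
        n_terms //= 2
    return sizes


def read_rss_kb(pid):
    """Resident set size of `pid` in kB, parsed from /proc (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return 0


def peak_memory_during(work, read_memory, interval=1.0):
    """Run work() while a background thread calls read_memory() every
    `interval` seconds; return (work's result, largest sample seen)."""
    samples = [read_memory()]
    stop = threading.Event()

    def sampler():
        # Event.wait() returns False on timeout, True once stop is set.
        while not stop.wait(interval):
            samples.append(read_memory())

    thread = threading.Thread(target=sampler, daemon=True)
    thread.start()
    try:
        result = work()
    finally:
        stop.set()
        thread.join()
    samples.append(read_memory())
    return result, max(samples)
```

To use it, you would loop over halved_sizes(6000), run the search on
the first n terms inside peak_memory_during() with something like
lambda: read_rss_kb(os.getpid()) as the reader, and plot the peaks
against n by hand.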
I'd expect the memory usage to be linear in the number of terms: for
an OR query, they're put in a binary tree structure when evaluating,
but the number of nodes in such a tree is still O(number of leaf
nodes) - i.e., O(number of terms). Each term will result in a posting
iterator, which might be quite memory hungry (since each will have
several blocks of the DB file open), but I'd expect a query of 6000
terms to maybe use 100MB or so.
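To make the node-count argument concrete, here is a small Python sketch
that pairs terms up into a binary OR tree and counts the nodes. This is
only a toy model: build_or_tree() and count_nodes() are hypothetical
names, and the nested tuples are not how Xapian actually represents the
tree internally. The point is just that n leaves give n - 1 internal
nodes, so 2n - 1 nodes in total.

```python
def build_or_tree(terms):
    """Combine terms pairwise into nested ("OR", left, right) tuples."""
    if not terms:
        raise ValueError("need at least one term")
    level = list(terms)
    while len(level) > 1:
        # Pair up neighbours; an odd one out is carried up unchanged.
        nxt = [("OR", level[i], level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0]


def count_nodes(node):
    """Count internal OR nodes plus leaf terms."""
    if isinstance(node, tuple):
        return 1 + count_nodes(node[1]) + count_nodes(node[2])
    return 1


terms = [f"term{i}" for i in range(6000)]
total = count_nodes(build_or_tree(terms))  # 2 * 6000 - 1 nodes
```

So doubling the number of terms roughly doubles the number of nodes,
which is why I'd expect the graph to come out as a straight line.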
--
Celestial Navigation Limited, incorporated in England & Wales
(registration number 06978117), registered office address: 58
Kingsway, Duxford, Cambridgeshire, CB224QN, UK.