Sean McCleary
2011-Jan-04 19:51 UTC
[Xapian-discuss] Excessive memory use when using FLAG_PARTIAL?
Hi everyone,

Sorry if this is an easy one, but I've Googled and can't find anyone else who's mentioned this same problem.

I'm using Xapian (tried both versions 1.0.17 and 1.2.4) with the PHP bindings on Ubuntu 10.04 (Lucid) and Apache 2.2.14. I'm using it for an "auto-complete" in the search form on a web page. But whenever I use FLAG_PARTIAL on my search, the memory usage of the Apache process quickly balloons to almost 100% of the available memory and hangs there in "Sending reply" status.

The execution of the PHP script finishes, but the Apache process is stuck, consuming almost all the available memory.

I've found that when I remove the FLAG_PARTIAL flag from my query, this problem does not happen.

Is this expected behavior? The server this is running on has 512 MB of memory. My Xapian index is only 108 MB in size.

Any help would be greatly appreciated.

Thanks,
Sean
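[For reference, a minimal sketch of the kind of FLAG_PARTIAL auto-complete setup being described, shown with the Python bindings for concreteness since the actual PHP code isn't included in the post; the index path, query string, and result count are placeholders:]

```python
import xapian

# Placeholder path and query; the real PHP code isn't shown in the post.
db = xapian.Database("/path/to/index")

qp = xapian.QueryParser()
qp.set_database(db)  # FLAG_PARTIAL expands the last word against this database's terms

# Parse the user's partial input, letting the final word match as a prefix.
flags = xapian.QueryParser.FLAG_DEFAULT | xapian.QueryParser.FLAG_PARTIAL
query = qp.parse_query("recei", flags)

enquire = xapian.Enquire(db)
enquire.set_query(query)
matches = enquire.get_mset(0, 10)  # first 10 suggestions
for match in matches:
    print(match.docid, match.document.get_data())
```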
Olly Betts
2011-Jan-11 12:41 UTC
[Xapian-discuss] Excessive memory use when using FLAG_PARTIAL?
On Tue, Jan 04, 2011 at 11:51:16AM -0800, Sean McCleary wrote:

> I'm using Xapian (tried both versions 1.0.17 and 1.2.4) with the PHP
> bindings on Ubuntu 10.04 (Lucid) and Apache 2.2.14. I'm using it for an
> "auto-complete" in the search form on a web page. But whenever I use
> FLAG_PARTIAL on my search, the memory usage of the apache process quickly
> balloons up to almost 100% of the available memory resources, and hangs
> there in "Sending reply" status.
>
> The execution of the PHP script finishes, but the apache process is stuck,
> and consuming almost all the available memory.
>
> I've found that when I remove the "FLAG_PARTIAL" flag from my query, this
> problem does not happen.
>
> Is this expected behavior? The server this is running on has 512 MB of
> memory. My Xapian index is only 108 MB in size.

FLAG_PARTIAL currently just expands the partial word at the end of the query to all the possible completions, so if the partial word is short this can generate a query with a lot of terms (particularly when the partial word is just a single common character, such as 's' in English).

Each term in the query needs a certain amount of memory, regardless of the size of the database on disk - judging by the figures in another recent post to the list, this is something like 55KB currently, so if the partial word expands to 10000 or more terms, the process size will grow to more than the size of your physical memory. My guess would be that this is the cause of your problem.

The memory overhead per term could probably be reduced, but actually it's probably not useful to expand such short partial terms - a search for all words starting with the same letter is just going to be too noisy to be useful, regardless of the resources it would need.

So my thought would be to add a minimum length for the partial words which will be expanded under FLAG_PARTIAL, and probably a way to specify this via the API.

Cheers,
Olly
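[Until such a minimum length exists in Xapian itself, a similar cut-off can be approximated on the caller's side by only setting FLAG_PARTIAL when the trailing word is long enough. A sketch using the Python bindings (the same logic applies through the PHP bindings); MIN_PARTIAL_LEN and parse_autocomplete are illustrative names, not part of the Xapian API:]

```python
import xapian

MIN_PARTIAL_LEN = 3  # illustrative threshold, not a Xapian setting


def parse_autocomplete(db, user_input, min_partial_len=MIN_PARTIAL_LEN):
    """Parse an auto-complete query, enabling FLAG_PARTIAL only when the
    trailing (partial) word is long enough to keep its expansion small."""
    qp = xapian.QueryParser()
    qp.set_database(db)  # the partial word is expanded against this database
    flags = xapian.QueryParser.FLAG_DEFAULT
    words = user_input.split()
    if words and len(words[-1]) >= min_partial_len:
        flags |= xapian.QueryParser.FLAG_PARTIAL
    return qp.parse_query(user_input, flags)
```

[With a threshold of 3, typing 's' or 'se' is searched for literally, and only 'sea', 'sear', etc. trigger prefix expansion, keeping the number of expanded terms (and hence memory use) bounded.]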