Hi all,

There's been an odd bug reported to us by Daniel Menard while working on the PHP bindings:

"I then tried to run the dotest target... All tests passed, except the one about get_matching_terms (smoketest.php line 94). I added this line before the exit:

    for ($i=0; $i<strlen($terms); $i++) echo $c=ord($terms[$i]), ' ', ($c>31?$terms[$i]:''), "\n";

and it appears that the first letter of each term is replaced with a null char.

I tried to run the same test in a debian box, and the test pass, so perhaps this is a windows-related problem (more on this below).

...[he then manages to get Xapian working in a real situation]...

I was surprised it works so well because my script also uses get_matching_terms, but it doesn't reproduce the bug above. In fact, I don't use a "join(get_matching_terms())" as smoketest do, but iterate with get_matching_terms_begin and get_matching_terms_end.

Just by curiosity, I added the following lines in smoketest.php :

    $hit=$mset->get_hit(0);
    $it=$enq->get_matching_terms_begin($hit);
    while (! $it->equals($enq->get_matching_terms_end($hit)))
    {
        echo $it->get_term(), ' ';
        $it->next();
    }

and with that code, we get the correct terms. So the bug only concerns the way get_matching_terms is wrapped (and only appears under windows)... strange."

Anyway, I thought it might be worth raising in case anyone with a better knowledge of PHP might have a brainwave!

Cheers

Charlie

On Tue, Apr 03, 2007 at 03:22:14PM +0100, Charlie Hull wrote:
> and with that code, we get the correct terms. So the bug only concerns
> the way get_matching_terms is wrapped (and only appears under
> windows)... strange.

How odd. What if you take Xapian out of the equation and run something like:

    <?php
    function foo() { return array("is", "there"); }
    $terms = join(" ", foo());
    for ($i=0; $i<strlen($terms); $i++) echo $c=ord($terms[$i]), ' ', ($c>31?$terms[$i]:''), "\n";
    ?>

Cheers,
    Olly

On Tue, Apr 03, 2007 at 03:22:14PM +0100, Charlie Hull wrote:
> So the bug only concerns the way get_matching_terms is wrapped (and
> only appears under windows)... strange.
>
> Anyway, I thought it might be worth raising in case anyone with a better
> knowledge of PHP might have a brainwave!

get_matching_terms() isn't wrapped - it's a synthetic method generated only in the SWIG layer. Note that it relies on a typemap to convert std::pair<> into a list in the target language, which is done at the bottom of php/util.i - I'm wondering if there's a Windows-specific bug in that in some way? I don't know the PHP internals enough to know for sure, and I'm a little short on time right now or I'd prod further myself...

J

-- 
/--------------------------------------------------------------------------\
  James Aylett                                                  xapian.org
  james@tartarus.org                               uncertaintydivision.org

Charlie Hull wrote:
> I was surprised it works so well because my script also uses
> get_matching_terms, but it doesn't reproduce the bug above.
> In fact, I don't use a "join(get_matching_terms())" as smoketest do, but
> iterate with get_matching_terms_begin and get_matching_terms_end.
> Just by curiosity, I added the following lines in smoketest.php :
> $hit=$mset->get_hit(0);
> $it=$enq->get_matching_terms_begin($hit);
> while (! $it->equals($enq->get_matching_terms_end($hit)))
> {
> echo $it->get_term(), ' ';
> $it->next();
> }
> and with that code, we get the correct terms. So the bug only concerns
> the way get_matching_terms is wrapped (and only appears under
> windows)... strange.
> "

What were the terms involved in this test? Is it possible that there's an issue with character set conversion?

Alternatively, it may be a memory management problem: the handling for get_matching_terms() is special-cased for PHP at the end of xapian-bindings/php/util.i, which contains the special handling for term lists that allows a list containing the terms to be obtained. This works by copying each term into the list with "add_next_index_stringl". Perhaps this isn't copying the contents of the string correctly, or is failing to allocate space for the contents correctly.

I've checked through the sources for my version of PHP, and it looks like the allocation should happen correctly - but there are many layers of code here, any of which could be hiding a problem. In particular, PHP can use either the native allocation routines or its own memory allocation system. It would be interesting to try the native allocation routines instead of the PHP ones, to check whether there's a bug in PHP's allocator. (I think you can turn off the PHP allocator by setting the "USE_ZEND_ALLOC" environment variable to "0".)

-- 
Richard