Jeroen Vaes
2010-Mar-24 15:48 UTC
[Xapian-discuss] Omega: behavior msize when collapsing results
Hello list,

I have a problem with the value of the result size ($msize in omegascript) when collapsing results. The index contains 151452 documents. I'm using Omega 1.0.18 on FreeBSD (I tried both the version in ports and the latest one from xapian.org).

This is my indexscript:

    uniqueid: boolean=Q unique=Q field=uniqueid
    objectid: field=objectid boolean=XID value=0
    objecttype: field=type boolean=XTYPE
    language: field=language boolean=L
    title: field=title index
    content: index
    catalog: field=catalog boolean=XCATALOG
    number: field=number
    searchnumber: field=searchnumber boolean=XNUMBER indexnopos
    productgroup: field=productgroup boolean=XPRODUCTGROUP
    property: field=property boolean=XPROPERTY
    colour: field=colour boolean=XCOLOUR
    size: field=size boolean=XSIZE
    colourandsize: field=size boolean=XCOLOURANDSIZE
    norm: field=norm boolean=XNORM
    picture: field=picture
    sort: valuenumeric=1 field=sort
    icon: field=icon boolean=XICON
    preview: unhtml truncate=200 field=preview

My Omega command looks like this:

    FMT=xml DEFAULTOP=OR HITSPERPAGE=9 MINHITS=900 SORT=1 SORTREVERSE=1 COLLAPSE=0 B=LNL B=XTYPEproduct P='(catalog:2 OR catalog:425) AND productgroup:6'

So in plain English, I am requesting all products from catalogs 2 and 425 in productgroup 6. I am collapsing the result on field 'objectid' and sorting on field 'sort'.

The expected number of results is 418; without collapsing this is 441. In order to get an exact number of matches, I set the MINHITS parameter to 900.

However, when I run this query, $msize is 439. The value of $msizeexact is "true", so it appears to be not estimated. However, when I request the last result page, the value of $msize is reduced to 418.

Now, when I set HITSPERPAGE to 1, the value of $msize is 441 (so the number of documents before collapsing). Again, when requesting the last result page, the value of $msize is 418. And again, in both cases the value of $msizeexact is "true".

When I set HITSPERPAGE to 1000, the value of $msize is 418.
So, it would seem that $msize does not take into account the collapsing of documents. However, I did some digging in the Omega code, and it seems $msize is the value of Xapian::MSet::get_matches_estimated(), and according to the API documentation, "This figure takes into account collapsing of duplicates, and weighting cutoff values.".

I also have a smaller index (83937 documents) which uses the same script and the same kind of data, and there $msize is always correct.

So, what causes this behavior? Is this correct (in that case it would seem that the API documentation is wrong), or did I encounter some weird bug? And does anyone have a solution?

Kind regards,
Jeroen Vaes
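To make the expected counts concrete, here is a toy model of collapsing in plain Python. It is only a sketch and not Xapian's matcher: the `collapse` helper and the generated data are invented for illustration, with the sizes chosen to mirror the 441 and 418 figures reported above.

```python
def collapse(matches):
    """Keep only the first (best-ranked) match for each collapse key.

    `matches` is a list of (docid, collapse_key) pairs, already in rank order.
    This is a simplified model, not how Xapian implements collapsing.
    """
    seen = set()
    kept = []
    for docid, key in matches:
        if key not in seen:
            seen.add(key)
            kept.append(docid)
    return kept


# Made-up data: 418 documents with distinct keys, plus 23 documents
# that reuse a key already present (so they should be collapsed away).
matches = [(docid, docid) for docid in range(418)]
matches += [(418 + i, i * 7) for i in range(23)]

collapsed = collapse(matches)

print(len(matches))    # matches before collapsing: 441
print(len(collapsed))  # matches after collapsing: 418

# The collapsed total is a property of the whole result set, so an exact
# count should be the same whichever page size is requested.
for hits_per_page in (1, 9, 1000):
    first_page = collapsed[:hits_per_page]
    print(hits_per_page, len(first_page), len(collapsed))
```

In this model the total after collapsing is 418 for every page size, which is the behaviour the question expects from an exact $msize.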
Olly Betts
2010-Mar-25 15:07 UTC
[Xapian-discuss] Omega: behavior msize when collapsing results
On Wed, Mar 24, 2010 at 04:48:41PM +0100, Jeroen Vaes wrote:
> So, what causes this behavior? Is this correct (in that case it would
> seem that the API documentation is wrong), or did I encounter some weird
> bug? And does anyone have a solution?

It sounds like a bug. We've had a few in this area before, and getting these statistics right under all the possible combinations of options while doing various optimisations is somewhat tricky.

I believe we've backported all the relevant fixes to 1.0.x, but it would be useful to check with 1.1.4 to see if that exhibits the same issue in case we missed one.

If it fails with 1.1.4, then we really need a testcase. Unfortunately it's hard to track down such bugs without one. Is the dataset something you can make available, and small enough to sensibly do so?

Cheers,
Olly
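One possible shape for such a testcase is a small check that runs the same query at several page sizes and flags any disagreement between sizes reported as exact. The sketch below is an assumption about how that could be organised, not an existing Xapian or Omega tool: `run_query` is a hypothetical callable you would have to implement against your own setup, and `reported_in_thread` simply replays the numbers from the original message.

```python
def find_msize_inconsistencies(run_query, page_sizes):
    """Return {page_size: msize} if exact sizes disagree, else an empty dict.

    `run_query(hits_per_page)` is a hypothetical callable returning a
    (msize, msize_exact) tuple for the query under test.
    """
    reported = {}
    for hits_per_page in page_sizes:
        msize, exact = run_query(hits_per_page)
        if exact:
            # Only sizes claimed to be exact are required to agree.
            reported[hits_per_page] = msize
    if len(set(reported.values())) > 1:
        return reported
    return {}


def reported_in_thread(hits_per_page):
    """Stand-in for a real query: first-page values quoted in this thread."""
    values = {1: (441, True), 9: (439, True), 1000: (418, True)}
    return values[hits_per_page]


print(find_msize_inconsistencies(reported_in_thread, [1, 9, 1000]))
```

Running the same check against both 1.0.18 and 1.1.4, with `run_query` wired to the real index, would show whether the newer version still reports differing exact sizes.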