Using the Python bindings, I'm seeing some interesting results when faceting on an Enquire with a collapse key set. The issue is that the ValueCountMatchSpy results seem to tally the uncollapsed set, not the collapsed set.

Any way around this? I can provide example code if necessary, but I can reliably produce the correct number of facets against the correct result set by toggling set_collapse_key, and I have verified that the inspection size is the doc count, to ensure no estimation is occurring.

Thanks.

--
regards,
matt
Someone else may be able to confirm this, but I think the problem is that MatchSpy instances get called very low down in the matcher stack, while collapsing happens at a much higher level. Given that collapsing is relatively simple, it should be pretty easy to provide an alternative to ValueCountMatchSpy which takes collapsing into account as well.

If you wanted to make the alterations to the existing implementation, it'd be great if you could contribute them back under suitable licenses (see HACKING) so others can benefit. It would probably make sense for us to have only one implementation, with collapsing support being an option, but others should weigh in on this, as there may be a better way to present this.

If C++ isn't your thing, you may have to wait for someone to have time, unless someone can come up with another way of tackling this.

James

On 7 Oct 2012, at 06:11, Matthew Story <matthewstory at gmail.com> wrote:
> Using the python bindings, I'm having some interesting results with
> faceting on an enquire with collapse key set. The issue is that the
> ValueCountMatchSpy results seem to tally the uncollapsed set, not the
> collapsed set.
>
> Any way around this? Can provide example code if necessary, but I can
> reliably produce the correct number of facets against the correct results
> set by toggling set_collapse_key, and I have verified that the inspection
> size is the doc count, to ensure no estimation is occurring.
>
> Thanks.
>
> --
> regards,
> matt

--
  James Aylett, occasional trouble-maker
  xapian.org
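One "other way of tackling this" that needs no C++ is to skip the spy and tally facet values from the collapsed results themselves. The sketch below is only an illustration under assumptions: the helper name `facet_counts_from_matches` and the `facet_slot` parameter are hypothetical, and it relies only on each match exposing `.document` with a `get_value(slot)` method, as the items of an MSet do in the Python bindings. Empty values are skipped, which is an assumption about the behaviour you want.

```python
from collections import Counter


def facet_counts_from_matches(matches, facet_slot):
    """Tally facet values over matches that have already been collapsed.

    `matches` is any iterable whose items expose `.document.get_value(slot)`
    (for example, the items of an MSet). Hypothetical helper, not part of
    Xapian.
    """
    counts = Counter()
    for match in matches:
        value = match.document.get_value(facet_slot)
        if value:  # ignore documents with no value in the facet slot
            counts[value] += 1
    return counts
```

To cover the whole collapsed set, the MSet would have to be requested with a size large enough to hold every match (for instance the database's doc count), so this trades the spy's low overhead for fetching each surviving document.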
On Sun, Oct 7, 2012 at 2:23 PM, Matthew Story <matthewstory at gmail.com> wrote:
> On Sun, Oct 7, 2012 at 12:48 PM, James Aylett <james-xapian at tartarus.org> wrote:
>>
>> Someone else may be able to confirm this, but I think the problem is that MatchSpy instances get called very low down in the matcher stack, while collapsing happens at a much higher level. Given that collapsing is relatively simple, it should be pretty easy to provide an alternative to ValueCountMatchSpy which takes collapsing into account as well.
>>
>> If you wanted to make the alterations to the existing implementation, it'd be great if you could contribute them back under suitable licenses (see HACKING) so others can benefit. It would probably make sense for us to have only one implementation, with collapsing support being an option, but others should weigh in on this, as there may be a better way to present this.
>>
>> If C++ isn't your thing, you may have to wait for someone to have time, unless someone can come up with another way of tackling this.
>
> Don't mind submitting a patch if the behavior is indeed undesirable.
> Question is, what is the right approach to resolving this? Should the
> ValueCountMatchSpy be provided with the ability to ignore or respect
> collapse, and then internally track the collapse state based
> on a collapse key provided to operator()? Or should the
> ValueCountMatchSpy termfreq be decremented by MatchDecider at the
> collapse phase?
>
> Relatively new user, so not sure as to the best path (in line with the
> design of the project, and with regards to efficiency) towards
> resolution.
>
> Suggestions?
> [...snip]

Replying to list, with apologies to James.

--
regards,
matt
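To make the first option concrete, here is a rough pure-Python sketch of a counter that tracks collapse state itself. Everything in it is hypothetical: the class name, the `collapse_slot` and `collapse_max` parameters, and the choice to leave documents with an empty collapse value uncollapsed are assumptions for illustration, not Xapian's implementation. Its `__call__(doc, wt)` mirrors the shape of a MatchSpy's operator(), but whether it can be plugged into the matcher directly is not something this sketch establishes.

```python
from collections import Counter


class CollapseAwareValueCounter:
    """Count facet values, honouring a collapse key (illustrative only)."""

    def __init__(self, facet_slot, collapse_slot, collapse_max=1):
        self.facet_slot = facet_slot
        self.collapse_slot = collapse_slot
        self.collapse_max = collapse_max
        self.counts = Counter()   # facet value -> number of documents counted
        self._seen = Counter()    # collapse key -> documents already counted

    def __call__(self, doc, wt):
        key = doc.get_value(self.collapse_slot)
        if key:
            # Skip documents beyond the first `collapse_max` for this key.
            if self._seen[key] >= self.collapse_max:
                return
            self._seen[key] += 1
        # Assumption: documents with an empty collapse value are never collapsed.
        value = doc.get_value(self.facet_slot)
        if value:
            self.counts[value] += 1
```

One design caveat: this counts the first document seen for each collapse key, which is not necessarily the highest-ranked one that collapsing keeps, so the tallies only line up with the collapsed results when the facet value is the same for every document sharing a collapse key.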