Olly Betts
2007-Mar-06 01:49 UTC
[Xapian-devel] Merging stats from multiple databases for expand
In matcher/expandweight.cc we have: OmExpandBits operator+(const OmExpandBits &bits1, const OmExpandBits &bits2) { OmExpandBits sum(bits1); sum.multiplier += bits2.multiplier; sum.rtermfreq += bits2.rtermfreq; // FIXME - try to share this information rather than pick half of it if (bits2.dbsize > sum.dbsize) { DEBUGLINE(WTCALC, "OmExpandBits::operator+ using second operand: " << bits2.termfreq << "/" << bits2.dbsize << " instead of " << bits1.termfreq << "/" << bits1.dbsize); sum.termfreq = bits2.termfreq; sum.dbsize = bits2.dbsize; } else { DEBUGLINE(WTCALC, "OmExpandBits::operator+ using first operand: " << bits1.termfreq << "/" << bits1.dbsize << " instead of " << bits2.termfreq << "/" << bits2.dbsize); // sum already contains the parts of the first operand } return sum; } Why don't we "share this information" by just replacing the "if" by: sum.termfreq += bits2.termfreq; sum.dbsize += bits2.dbsize; Am I missing some subtlety here? I looked at the history of this code, and the essence of this is unchanged since it was first checked in back in 1999 (and looking at that commit, it doesn't look like this code came from another file): http://svn.xapian.org/trunk/xapian-core/matcher/expandweight.cc?view=markup&pathrev=532 Cheers, Olly
Richard Boulton
2007-Mar-06 09:19 UTC
[Xapian-devel] Merging stats from multiple databases for expand
Olly Betts wrote:> In matcher/expandweight.cc we have: > > OmExpandBits > operator+(const OmExpandBits &bits1, const OmExpandBits &bits2) > { > OmExpandBits sum(bits1); > sum.multiplier += bits2.multiplier; > sum.rtermfreq += bits2.rtermfreq; > > // FIXME - try to share this information rather than pick half of it > if (bits2.dbsize > sum.dbsize) { > DEBUGLINE(WTCALC, "OmExpandBits::operator+ using second operand: " << > bits2.termfreq << "/" << bits2.dbsize << " instead of " << > bits1.termfreq << "/" << bits1.dbsize); > sum.termfreq = bits2.termfreq; > sum.dbsize = bits2.dbsize; > } else { > DEBUGLINE(WTCALC, "OmExpandBits::operator+ using first operand: " << > bits1.termfreq << "/" << bits1.dbsize << " instead of " << > bits2.termfreq << "/" << bits2.dbsize); > // sum already contains the parts of the first operand > } > return sum; > } > > Why don't we "share this information" by just replacing the "if" by: > > sum.termfreq += bits2.termfreq; > sum.dbsize += bits2.dbsize; > > Am I missing some subtlety here?I don't think so - I think changing the code to do that would be fine.> I looked at the history of this code, and the essence of this is unchanged > since it was first checked in back in 1999 (and looking at that commit, it > doesn't look like this code came from another file):Going that far back, I think it was probably a case of getting it to work for the single database case, and worrying about the multiple database case later. I can't think now why I'd have thought that a simple approach like that wouldn't work. -- Richard