On Thu, Feb 27, 2014 at 10:25:32PM +0530, Ajay Chatterjee wrote:
> I have started going through the resources given alongside the project
> description in the Ideas page. I wanted to ask about the next steps to
> proceed, I mean whether there are any issues with the existing Letor API,
> as it would have helped me get familiar to the codebase faster.

Perhaps the biggest issue is that there's currently no automated
testsuite, which means it's hard to be sure it all works as intended,
and if changes are made then a lot of manual testing is needed before
we can be confident that new bugs haven't been introduced. The rest of
Xapian has testsuites (xapian-core in particular has an extensive test
suite, which has helped to keep the bug count low).

There are a few things about the API that could definitely do with
attention, such as methods like letor_score() returning a std::map<>,
which will probably end up with the return value getting copied.
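
To illustrate, one way to avoid that copy would be to have the caller
pass in the map to be filled. This is only a rough sketch: ScoreMap,
score_by_value() and score_into() are made-up names rather than
anything in the current Letor code, docid is a stand-in for
Xapian::docid, and the scores are dummy values. A compiler may well
elide the copy in the by-value version anyway, so it would be worth
measuring before changing the API.

```cpp
#include <map>

// Stand-in for Xapian::docid so the sketch compiles without Xapian.
typedef unsigned docid;

// Hypothetical result type: document ID -> score.
typedef std::map<docid, double> ScoreMap;

// Returns the map by value. Unless the compiler elides it, the whole
// map is copied when the function returns.
ScoreMap score_by_value() {
    ScoreMap scores;
    scores[1] = 0.75;  // dummy scores, for illustration only
    scores[2] = 0.25;
    return scores;
}

// Fills a map supplied by the caller, so the result is never copied.
// Any existing contents of the map are discarded first.
void score_into(ScoreMap& scores) {
    scores.clear();
    scores[1] = 0.75;
    scores[2] = 0.25;
}
```
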
I'm not really clear why so many methods are just commented out
currently (at least in Rishabh's tree). That really needs sorting
out.

Parth - do you know what the plan was there?

Assuming they get reintroduced, the slew of calculate_f1(), etc. methods
don't seem helpfully named to me, and the need to pass different
parameters to each seems unhelpful too - if I was trying to write code
using them, I can imagine I'd spend a lot of time consulting the
documentation to remind myself what feature number N is and what
parameters I need to pass it. Perhaps we want a "stats" object to
carry round all those stats, and a method more like:

    f = letor.calculate_feature(Xapian::Letor::FEATURE_SOME_NAME,
                                letor_stats,
                                Xapian::Letor::FIELD_TITLE);
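
Fleshing that out a little, it might look something like the sketch
below. Every name in it (Feature, Field, LetorStats, the two example
features and what they compute) is hypothetical and chosen just to show
the shape of the idea, not taken from the existing code. I've also
written it as a free function rather than a method to keep it short.

```cpp
#include <stdexcept>

// Hypothetical named features, replacing "feature number N".
enum Feature { FEATURE_TERM_FREQUENCY, FEATURE_NORMALISED_TF };

// Hypothetical document fields; FIELD_COUNT is just the array size.
enum Field { FIELD_TITLE, FIELD_BODY, FIELD_COUNT };

// Hypothetical "stats" object carrying everything the features need,
// so every feature can be calculated from the same set of parameters.
struct LetorStats {
    unsigned term_freq[FIELD_COUNT];     // query term occurrences per field
    unsigned field_length[FIELD_COUNT];  // length of each field in terms
};

// Single entry point: the caller names the feature and the field
// instead of remembering a per-feature method and its parameters.
double calculate_feature(Feature feature, const LetorStats& stats,
                         Field field) {
    if (field >= FIELD_COUNT)
        throw std::invalid_argument("unknown field");
    switch (feature) {
    case FEATURE_TERM_FREQUENCY:
        return stats.term_freq[field];
    case FEATURE_NORMALISED_TF:
        // Avoid dividing by zero for an empty field.
        if (stats.field_length[field] == 0) return 0.0;
        return double(stats.term_freq[field]) / stats.field_length[field];
    }
    throw std::invalid_argument("unknown feature");
}
```

A caller would then fill in one LetorStats per document and ask for
whichever features it wants by name.
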
The letor API should also really use the standard Xapian types
(currently "long int" is used in various places which should really be
using Xapian::termcount or another appropriate Xapian type). Once
you've got the code working, this might be a good place to start to
get yourself more familiar with it. Changing the types in the API
is likely to require equivalent internal changes, so you'll get to
see what's going on inside too.

Cheers,
Olly