> > > > Actually, you're doing something slightly unusual there: making the > internal member public. Protected would be better, and private is I think > most usual; library clients aren't going to have access to the Internal > class declaration, so they can't call things on it. This means it's > actually difficult right now to subclass Feature. > > I don't think we should worry about this with LTR right now, but I suspect > a better approach is to pass the FeatureList (as a helper) to > Feature::get_values(), or set it when we add the Feature to FeatureList, > and use that to ask for things like the current query, current document, > normalised lengths &c. >I see. What I've put in Feature::Internal can actually be part of FeatureList class (maybe as FeatureList::Internal). For the time being, I'll at least make the internal member in Feature as protected, and we can take this up later when we are getting rid of Letor class and putting everything under Ranker.> > It's possible that we could also do the thing Weight does, and have each > Feature report which statistics it needs, so FeatureList can calculate them > each once in advance before calling get_values() on all its Feature > objects. (Weight objects I think get the statistics poked onto them, but >This is a good idea. (words after "but" are somehow missing, could you please resend?)> > J > > --Ayush -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.xapian.org/pipermail/xapian-devel/attachments/20160818/95532217/attachment.html>
Hello, Ayush, thanks for that link to your code. It helped me understand how I could implement things much quicker! I've currently added a few classes which don't really belong to the public API (currently) into private headers and used PIMPL with the Cluster class. The PR failure is because of the old tests which I had written for testing the old API. I'll have to write completely new tests because the API has changed dramatically after mid terms. I'll get that fixed soon by writing a few test cases for the newly implemented functionality. Currently, the main classes which have data to hide are Cluster, ClusterSet and Clusterer subclass (currently KMeans). Thus, if we can use PIMPL with these classes, it could hide quite a lot of the non-public data. As a plus point, I'm having problems with shifting PointType, Point and Centroid classes to a private header because of forward referencing problems, so even these problems can be solved if KMeans is to go with PIMPL. According to me, it doesn't even affect the interface that the Clusterer provides because its just an interface. We can still hook up newer algorithms, which may not want to use PIMPL later. (I maybe wrong here) Current update - I'll have the PR in good shape by tomorrow by fixing the travis build and adding in a little more PIMPL. Thanks. Regards, Richhiey -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.xapian.org/pipermail/xapian-devel/attachments/20160819/3c0a3889/attachment.html>
On 18 Aug 2016, at 21:07, Ayush Tomar <ayushtomar at gmail.com> wrote:>> It's possible that we could also do the thing Weight does, and have each Feature report which statistics it needs, so FeatureList can calculate them each once in advance before calling get_values() on all its Feature objects. (Weight objects I think get the statistics poked onto them, but > > This is a good idea. > (words after "but" are somehow missing, could you please resend?)Not sure I ever wrote them, and can't remember now what I wanted to say. Not sure there is a "but"; that approach should be considered too. J -- James Aylett, +44 7976 212023 Principal Consultant, Kittens with Jetpacks Ltd Registered company no 6587395 19 Cottrell Court, Southern Way, London SE10 0DW
On 18 Aug 2016, at 23:59, Richhiey Thomas <richhiey.thomas at gmail.com> wrote:> I've currently added a few classes which don't really belong to the public API (currently) into private headers and used PIMPL with the Cluster class.I'm having difficulty reading your changes, because you aren't keeping to one complete change per commit. So for instance you've added a new Cluster::Internal class in one commit, but started using it in a different commit in which you also make a number of other changes. This makes it much harder to see whether there are problems in what you've done. (Similarly, you add new similarity classes in one commit, and in the jumbo commit remove them from where they came from and start using the new ones, possibly.) You also still haven't integrated Clusterer fully; RoundRobin doesn't inherit from it. There may be other similar issues left unaddressed, but I don't have time right now to work through the jumbo commit and match it against the work in the other commits.> The PR failure is because of the old tests which I had written for testing the old API. I'll have to write completely new tests because the API has changed dramatically after mid terms. I'll get that fixed soon by writing a few test cases for the newly implemented functionality.It's also because cluster.cc won't compile under clang because of an equality / assignment error, and under gcc because there are warnings about members shadowed by local variables. You should have got one or the other while compiling yourself; what version or which compiler are you using?> Currently, the main classes which have data to hide are Cluster, ClusterSet and Clusterer subclass (currently KMeans). Thus, if we can use PIMPL with these classes, it could hide quite a lot of the non-public data. As a plus point, I'm having problems with shifting PointType, Point and Centroid classes to a private header because of forward referencing problems, so even these problems can be solved if KMeans is to go with PIMPL.As I think I've said before, you should concentrate on getting the PR building and in good shape before going further with PIMPL or similar. An API that has extra bits but works and is tested is much better than a tidier API that isn't feature complete. You have actually tidied up the KMeans API considerably just by moving some internal methods out of public visibility. Although in future we may be able to hide them completely in a private implementation, it's neater now and more obvious how to use the class.> According to me, it doesn't even affect the interface that the Clusterer provides because its just an interface. We can still hook up newer algorithms, which may not want to use PIMPL later. (I maybe wrong here)Because Clusterer doesn't use PIMPL, users are free to subclass it freely. (And use PIMPL or not.) Olly's concern on IRC was that using PIMPL for KMeans would prevent subclassing that. At the moment that's probably moot, because it's not currently structured to make subclassing it practical anyway. Even once you have a green PR that's ready to merge and are looking for further things to work on, I wouldn't worry too much about removing PointType, Point & Centroid from the public API. I have a feeling that when we look at further ways of initialising the KMeans centroids we'll want to make that a public API rather than a selector (mode="random", mode="kmeanspp") as it is now. That will involve exposing some part of that set of classes to make that API. (There are a number of ways of initialising them in the literature already, and it would be nice to allow users to experiment with their own.) J -- James Aylett devfort.com ? spacelog.org ? tartarus.org/james/