Hello James, Parth,

Following our discussion on IRC and on code review, the way the FeatureVector class works needs some discussion.

Presently, the FeatureVector class is defined as follows, with a fixed feature count (19):

    class FeatureVector::Internal : public Xapian::Internal::intrusive_base {
        friend class FeatureVector;
        double label;
        double score;
        std::map<int,double> fvals;
        int fcount;
        Xapian::docid did;

The two approaches that were discussed were:
1. Using enums as IDs for features in fvals.
2. Making fvals into a configurable vector of feature values.

The issues were that the first way would still assume an order in which the features occur, and the second way would require the feature generation code to be changed into lots of little classes, which might be an overhead right now but would be good functionality to have in the future.

What would be the best approach here?

--
Kind Regards,
Ayush Tomar | My Webpage <http://ayshtmr.xyz> | LinkedIn <https://in.linkedin.com/in/ayushtomar>

Hi Ayush,

Thanks for bringing up the issue for discussion. It is still possible to use enums as feature IDs without relying on their order; we simply define the IDs in whatever way we need. Usually a good approach is to group similar features, e.g. term-document score features such as the BM25 score, LM score, etc. go in a separate group with a specific ID range. New features can then either extend the present range or be accommodated within it.

The rankers will rank a particular instance with the features that are present (not necessarily all of them, and not necessarily in order). In fact, a user can specify which features they want to work with, and the feature manager will ensure that they are calculated and update 'fvals'.

I am still missing some bits on the second approach; can you please give a little more information on it?

Cheers,
Parth

On Mon, Jun 27, 2016 at 5:46 PM, Ayush Tomar <ayushtomar at gmail.com> wrote:
> The two approaches that were discussed were:
> 1. Using enums as IDs for features in fvals.
> 2. Making fvals into a configurable vector of feature values.
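
The grouping of feature IDs into ranges and the by-ID lookup described in the reply above could look roughly like the following. This is only a sketch: the enum names, the numeric ranges, the placeholder values and the helper functions compute_selected() and feature_or_zero() are all assumptions made for illustration, not part of the existing letor code.

```cpp
#include <map>
#include <set>

// Hypothetical feature IDs, grouped into ranges by kind. The names and
// numeric ranges are illustrative only, not Xapian's actual identifiers.
enum FeatureId {
    // 0-99: term-document score features.
    FEATURE_BM25_SCORE = 0,
    FEATURE_LM_SCORE = 1,
    // 100-199: document-level features.
    FEATURE_DOC_LENGTH = 100
};

// Stand-in for the feature manager: computes only the requested features
// and stores them by ID, so nothing depends on position or a fixed count.
inline std::map<int, double>
compute_selected(const std::set<FeatureId>& wanted)
{
    std::map<int, double> fvals;
    for (FeatureId id : wanted) {
        double value = 0.0;
        switch (id) {
            // Placeholder values; real code would derive these from the
            // query and the document.
            case FEATURE_BM25_SCORE: value = 1.5; break;
            case FEATURE_LM_SCORE: value = 0.25; break;
            case FEATURE_DOC_LENGTH: value = 120.0; break;
        }
        fvals[id] = value;
    }
    return fvals;
}

// A ranker looks features up by ID and treats a missing feature as
// contributing zero, instead of assuming every feature is present.
inline double
feature_or_zero(const std::map<int, double>& fvals, FeatureId id)
{
    auto it = fvals.find(id);
    return it == fvals.end() ? 0.0 : it->second;
}
```

With this layout, adding a feature means adding an enumerator inside the appropriate range; existing IDs keep their values, so code that looks features up by ID is unaffected.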

Hi Parth,

James might have something to say on the second approach. It wasn't discussed in detail, and I don't completely understand how things would work here without some sort of serialisation.

On Mon, Jun 27, 2016 at 6:08 PM, Parth Gupta <pargup8 at gmail.com> wrote:
> I am still missing some bits on the second approach, can you please give a
> little more information on it?

--
Kind Regards,
Ayush Tomar
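
For reference, the second approach from the original message (fvals as a configurable vector, with the feature generation code split into small classes) might be sketched as below. The thread does not spell this design out, so everything here is an assumption for illustration: the Feature interface, the two example feature classes, the DocStats struct and build_fvals() are hypothetical names.

```cpp
#include <memory>
#include <vector>

// Hypothetical per-document inputs; real code would obtain these from
// Xapian rather than from a plain struct.
struct DocStats {
    double doc_length;
    double query_term_hits;
};

// One small class per feature (all names here are illustrative).
class Feature {
  public:
    virtual ~Feature() {}
    virtual double compute(const DocStats& stats) const = 0;
};

class DocLengthFeature : public Feature {
  public:
    double compute(const DocStats& stats) const override {
        return stats.doc_length;
    }
};

class TermHitsFeature : public Feature {
  public:
    double compute(const DocStats& stats) const override {
        return stats.query_term_hits;
    }
};

// The configured feature list decides both the size and the layout of
// fvals, so no feature count is hard-coded.
inline std::vector<double>
build_fvals(const std::vector<std::unique_ptr<Feature>>& features,
            const DocStats& stats)
{
    std::vector<double> fvals;
    fvals.reserve(features.size());
    for (const auto& feature : features)
        fvals.push_back(feature->compute(stats));
    return fvals;
}
```

Because the meaning of each position in fvals now depends on the configured list, a trained model would presumably have to record which features it was trained with and in what order. That may be where the serialisation concern mentioned above comes from.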