maneesh deshpande
2005-Dec-28 02:54 UTC
[R] Regression with partial info about the dependent variable
Hi,

I have the following problem, which I would appreciate some help with.

A variable y is to be modelled as a function of a set of variables Vec(x). The twist is that there is another variable z in the problem with the property that y(i) <= z(i). So the data set is divided into three categories:

  I.   y(i) = z(i)
  II.  Both y(i) and z(i) are known, and y(i) < z(i)
  III. y(i) is not known but z(i) is known (but y(i) is guaranteed to be < z(i))

The data in categories I + II can be satisfactorily modelled via an OLS regression of the form:

  y ~ Vec(x)

The question is how to incorporate the information contained in the category III data.

The category II data can be used to construct a model for y given z. Indeed, log(z(i) - y(i)) is reasonably normal, so the following is a decent approximation:

  y(i) = z(i) - A*exp( N(0,1) )

with A > 0, so that y(i) < z(i) as required. This model can be improved by including Vec(x). After this I am not sure how to proceed :-( :-(

Thanks in advance,
Maneesh
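
One possible way to frame the category III question, offered as a sketch rather than a settled answer: treat those rows as censored observations, where all that is known is y(i) < z(i). Assuming normal errors (an assumption, not something established above), each fully observed row contributes a normal density to the likelihood and each category III row contributes the probability P(y(i) < z(i)). The function names and the toy data below are made up for illustration, and the code uses only the Python standard library.

```python
import math

def norm_logpdf(resid, sigma):
    # Log density of N(0, sigma^2) evaluated at a residual.
    return -0.5 * math.log(2.0 * math.pi * sigma * sigma) - 0.5 * (resid / sigma) ** 2

def norm_logcdf(t):
    # Log of the standard normal CDF, via erfc.
    # Note: this underflows for very negative t; fine for a sketch.
    return math.log(0.5 * math.erfc(-t / math.sqrt(2.0)))

def censored_loglik(beta, sigma, X, y, z):
    """Log-likelihood of y ~ N(X beta, sigma^2) with censored rows.

    y[i] is None for category III rows, where only y[i] < z[i] is known.
    Rows in categories I and II (y observed) enter as in ordinary regression.
    """
    total = 0.0
    for xi, yi, zi in zip(X, y, z):
        mu = sum(b * v for b, v in zip(beta, xi))
        if yi is not None:
            total += norm_logpdf(yi - mu, sigma)      # y observed
        else:
            total += norm_logcdf((zi - mu) / sigma)   # only y < z known
    return total

# Hypothetical toy data: first column of X is an intercept.
X = [[1.0, 0.5], [1.0, 1.0], [1.0, 1.5], [1.0, 2.0]]
y = [1.0, 2.1, None, 3.9]   # third row is category III
z = [1.0, 2.5, 3.5, 4.2]

print(censored_loglik([0.0, 2.0], 1.0, X, y, z))
```

Maximising this function over beta and sigma would give estimates that use all three categories. In R, a censored Gaussian model of this kind can, as far as I know, be fitted with survreg() from the survival package, using Surv(..., type = "left") and dist = "gaussian". Note that this sketch uses z only as a bound; it does not use the log(z - y) model built from the category II data.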