This seems to be a recurrent topic, but I don't remember hearing a definitive answer. I also apologies for cross-posting. Say I have a numerical response variable and a bunch of multi-level factors I want to use for modeling. I don't expect factor interaction to be important so there will be no interactions in the model. All this would be a perfect job for ANOVA except for one additional requirement: all factors are ordered, i.e. for any two levels in a factor one is bigger than the other, and I want the regression to preserve the ordering, that is to be monotonic. I can think of two definitions of monotonicity, weak and strong, and in case it matters I am more interested in the strong one, which I define as follows: If L1 < L2 within some factor F than prediction(L1, other factors) < prediction(L2, other factors) for ANY SELECTION of 'other factors' levels. (A weak monotonicity could be defined as prediction(L1) < prediction(L2) on average). Any reference to S, R, C code / papers which address this topic will be highly appreciated. Similar question could be asked about additive models, i.e. estimate an additive model with each (or some) functions being monotonic. Thanks, Vadim -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
On Mon, 12 Mar 2001, Vadim Ogranovich wrote:> This seems to be a recurrent topic, but I don't remember hearing a > definitive answer. I also apologies for cross-posting. > > Say I have a numerical response variable and a bunch of multi-level factors > I want to use for modeling. I don't expect factor interaction to be > important so there will be no interactions in the model. > All this would be a perfect job for ANOVA except for one additional > requirement: all factors are ordered, i.e. for any two levels in a factor > one is bigger than the other, and I want the regression to preserve the > ordering, that is to be monotonic. > > I can think of two definitions of monotonicity, weak and strong, and in case > it matters I am more interested in the strong one, which I define as > follows: > If L1 < L2 within some factor F than > prediction(L1, other factors) < prediction(L2, other factors) > for ANY SELECTION of 'other factors' levels. > (A weak monotonicity could be defined as prediction(L1) < prediction(L2) on > average).in the absence of interactions these are equivalent, surely? One-dimensional monotonic regression is easy using the Pool-Adjacent-Violators algorithm. I think an additive monotonic model could be estimated by backfitting as for generalised linear models: update the function for each predictor in turn by using the one-dimensional algorithm on the partial residuals from the previous iteration. Backfitting is described in "Generalized Additive Models" by Hastie & Tibshirani (Chapman & Hall) and probably other places. The more difficult case is a non-additive model (ie interactions). For discrete factors there were two papers in the Journal of Computational & Graphical Statistics by Qian and colleagues. One of them was @article{qian:eddy:1996, Author = {Qian, Shixian and Eddy, William F.}, Title = {An Algorithm for Isotonic Regression on Ordered Rectangular Grids}, Year = 1996, Journal = {Journal of Computational and Graphical Statistics}, Volume = 5, Pages = {225--235}, Keywords = {[Block class algorithm]; [Monotone regression]; [Recursive partitioning]; [Sandwich algorithm]} }> Any reference to S, R, C code / papers which address this topic will be > highly appreciated.The following does one-dimensional isotonic regression. I believe similar code has been posted before. It is recursive but seems fast enough for many uses. It returns a compressed form of the result where the first component has the distinct fitted values and the second indicates the number of repeats. -thomas Thomas Lumley Asst. Professor, Biostatistics tlumley at u.washington.edu University of Washington, Seattle pava.blocks<-function(x,w=rep(1,length(x)),b=rep(1,length(x)),up=T){ lasti<-1 if (length(x)==1) return(list(x=x,blocks=b,increasing=up)) for (i in 2:length(x)){ if (x[i]<=x[lasti] & up){ wtotal<-w[lasti]+w[i] x[lasti]<-(x[lasti]*w[lasti]+x[i]*w[i])/wtotal w[lasti]<-wtotal b[lasti]<-b[i]+b[lasti] b[i]<-0 }else if (x[i]<=x[lasti] & !up){ lasti<-i }else if (x[i]>x[lasti] & !up){ wtotal<-w[lasti]+w[i] x[lasti]<-(x[lasti]*w[lasti]+x[i]*w[i])/wtotal w[lasti]<-wtotal b[lasti]<-b[i]+b[lasti] b[i]<-0 }else if(x[i]>x[lasti] & up){ lasti<-i } } if (any(b==0)) { subset<-(b>0) return(pava.blocks(x[subset],w[subset],b[subset],up)) } else return(list(x=x,blocks=b,increasing=up)) } -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
On Mon, 12 Mar 2001, Vadim Ogranovich wrote:> This seems to be a recurrent topic, but I don't remember hearing a > definitive answer. I also apologies for cross-posting. > > Say I have a numerical response variable and a bunch of multi-level factors > I want to use for modeling. I don't expect factor interaction to be > important so there will be no interactions in the model. > All this would be a perfect job for ANOVA except for one additional > requirement: all factors are ordered, i.e. for any two levels in a factor > one is bigger than the other, and I want the regression to preserve the > ordering, that is to be monotonic. >The answers that Thomas gave are probably more theoretically sound than the below, but from a simple computational standpoint here are a couple of ideas: ACE and AVAS both allow monotone constraints, just constrain y to be linear and replace your odered factors with integers (codes) and constrain them to be monotone in the call to ace or avas, you'll need to take the output and do a final regression on it and change the degrees of freedom, but it should be a decent exploratory tool. A second option would be to specify a contrasts matrix that is lower triangular, i.e. for 4 levels: [,1] [,2] [,3] [1,] 0 0 0 [2,] 1 0 0 [3,] 1 1 0 [4,] 1 1 1 then each beta is the measure of the difference in 2 levels. The S-PLUS function nnls.fit will fit the linear regression constraining all parameter estimates to be non-negative which will give you monotone increasing in your ordered factors (though possibly with adjacent levels equal). A contrast matrix like: [,1] [,2] [,3] [1,] 0 0 0 [2,] 1 0 0 [3,] 2 1 0 [4,] 3 2 1 would also work (using nnls.fit again), but this time it says that the increase from one level to the next, must be greater than or equal to the previous increase (positive second derivative). hope this helps, -- Greg Snow, PhD Office: 223A TMCB Department of Statistics Phone: (801) 378-7049 Brigham Young University Dept.: (801) 378-4505 Provo, UT 84602 email: gls at byu.edu -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._