Patrick Baker
2006-Mar-15 03:28 UTC
[R] comparing AIC values of models with transformed, untransformed, and weighted variables
Hi there,

I have a question regarding model comparisons that seems simple enough but to which I cannot find an answer. I am interested in developing a predictive model relating some measure of a tree's stem to the total leaf area (TLA) of the tree. Predictor variables might include, for example, the total cross-sectional area of the tree (commonly referred to as basal area, BA) or the amount of sapwood area (SA), which represents the amount of wood involved in active transport of water up the tree to the leaves.

A variety of people have developed these models for a variety of tree species in a variety of places around the world. Perhaps not surprisingly, different studies have used different model forms in analyzing their data. I am interested in comparing the range of models that have been previously used (some of which are theoretically derived, others of which are empirically driven) using a data set that I have collected (for yet another species in yet another place).

To compare the different model forms I had intended to use the AIC. However, I have found, again perhaps not surprisingly, that when I use log-transformed data, the AIC is substantially lower for a given predictor variable. If I use a weighted glm the same issue arises. For example, using BA vs TLA the (rounded) AIC values are 275 for a linear model, 30 for a log-log model, and 8 for a glm weighted by 1/BA. I don't believe that these vast differences reflect a major improvement in the model, but rather the scaling of the variables by transformation or weighting.

What I'd like to get some advice or insight on is whether there is an appropriate way to rescale the AIC values to permit comparisons across these models. Any suggestions would be very welcome.

Cheers,
Patrick Baker
Ben Bolker
2006-Mar-16 02:04 UTC
[R] comparing AIC values of models with transformed, untransformed, and weighted variables
Patrick Baker <patrick.baker <at> sci.monash.edu.au> writes:

> What I'd like to get some advice or insight on is whether
> there is an appropriate way to rescale the AIC values to permit
> comparisons across these models. Any suggestions would be very welcome.
>
> Cheers, Patrick Baker

Not a complete solution, but you could take a look at the likelihoods associated with Box-Cox transformations (e.g. Venables and Ripley, MASS, pp. 170-172).

Ben Bolker
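As a rough illustration of the Box-Cox idea, the sketch below computes a profile log-likelihood on the original scale of the response for a simple straight-line fit, so that lambda = 1 (untransformed) and lambda = 0 (log) can be compared directly. It is written in plain Python rather than R to stay self-contained. The function names (`ols_rss`, `boxcox`, `boxcox_profile_loglik`) and the toy `x`, `y` values are invented for illustration and are not data from this thread. The formula used is the usual one, -(n/2) log(RSS(lambda)/n) + (lambda - 1) sum(log y), up to a constant shared by all lambda.

```python
import math


def ols_rss(x, y):
    """Residual sum of squares from a least-squares line y ~ a + b*x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))


def boxcox(y, lam):
    """Box-Cox transform of a positive response; lam == 0 means log."""
    if lam == 0:
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]


def boxcox_profile_loglik(x, y, lam):
    """Profile log-likelihood on the ORIGINAL scale of y.

    The (lam - 1) * sum(log y) term is the Jacobian of the transformation;
    it is what makes values for different lam comparable. A constant that
    does not depend on lam is dropped.
    """
    n = len(y)
    rss = ols_rss(x, boxcox(y, lam))
    jacobian = (lam - 1.0) * sum(math.log(v) for v in y)
    return -0.5 * n * math.log(rss / n) + jacobian


# Hypothetical toy data (NOT from the thread): roughly exponential growth
# with a small deterministic multiplicative perturbation.
x = [float(i) for i in range(1, 21)]
y = [math.exp(0.5 + 0.1 * xi) * (1.0 + 0.05 * math.sin(xi)) for xi in x]

loglik_linear = boxcox_profile_loglik(x, y, 1.0)  # model for y
loglik_log = boxcox_profile_loglik(x, y, 0.0)     # model for log(y)

# Both fits have the same number of parameters, so the AIC difference
# is just -2 times the difference in these log-likelihoods.
aic_difference = -2.0 * (loglik_log - loglik_linear)
print(loglik_linear, loglik_log, aic_difference)
```

Transforming the predictor (as in a log-log model) needs no extra term: pass the already transformed predictor as `x`. Only a transformation of the response changes the scale on which the likelihood is measured.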
Prof Brian Ripley
2006-Mar-27 15:43 UTC
[R] comparing AIC values of models with transformed, untransformed, and weighted variables
Two comments:

1) The log-likelihood and hence AIC for a model for log X are not comparable with those of a model for X. You need to make an additive adjustment when you transform: it is quite easy to work out what from the definitions.

2) The AIC given by glm() for weighted models was wrong in R < 2.3.0 alpha. I am not sure why you are using a glm for what appears to be a least-squares fit: use lm() instead (or try 2.3.0 alpha).

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
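One way to work out the additive adjustment in comment 1), given as a sketch rather than a definitive treatment: write Y for the response on its original scale (here TLA) and Z = log Y for the log-scale response, with f_Z the density assumed by the log-scale model. The symbols Y, Z, f, \ell, n and k (number of estimated parameters) are introduced here for illustration and do not appear in the thread.

```latex
\begin{align*}
f_Y(y) &= f_Z(\log y)\,\left|\frac{d \log y}{d y}\right|
        = \frac{f_Z(\log y)}{y}, \\
\ell_Y &= \sum_{i=1}^{n} \log f_Y(y_i)
        = \ell_Z - \sum_{i=1}^{n} \log y_i, \\
\mathrm{AIC}_Y &= -2\,\ell_Y + 2k
        = \mathrm{AIC}_Z + 2 \sum_{i=1}^{n} \log y_i .
\end{align*}
```

Under these assumptions, the AIC reported for the log-scale fit would have twice the sum of the logged responses added to it before being compared with the AIC of a fit to the untransformed response. Transforming only the predictor calls for no such adjustment.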