Huidong TIAN
2011-Nov-10 14:02 UTC
[R] Sum of the deviance explained by each term in a gam model does not equal to the deviance explained by the full model.
Dear R users, I read your methods of extracting the variance explained by each predictor in different places. My question is: using the method you suggested, the sum of the deviance explained by all terms is not equal to the deviance explained by the full model. Could you tell me what caused such problem?> set.seed(0) > n<-400 > x1 <- runif(n, 0, 1) > ## to see problem with not fixing smoothing parameters > ## remove the `##' from the next line, and the `sp' > ## arguments from the `gam' calls generating b1 and b2. > x2 <- runif(n, 0, 1) ## *.1 + x1 > f1 <- function(x) exp(2 * x) > f2 <- function(x) 0.2*x^11*(10*(1-x))^6+10*(10*x)^3*(1-x)^10 > f <- f1(x1) + f2(x2) > e <- rnorm(n, 0, 2) > y <- f + e > ## fit full and reduced models... > b <- gam(y~s(x1)+s(x2)) > b1 <- gam(y~s(x1),sp=b$sp[1]) > b2 <- gam(y~s(x2),sp=b$sp[2]) > b0 <- gam(y~1) > ## calculate proportions deviance explained... > dev.1 <- (deviance(b1)-deviance(b))/deviance(b0) ## prop explained bys(x2)> dev.2 <- (deviance(b2)-deviance(b))/deviance(b0) ## prop explained bys(x1)> > dev.1 + dev.2[1] 0.6974949> summary(b)$dev.expl[1] 0.7298136 I checked the two models (b1 & b2), found the model coefficients are different with model b, so I feel it could be the problem. wish to hear your comments. Huidong Tian [[alternative HTML version deleted]]
Simon Wood
2011-Nov-11 14:10 UTC
[R] Sum of the deviance explained by each term in a gam model does not equal to the deviance explained by the full model.
It's the same problem as with any regression model where the predictors are not strictly orthogonal. You can make the explained deviances per term sum to the whole model explained deviance by dropping terms sequentially, but then the explained deviance per term depends on the order in which you drop terms. The alternative in the code you pasted in below is at least independent of the ordering. I suppose you could make things add up in the way you want by symmetrising the computation like this... ## reduce model one way.... dev.1<- (deviance(b1)-deviance(b))/deviance(b0) ## prop expl by s(x2) dev.2<- (deviance(b0)-deviance(b1))/deviance(b0) ## prop expl by s(x1) ## reduce it the other way... dev.2[2]<- (deviance(b2)-deviance(b))/deviance(b0) ## prop expl by s(x2) dev.1[2]<- (deviance(b0)-deviance(b2))/deviance(b0) # prop expl by s(x1) ## average the alternatives dev.1 <- mean(dev.1) dev.2 <- mean(dev.2) ## so now explained deviance adds up... dev.1+dev.2 summary(b)$dev.expl ... but I'm not convinced that this really adds anything useful, and it gets very tedious as the number of model terms increases... -- Simon Wood, Mathematical Science, University of Bath BA2 7AY UK +44 (0)1225 386603 http://people.bath.ac.uk/sw283 On 10/11/11 14:02, Huidong TIAN wrote:> Dear R users, > I read your methods of extracting the variance explained by each > predictor in different places. My question is: using the method you > suggested, the sum of the deviance explained by all terms is not equal to > the deviance explained by the full model. Could you tell me what caused > such problem? > >> set.seed(0) >> n<-400 >> x1<- runif(n, 0, 1) >> ## to see problem with not fixing smoothing parameters >> ## remove the `##' from the next line, and the `sp' >> ## arguments from the `gam' calls generating b1 and b2. >> x2<- runif(n, 0, 1) ## *.1 + x1 >> f1<- function(x) exp(2 * x) >> f2<- function(x) 0.2*x^11*(10*(1-x))^6+10*(10*x)^3*(1-x)^10 >> f<- f1(x1) + f2(x2) >> e<- rnorm(n, 0, 2) >> y<- f + e >> ## fit full and reduced models... >> b<- gam(y~s(x1)+s(x2)) >> b1<- gam(y~s(x1),sp=b$sp[1]) >> b2<- gam(y~s(x2),sp=b$sp[2]) >> b0<- gam(y~1) >> ## calculate proportions deviance explained... >> dev.1<- (deviance(b1)-deviance(b))/deviance(b0) ## prop explained by > s(x2) >> dev.2<- (deviance(b2)-deviance(b))/deviance(b0) ## prop explained by > s(x1) >> >> dev.1 + dev.2 > [1] 0.6974949 >> summary(b)$dev.expl > [1] 0.7298136 > > I checked the two models (b1& b2), found the model coefficients are > different with model b, so I feel it could be the problem. > > wish to hear your comments. > > Huidong Tian > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >