Marius Hofert
2013-Oct-29 20:16 UTC
[R] (gam) formula: Why different results for terms being factor vs. numeric?
Dear expeRts, If I specify group = as.factor(rep(1:2, each=n)) in the below definition of dat, I get the expected behavior I am looking for. I wonder why I don't get it if group is *not* a factor... My guess was that, internally, factors are treated as natural numbers (and this indeed seems to be true if you convert the latter to factors [essentially meaning changing the levels]), but replacing factors by numeric values (as below) does not provide the same answer. Cheers, Marius require(mgcv) n <- 10 yrs <- 2000+seq_len(n) set.seed(271) dat <- data.frame(year = rep(yrs, 2), group = rep(1:2, each=n), # *not* a factor (as.factor() provides the expected behavior) resp = c(seq_len(n)+runif(n), 5+seq_len(n)+runif(n))) fit3 <- gam(resp ~ year + group - 1, data=dat) plot(yrs, fit3$fitted.values[seq_len(n)], type="l", ylim=range(dat$resp), xlab="Year", ylab="Response") # fit group A; mean over all responses in this group lines (yrs, fit3$fitted.values[n+seq_len(n)], col="blue") # fit group B; mean over all responses in this group points(yrs, dat$resp[seq_len(n)]) # actual response group A points(yrs, dat$resp[n+seq_len(n)], col="blue") # actual response group B ## => hmmm... because it is not a factor (?), this does not give an expected answer, ## but gam() still correctly figures out that there are two groups
Bert Gunter
2013-Oct-29 20:31 UTC
[R] (gam) formula: Why different results for terms being factor vs. numeric?
Think about it. How can one define a smooth term with a factor??? Further discussion is probably offtopic. Post on stats.stackexchange.com if it still isn't obvious. Cheers, Bert On Tue, Oct 29, 2013 at 1:16 PM, Marius Hofert <marius.hofert at math.ethz.ch> wrote:> Dear expeRts, > > If I specify group = as.factor(rep(1:2, each=n)) in the below > definition of dat, I get the expected behavior I am looking for. I > wonder why I > don't get it if group is *not* a factor... My guess was that, > internally, factors are treated as natural numbers (and this indeed > seems to be true if you convert the latter to factors [essentially > meaning changing the levels]), but replacing factors by numeric values > (as below) does not provide the same answer. > > Cheers, > Marius > > > require(mgcv) > > n <- 10 > yrs <- 2000+seq_len(n) > set.seed(271) > dat <- data.frame(year = rep(yrs, 2), > group = rep(1:2, each=n), # *not* a factor > (as.factor() provides the expected behavior) > resp = c(seq_len(n)+runif(n), 5+seq_len(n)+runif(n))) > fit3 <- gam(resp ~ year + group - 1, data=dat) > plot(yrs, fit3$fitted.values[seq_len(n)], type="l", ylim=range(dat$resp), > xlab="Year", ylab="Response") # fit group A; mean over all > responses in this group > lines (yrs, fit3$fitted.values[n+seq_len(n)], col="blue") # fit group > B; mean over all responses in this group > points(yrs, dat$resp[seq_len(n)]) # actual response group A > points(yrs, dat$resp[n+seq_len(n)], col="blue") # actual response group B > ## => hmmm... because it is not a factor (?), this does not give an > expected answer, > ## but gam() still correctly figures out that there are two groups > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.-- Bert Gunter Genentech Nonclinical Biostatistics (650) 467-7374