I came across some code in the ggplot2 manual and am confused by one of its lm() calls. The code is:

    library(ggplot2)
    d <- subset(diamonds, carat < 2.5 & rbinom(nrow(diamonds), 1, 0.2) == 1)
    d$lcarat <- log10(d$carat)
    d$lprice <- log10(d$price)
    detrend <- lm(lprice ~ lcarat, data = d)
    d$lprice2 <- resid(detrend)
    mod <- lm(lprice2 ~ lcarat * color, data = d)  # ***

What puzzles me is the last statement, marked with ***. How does R handle lcarat * color, given that color is not numeric? If this is valid, how can I write the mathematical formula of this regression model?
<rz1991 <at> foxmail.com> writes:

> I came across some code in the ggplot2 manual and am confused by one of
> its lm() calls.

[snip]

>     mod <- lm(lprice2 ~ lcarat * color, data = d)  # ***
>
> What puzzles me is the last statement, marked with ***. How does R
> handle lcarat * color, given that color is not numeric? If this is
> valid, how can I write the mathematical formula of this regression
> model?

See chapter 11 of "An Introduction to R". This formula specification fits what might be described as an ANCOVA model, where the jth individual in the ith level of "color" has expected mean value

    mu_{ij} = a_i + b_i*lcarat_{ij}

That is, a different slope and intercept for each level of 'color'. That isn't exactly the way R parameterizes the model, but it should get you started. You can search for "ANCOVA in R" to learn more.
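
To see how R actually expands a formula of this shape, it can help to inspect the design matrix on a small example. The sketch below uses made-up data (the names toy, x, g and y are hypothetical, not from the ggplot2 manual) with an unordered three-level factor, so R's default treatment contrasts apply. In that case the fitted mean is

    E[y] = b0 + b1*x + c_b*I(g = b) + c_c*I(g = c) + d_b*x*I(g = b) + d_c*x*I(g = c)

where I() is a 0/1 indicator, so level "a" has intercept b0 and slope b1, and each other level adds its own offsets. Note that color in diamonds is an ordered factor, for which R uses polynomial contrasts by default; the coefficients there are named color.L, color.Q and so on, but the fitted per-level lines are the same.

```r
# Hypothetical toy data: a numeric covariate x and an unordered factor g.
set.seed(1)
toy <- data.frame(
  x = rnorm(30),
  g = factor(rep(c("a", "b", "c"), each = 10))
)
toy$y <- 1 + 2 * toy$x + rnorm(30)

# In a formula, x * g is shorthand for x + g + x:g.
# The factor is expanded into 0/1 dummy columns (treatment contrasts).
X <- model.matrix(~ x * g, data = toy)
colnames(X)
# "(Intercept)" "x" "gb" "gc" "x:gb" "x:gc"

# Both spellings fit the same model.
fit_star <- lm(y ~ x * g, data = toy)
fit_long <- lm(y ~ x + g + x:g, data = toy)
all.equal(coef(fit_star), coef(fit_long))

# Recover the per-level intercepts a_i and slopes b_i from the coefficients.
b <- coef(fit_star)
intercepts <- b[["(Intercept)"]] + c(a = 0, b = b[["gb"]], c = b[["gc"]])
slopes     <- b[["x"]]           + c(a = 0, b = b[["x:gb"]], c = b[["x:gc"]])
intercepts
slopes
```

The recovered intercepts and slopes correspond to the a_i and b_i in the ANCOVA description above; they match what you would get from fitting lm(y ~ x) separately within each level of g.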