Dear All,
I'm struggling a little with the behaviour of R with GLM interactions. In
particular, I have a dataset with two factors - call them factor A and
factor B, where I would like to fit a GLM that is factor A + (grouped
factor A):factor B.
To try to isolate this, I've ignored the original "factor A" part, as
I have this as a separate column in my data. So, it looks like I have
factor A + factor B + factor C:factor B, but I don't want terms for the
base level of factor B for that factor C:factor B interaction.
An example of the data I'm trying to fit a model to could be as follows:
Record  FactorA  FactorB  Weight  Response
1       1        1        1       0.73
2       1        2        0.5     0
3       1        3        1       1.00
4       2        1        0.33    2.77
5       2        2        0.4     0
6       2        3        5       0
(I've given a sample here, as my data has around 10,000 records and about
30 columns).
So, I've prepared my data using something similar to:
# Read the data and set the base (reference) level of each factor
glmdata <- read.table("C:\\MyData.csv", sep=",", header=TRUE)
glmdata$FactorA <- C(factor(glmdata$FactorA), base=1)
glmdata$FactorB <- C(factor(glmdata$FactorB), base=2)
# Gamma GLM with log link, weighted by the Weight column
glmfit <- glm(Response ~ 1 + FactorA:FactorB, family=Gamma(link="log"),
              weights=Weight, data=glmdata)
After some playing around, I've found I get slightly different results
with FactorA*FactorB, FactorA+FactorB+FactorA:FactorB, FactorA:FactorB -
but whatever I do I always get 6 coefficients.
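The behaviour described above can be reproduced on the six sample records alone. What follows is a minimal sketch, assuming the sample is typed in by hand as a data frame called d (a made-up name) and that only the predictors matter, so Weight and Response are left out. It compares the rank of the design matrix that R builds for each of the three formulas:

```r
# Six sample records from the table above; only the two factors are needed
# to build a design matrix. The name `d` is hypothetical.
d <- data.frame(
  FactorA = factor(c(1, 1, 1, 2, 2, 2)),
  FactorB = factor(c(1, 2, 3, 1, 2, 3))
)
d$FactorA <- C(d$FactorA, base = 1)
d$FactorB <- C(d$FactorB, base = 2)

# The three ways of writing the interaction model
formulas <- list(
  star             = ~ FactorA * FactorB,
  expanded         = ~ FactorA + FactorB + FactorA:FactorB,
  interaction_only = ~ 1 + FactorA:FactorB
)

# Rank of each design matrix = number of coefficients that can be estimated
ranks <- sapply(formulas, function(f) qr(model.matrix(f, data = d))$rank)
print(ranks)
```

Each formula gives a design matrix of rank 6, because all three describe the same saturated model with one parameter for each of the 2 x 3 cells. That would be consistent with always seeing 6 estimated coefficients.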
Really what I would like to do is to ask for FactorA*FactorB less the
entries in the design matrix that I get from FactorA and FactorB. This
would leave me with the design matrix being:
Record  Mean  FactorA2:FactorB1  FactorA2:FactorB3
1       1     0                  0
2       1     0                  0
3       1     0                  0
4       1     1                  0
5       1     0                  0
6       1     0                  1
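For concreteness, the matrix above can be reproduced from the six sample records by building the full FactorA*FactorB design matrix and keeping only the wanted columns. This is only a sketch of the target, not necessarily the idiomatic way to ask for it in a formula. The object names d, X_full, keep and X_wanted are made up for illustration:

```r
# Six sample records from the data table; `d` is a hypothetical name.
d <- data.frame(
  FactorA = factor(c(1, 1, 1, 2, 2, 2)),
  FactorB = factor(c(1, 2, 3, 1, 2, 3))
)
d$FactorA <- C(d$FactorA, base = 1)   # base level 1 for FactorA
d$FactorB <- C(d$FactorB, base = 2)   # base level 2 for FactorB

# Full design matrix: intercept, main effects and interaction columns
X_full <- model.matrix(~ FactorA * FactorB, data = d)

# Keep the intercept ("Mean") and the two interaction columns only
keep <- c("(Intercept)", "FactorA2:FactorB1", "FactorA2:FactorB3")
X_wanted <- X_full[, keep, drop = FALSE]
print(X_wanted)
```

A matrix built this way could presumably be supplied to glm() as a single term with the automatic intercept removed, for example Response ~ X_wanted - 1, although I have not confirmed that this is the recommended route.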
If anyone has any advice on how I could make this happen, I'd be very
grateful!
Thanks in advance,
Colin Towers.