On 26/02/14 01:40, Lorenzo Isella wrote:
> Dear All,
> Please consider the snippet at the end of the email.
> It is representative of the problems I am experiencing.
> I am trying to use glm (without using the formula interface because the
> original data is quite large) to model the response in a case where the
> predictors are a mix of numbers and factors.
> In the end, I always end up with an error message, despite having tried
> different choices for the "family" parameter.
> Maybe I am missing the obvious, but can anyone run glm with a
> combination of numbers and factors?
> Any help is appreciated.
> Cheers
>
> Lorenzo
>
>
>
>
> ###############################################################
> set.seed(1234)
>
> x <- rnorm(1000)
> dim(x) <- c(100,10)
> x <- as.data.frame(x)
> names(x) <- LETTERS[seq(10)]
>
> x$J <- round(x$J)
>
> x$J <- as.factor(x$J)
>
> y <- x$A
> x <- subset(x, select=-c(A))
>
> model <- glm.fit(x, y) ## , family=gaussian)

From the help for glm.fit:

>> For glm.fit: x is a ***design*** matrix of dimension n * p, and y is
>> a vector of observations of length n.

(Emphasis mine.)

So if you want to use (or insist on using) glm.fit() rather than glm(),
you will have to construct your own design matrix, i.e. replace each
factor column by k-1 columns of dummy variables (where k is the number
of levels of the given factor). Note that "x" should really be a
*matrix*, not a data frame, although it seems that data frames all of
whose columns are numeric get coerced to matrices, so it doesn't matter
much.
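
A minimal sketch of one way to do this, applied to the snippet quoted
above. It assumes that model.matrix() is an acceptable way to build the
design matrix (this is my suggestion, not something the glm.fit help
prescribes) and that the default treatment contrasts are what you want.
The name "X" is introduced here purely for illustration.

```r
set.seed(1234)

# Same data as in the original post.
x <- rnorm(1000)
dim(x) <- c(100, 10)
x <- as.data.frame(x)
names(x) <- LETTERS[seq(10)]
x$J <- as.factor(round(x$J))

y <- x$A
x <- subset(x, select = -c(A))

# Build the design matrix explicitly: model.matrix() adds an intercept
# column and expands the factor J into k-1 dummy columns (k = nlevels).
# "X" is a hypothetical name used only in this sketch.
X <- model.matrix(~ ., data = x)

# glm.fit() now receives a numeric matrix rather than a data frame.
model <- glm.fit(X, y, family = gaussian())
model$coefficients
```

With an intercept column in X, the coefficients should agree with those
from glm(y ~ ., data = x), which builds the same design matrix behind
the scenes.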
cheers,
Rolf Turner