On Jun 15, 2009, at 5:54 AM, Paul Christoph Schröder wrote:
> Hi all!
> Maybe someone could help me with the following. I know this isn't
> directly related to ecology, but I'm also using glm.
>
> I have a list of 16 genes and 10 samples. The samples are of two
> types, 4 Ctrl and 6 Diseased. If,
>
> labelInd<-as.factor(c(rep("0",4),rep("1",6)))
> genes.glm<-glm(labelInd ~ ., family=binomial, data=mat)
>
>
> with "mat" being the 10x16 matrix (without NAs), I got 17 values: first
> the intercept, 9 numerical values and "NA" for the last 7 genes.
> Does anybody know why this is happening, or how I can fit a model
> using all 16 genes?
>
> I hope someone can help me with this!
> Many thanks in advance,
>
> Paul
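More than likely, the 7 genes for which you are getting NAs are
collinear with (that is, linear combinations of) other genes, so their
coefficients cannot be estimated. Note that with only 10 observations
the model matrix has rank at most 10, so no more than 10 coefficients
(here the intercept plus 9 genes) can be estimated in any case. If you
switched the order of the 7 genes for which you are getting NAs so
that they come first in the formula, you would get NAs for others.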
If you use:

  summary(genes.glm)

you will likely see a note about singularities in the header line of
the coefficient table. Something like:

  Coefficients: (7 not defined because of singularities)
I would use cor(mat) to take a look at the correlation matrix for your
data so that you can review this in more detail.
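As a rough illustration of the point about singularities, here is a minimal sketch using simulated data. The data frame below is a made-up stand-in for your "mat" (random numbers, hypothetical column names gene1 to gene16), not your actual data, so only the pattern of NAs is meaningful, not the estimates:

```r
set.seed(1)

# Hypothetical stand-in for the poster's data: 10 samples x 16 genes
mat <- as.data.frame(matrix(rnorm(10 * 16), nrow = 10,
                            dimnames = list(NULL, paste0("gene", 1:16))))
labelInd <- factor(c(rep("0", 4), rep("1", 6)))

# The model matrix has 17 columns (intercept + 16 genes) but only 10 rows,
# so its rank cannot exceed 10
X <- model.matrix(~ ., data = mat)
dim(X)
qr(X)$rank

# glm() warns here because the classes are perfectly separated;
# the warnings are suppressed only to keep the illustration short
fit <- suppressWarnings(glm(labelInd ~ ., family = binomial, data = mat))

# Coefficients that could not be estimated are reported as NA
n_aliased <- sum(is.na(coef(fit)))
n_aliased
names(coef(fit))[is.na(coef(fit))]
```

Reordering the columns of the simulated data frame and refitting should move the NAs to whichever genes come last, which is the behaviour described above.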
BTW, with only 10 observations, you are significantly overfitting the
model by using so many covariates. You typically need at least 10 to
20 'events' for each covariate degree of freedom in a logistic
regression model. With only 6 diseased (events) you really don't even
have enough data to support one covariate. The study, presuming an 'a
priori' design, is way underpowered for what you are attempting to do.
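The arithmetic behind that statement can be sketched as follows. The counts come from the original post; the variable names are just for illustration, and the 10 to 20 events per degree of freedom is the rule of thumb quoted above, not a hard threshold:

```r
n_events   <- 6          # diseased samples, from the original post
epv_needed <- c(10, 20)  # rule-of-thumb events per covariate degree of freedom
n_genes    <- 16         # covariates in the attempted model

# Covariate degrees of freedom the available events could support
supported_df <- n_events / epv_needed
supported_df

# Events that a 16-covariate model would call for under the same rule
events_required <- n_genes * epv_needed
events_required
```

Both values of supported_df come out below 1, which is the sense in which the data do not support even a single covariate.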
HTH,
Marc Schwartz