Liaw, Andy
2004-Aug-23 00:16 UTC
[R] A troubled state of freedom: generalized linear models where number of parameters > number of samples
Check out the gpls package on CRAN.

HTH,
Andy

> From: Min-Han Tan
>
> Good morning,
>
> Thank you all for your help so far. I really appreciate it.
>
> The crux of my problem is that I am generating a generalized linear
> model with 1 dependent variable, approximately 50 training samples and
> 100 parameters (gene levels).
>
> Essentially, if I have 100 genes and 50 samples, this results in
> coefficients for the first 49 samples, and NAs for the rest, with an
> ultra low residual deviance (usually approx. 10^-27). This seems to
> have something to do with the number of degrees of freedom (since as
> the number of genes increases up to 49, the number of residual degrees
> of freedom drops to 0)
>
> What kind of methods can I use to make sense of this?
>
> I have a subsequent set of samples to work on to validate the results
> of this glm, so I am not sure if overfitting is really a problem.
>
> Background: this is a microarray study, where I have divided the
> samples in the training set into 2 groups, and generated a number of
> genes to differentiate between both groups. I am going to use the GLM
> in a subsequent regression analysis to determine survival. For this
> purpose, I need to generate some kind of score for each individual
> case using the coefficients of each gene level * gene expression
> level.
>
> I am not a statistician (but a clinician) - many apologies if I am not
> conveying myself very clearly here!
>
> Thanks.
>
> Min-Han Tan
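
The degrees-of-freedom behaviour described in the quoted question can be reproduced with a minimal sketch. It is written in plain Python rather than R so that it runs without any packages, and it uses simulated random numbers, not the poster's microarray data. The names `pivot_columns`, `design`, `n_samples` and `n_genes` are made up for illustration. The sketch only counts how many columns of an intercept-plus-genes design matrix are linearly independent; it is not what glm or gpls does internally.

```python
import random


def pivot_columns(rows, tol=1e-9):
    """Return the indices of columns that are linearly independent of the
    columns to their left, found by Gaussian elimination with row pivoting."""
    m = [list(r) for r in rows]          # work on a copy
    n_rows, n_cols = len(m), len(m[0])
    pivots = []
    r = 0                                # next row available for a pivot
    for c in range(n_cols):
        if r == n_rows:                  # every row already used: rank = n_rows
            break
        # Pick the remaining row with the largest entry in this column.
        best = max(range(r, n_rows), key=lambda i: abs(m[i][c]))
        if abs(m[best][c]) <= tol:       # column depends on earlier columns
            continue
        m[r], m[best] = m[best], m[r]
        for i in range(r + 1, n_rows):   # eliminate below the pivot
            f = m[i][c] / m[r][c]
            if f != 0.0:
                for j in range(c, n_cols):
                    m[i][j] -= f * m[r][j]
        pivots.append(c)
        r += 1
    return pivots


random.seed(1)                           # simulated data, for illustration only
n_samples, n_genes = 50, 100

# Column 0 is the intercept; columns 1..100 are simulated gene levels.
design = [[1.0] + [random.gauss(0.0, 1.0) for _ in range(n_genes)]
          for _ in range(n_samples)]

estimable = pivot_columns(design)
n_coef = n_genes + 1                     # intercept + one coefficient per gene
residual_df = n_samples - len(estimable)
n_aliased = n_coef - len(estimable)

print("estimable coefficients:", len(estimable))
print("residual degrees of freedom:", residual_df)
print("coefficients that cannot be estimated:", n_aliased)
```

With 50 samples, at most 50 coefficients can be estimated. Once the intercept and 49 genes are in the model, the residual degrees of freedom reach zero and the fit reproduces the training data exactly. This is consistent with the near-zero residual deviance and the NA coefficients reported in the question.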