Weiwei Shi
2006-May-30 15:57 UTC
[R] when dimensionality is larger than the number of observations?
Hi there,

Can anyone kindly point me to some good references or links on this topic? I am especially interested in solutions from Bioconductor or R for microarray-like, "fat" data.

Thanks,

--
Weiwei Shi, Ph.D

"Did you always know?" "No, I did not. But I believed..." ---Matrix III
Gabor Grothendieck
2006-May-30 16:48 UTC
[R] when dimensionality is larger than the number of observations?
On 5/30/06, Weiwei Shi <helprhelp at gmail.com> wrote:
> Hi, there:
>
> Can anyone here kindly point some good reference or links on this topic?
> Esp. some solutions from BioConductor or R, when dealing with
> microarray-like, "fat" data?

In that case there will be an entire subspace of coefficient vectors that all give the same (unique) fitted values. Let's take 3 rows of the iris data set and regress column 1 on the rest. We can get one of those coefficient vectors using the generalized inverse:

> # test data
> iris3 <- iris[c(1, 51, 101),]
> y <- iris3[,1]
> y
[1] 5.1 7.0 6.3
> X <- model.matrix(~., iris3[,2:5])
> X
    (Intercept) Sepal.Width Petal.Length Petal.Width Speciesversicolor Speciesvirginica
1             1         3.5          1.4         0.2                 0                0
51            1         3.2          4.7         1.4                 1                0
101           1         3.3          6.0         2.5                 0                1
attr(,"assign")
[1] 0 1 2 3 4 4
attr(,"contrasts")
attr(,"contrasts")$Species
[1] "contr.treatment"
>
> library(MASS) # needed for ginv
> coefs <- c(ginv(crossprod(X)) %*% crossprod(X, y))
> names(coefs) <- colnames(X)
> coefs
      (Intercept)       Sepal.Width      Petal.Length       Petal.Width Speciesversicolor  Speciesvirginica
        0.3619361         1.1497417         0.5443438        -0.2405670         0.7372685        -0.5207289
> X %*% coefs # fitted values
    [,1]
1    5.1
51   7.0
101  6.3
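To make the "entire subspace" point concrete, here is a minimal sketch, not part of the original session. It shifts the ginv() coefficients by a vector from the null space of X and checks that the fitted values do not change. The null-space projector `diag(ncol(X)) - ginv(X) %*% X`, the names `P_null`, `v` and `coefs2`, and the seed are assumptions introduced only for illustration. It also computes the coefficients as `ginv(X) %*% y`, which should agree with the `ginv(crossprod(X)) %*% crossprod(X, y)` form above up to numerical tolerance.

```r
library(MASS)  # provides ginv()

# Same three-row example as above: 3 observations, 6 coefficients
iris3 <- iris[c(1, 51, 101), ]
y <- iris3[, 1]
X <- model.matrix(~ ., iris3[, 2:5])

# Minimum-norm least-squares solution, computed directly from ginv(X)
coefs <- c(ginv(X) %*% y)
names(coefs) <- colnames(X)

# Projector onto the null space of X: any vector v it returns has X %*% v = 0
# (illustrative construction, not from the original post)
P_null <- diag(ncol(X)) - ginv(X) %*% X

set.seed(1)                        # arbitrary seed, for reproducibility only
v <- c(P_null %*% rnorm(ncol(X)))  # a random direction in the null space
coefs2 <- coefs + v                # a different coefficient vector

# Both coefficient vectors reproduce the same fitted values
fit1 <- c(X %*% coefs)
fit2 <- c(X %*% coefs2)
print(all.equal(fit1, fit2))

# Compare Euclidean norms: the ginv() solution is the shortest such vector
print(c(ginv = sqrt(sum(coefs^2)), shifted = sqrt(sum(coefs2^2))))
```

Because the shifted vector fits the three observations just as well, the individual coefficients are not identifiable from such data; ginv() simply picks the solution with the smallest Euclidean norm.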