On Wed, 14 May 2008, Jorge Ivan Velez wrote:
> Dear useRs:
> I'm not sure if this is the correct place to ask, but I'll try it out. I've been
> reading about how to perform Principal Component Analysis (PCA) on
> microarrays (see [1]) and there's something that I don't get. Basically,
> it's related to performing PCA on data sets in which the number of variables
> is greater than the number of samples. For example, in the paper mentioned
> above, the numbers of variables (genes) and samples (tumors) are 8538 and 104,
> respectively. My understanding is that, in PCA, the number of samples (n)
> must be greater than the number of variables (p), and its goal is to seek k
> components such that k < p and the variance in this new data set is
> maximized. Am I wrong?
Yes, in detail. One of the properties of PCA is to seek projections
(unit-length linear combinations of the variables) of maximal variance,
each being uncorrelated with earlier ones. That is well-defined for n <
p. But you will only get at most n PCs of non-zero variance (and at most
n-1 unless you centre externally), and the rest are pretty arbitrary basis
vectors for the space of constant combinations.
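A quick way to see this in R, as a sketch on simulated data (the 20 x 200 matrix, the seed and the 1e-8 tolerance are arbitrary choices, not taken from the question): with n = 20 samples and p = 200 variables, prcomp() returns min(n, p) = 20 components, of which only n - 1 = 19 have non-zero variance after centring. The loadings are unit-length and the scores on the non-degenerate PCs are uncorrelated.

```r
set.seed(42)                          # arbitrary seed, for reproducibility
n <- 20; p <- 200                     # fewer samples than variables
X <- matrix(rnorm(n * p), nrow = n)   # simulated data, not real microarray data

fit <- prcomp(X)                      # centres the columns by default
length(fit$sdev)                      # min(n, p) components are returned
sum(fit$sdev > 1e-8)                  # only n - 1 have non-zero variance

# Each column of the rotation is a unit-length combination of the variables
range(colSums(fit$rotation^2))

# Scores on the non-degenerate PCs are mutually uncorrelated
S <- fit$x[, 1:(n - 1)]
max(abs(cor(S)[upper.tri(cor(S))]))

# Without centring, all n components generally have non-zero variance
fit0 <- prcomp(X, center = FALSE)
sum(fit0$sdev > 1e-8)
```

The last component after centring is one of the "pretty arbitrary" directions: its standard deviation is zero up to rounding error.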
> Could somebody please tell me how it is possible to perform PCA when the
> number of variables is greater than the number of samples, and how to do
> it in R? I'm really confused. In R I've tried "prcomp" and "princomp",
> but they didn't work.
See any good book on multivariate analysis, or your statistical
consultant. (See the posting guide as to why this is not the list on
which to ask that question.)
That you can do this does not make it sensible, but it can be
interpretable if there is a strong signal associated with a handful of
genes -- but then so can other methods.
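To illustrate the point about a strong signal in a handful of genes, here is a sketch on simulated data. The two sample groups, the shift of 10 and the choice of genes 1 to 5 are all hypothetical. If only those genes separate the groups, their loadings dominate the first PC even though p is much larger than n.

```r
set.seed(1)
n <- 20; p <- 200
X <- matrix(rnorm(n * p), nrow = n)       # pure noise
group <- rep(0:1, each = n / 2)           # two hypothetical sample groups
X[, 1:5] <- X[, 1:5] + 10 * group         # genes 1-5 carry the group signal

fit <- prcomp(X)
# Genes with the largest absolute loadings on PC1 (the sign of a PC is arbitrary)
top <- order(abs(fit$rotation[, 1]), decreasing = TRUE)[1:5]
sort(top)
# Mean PC1 score in each group: the two groups are far apart
tapply(fit$x[, 1], group, mean)
```

With a weaker shift the signal direction competes with the largest noise PCs, which is exactly when the interpretation stops being reliable.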
And BTW, prcomp() *does* work, e.g.
X <- matrix(rnorm(20*200), 20)
fit <- prcomp(X)
str(fit)
so the problem is what you did (and you didn't manage to tell us what that
was -- see the footer of the message). ?princomp does tell you to use
prcomp() in this case.
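For comparison, a sketch of what happens with princomp() on a matrix with more columns than rows (again simulated data). princomp() works from the covariance matrix and requires at least as many units as variables, so it stops with an error; prcomp() uses the singular value decomposition of the centred data matrix and handles the same input.

```r
set.seed(1)
X <- matrix(rnorm(20 * 200), nrow = 20)   # 20 samples, 200 variables

# princomp() refuses data with fewer units (rows) than variables (columns)
res <- tryCatch(princomp(X), error = function(e) e)
inherits(res, "error")
conditionMessage(res)

# prcomp() works on the same matrix
fit <- prcomp(X)
dim(fit$rotation)   # 200 variables by min(20, 200) = 20 components
```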
> I'm using Win XP SP2, Intel Core 2 Duo 2.4 GHz and R 2.7.0 Patched.
>
> Thanks in advance,
>
> Jorge Ivan Velez
>
> [1] Ringnér, M. What is principal components analysis? Nature Biotechnology
> 26, 303-304 (2008),
> http://www.nature.com/nbt/journal/v26/n3/full/nbt0308-303.html
Hmm, that's not a free resource.
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595