search for: 0.90922

Displaying 9 results from an estimated 9 matches for "0.90922".

2016 Mar 24
3
summary( prcomp(*, tol = .) ) -- and 'rank.'
Following from the R-help thread of March 22 on "Memory usage in prcomp", I've started looking into adding an optional 'rank.' argument to prcomp, allowing one to more efficiently get only a few PCs instead of the full p PCs, say when p = 1000 and you know you only want 5 PCs (https://stat.ethz.ch/pipermail/r-help/2016-March/437228.html). As it was mentioned, we already
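A minimal sketch of the kind of call being proposed (assuming an R version in which prcomp() accepts the 'rank.' argument; in this thread it is still the feature under discussion):

    ## Keep only the first 5 PCs of a matrix with p = 1000 columns.
    set.seed(1)
    X <- matrix(rnorm(200 * 1000), nrow = 200)   # n = 200, p = 1000
    pc <- prcomp(X, rank. = 5)
    dim(pc$rotation)   # 1000 x 5 instead of 1000 x 1000
    summary(pc)        # reports only the 5 retained components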
2010 Mar 10
1
PCA
Hello, I am trying to complete a PCA on a set of standardized ring widths from 8 different sites (T10, T9, T8, T7, T6, T5, T3, and T2). The following is a small portion of my data:

    T10      T9       T8       T7       T6       T5       T3       T2
    1.33738  0.92669  0.91146  0.98922  0.93080  0.88201  0.92287  0.91775
    0.82181  1.05319  0.92908  0.97971  0.95165  0.98029  1.14048  0.77803
    0.88294  0.96413  0.90893  0.87957  0.99610  0.74926  0.71394  0.70877
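A minimal sketch of the analysis being attempted, using only the three rows shown above as a stand-in for the full data set:

    ## PCA on the excerpt; 'rw' stands in for the complete ring-width data.
    rw <- data.frame(
      T10 = c(1.33738, 0.82181, 0.88294), T9 = c(0.92669, 1.05319, 0.96413),
      T8  = c(0.91146, 0.92908, 0.90893), T7 = c(0.98922, 0.97971, 0.87957),
      T6  = c(0.93080, 0.95165, 0.99610), T5 = c(0.88201, 0.98029, 0.74926),
      T3  = c(0.92287, 1.14048, 0.71394), T2 = c(0.91775, 0.77803, 0.70877)
    )
    pca <- prcomp(rw, center = TRUE, scale. = TRUE)
    summary(pca)   # sdev and proportion of variance per component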
2016 Mar 24
3
summary( prcomp(*, tol = .) ) -- and 'rank.'
I agree with Kasper; this is a 'big' issue. Does your method of taking only n PCs reduce the load on memory? The new addition to the summary looks like a good idea, but Proportion of Variance as you describe it may be confusing to new users. Am I correct in saying that Proportion of Variance describes the amount of variance relative to the number of components the user chooses to show? So
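The two readings at issue can be computed by hand from prcomp()'s sdev; a sketch on the built-in USArrests data (not the poster's data):

    pc <- prcomp(USArrests, scale. = TRUE)
    v  <- pc$sdev^2
    v[1:2] / sum(v)        # share of the TOTAL variance (the usual convention)
    v[1:2] / sum(v[1:2])   # share among the displayed PCs only (sums to 1)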
2016 Mar 25
2
summary( prcomp(*, tol = .) ) -- and 'rank.'
> On 25 Mar 2016, at 10:41 am, peter dalgaard <pdalgd at gmail.com> wrote:
>
> As I see it, the display showing the first p << n PCs adding up to 100% of the variance is plainly wrong.
>
> I suspect it comes about via a mental short-circuit: If we try to control p using a tolerance, then that amounts to saying that the remaining PCs are effectively zero-variance, but
2016 Mar 24
0
summary( prcomp(*, tol = .) ) -- and 'rank.'
Martin, I fully agree. This becomes an issue when you have big matrices. (Note that there are awesome methods for actually computing only a small number of PCs, unlike your code, which uses svd and hence computes all of them; these are available in various CRAN packages.) Best, Kasper

On Thu, Mar 24, 2016 at 1:09 PM, Martin Maechler <maechler at stat.math.ethz.ch> wrote:
> Following from
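The message above does not name those packages; as one illustration, the 'irlba' package on CRAN computes a truncated SVD without ever forming the full decomposition:

    ## Compute only the 5 largest singular triplets (illustrative choice
    ## of package; the thread itself names none specifically).
    library(irlba)
    set.seed(1)
    X <- scale(matrix(rnorm(1000 * 200), 1000, 200), scale = FALSE)  # centered
    s <- irlba(X, nv = 5)
    sdev5  <- s$d / sqrt(nrow(X) - 1)   # sdev of the first 5 PCs
    scores <- X %*% s$v                 # first 5 columns of PC scores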
2016 Mar 25
0
summary( prcomp(*, tol = .) ) -- and 'rank.'
As I see it, the display showing the first p << n PCs adding up to 100% of the variance is plainly wrong. I suspect it comes about via a mental short-circuit: If we try to control p using a tolerance, then that amounts to saying that the remaining PCs are effectively zero-variance, but that is (usually) not the intention at all. The common case is that the remainder terms have a roughly
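A small demonstration of that point (on the built-in USArrests data): 'tol' drops trailing components from the returned object, but their variance does not thereby become zero:

    pc_full <- prcomp(USArrests, scale. = TRUE)
    pc_tol  <- prcomp(USArrests, scale. = TRUE, tol = 0.5)
    length(pc_full$sdev)                      # 4 components
    length(pc_tol$sdev)                       # 2 components kept
    sum(pc_tol$sdev^2) / sum(pc_full$sdev^2)  # < 1: omitted PCs still carry variance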
2016 Mar 25
0
summary( prcomp(*, tol = .) ) -- and 'rank.'
> On 25 Mar 2016, at 10:08 , Jari Oksanen <jari.oksanen at oulu.fi> wrote:
>
>> On 25 Mar 2016, at 10:41 am, peter dalgaard <pdalgd at gmail.com> wrote:
>>
>> As I see it, the display showing the first p << n PCs adding up to 100% of the variance is plainly wrong.
>>
>> I suspect it comes about via a mental short-circuit: If we
2016 Mar 22
3
Memory usage in prcomp
Hi All: I am running prcomp on a very large array, roughly [500000, 3650]. The array itself is 16GB. I am running on a Unix machine and am running 'top' at the same time and am quite surprised to see that the application memory usage is 76GB. I have 'tol' set very high (0.8) so that it should only pull out a few components. I am surprised at this memory usage because prcomp uses the SVD
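A back-of-the-envelope check (assuming doubles and base R's copy-on-modify semantics) makes that footprint plausible:

    n <- 5e5; p <- 3650
    n * p * 8 / 1024^3   # ~13.6 GB for one dense copy (close to the 16GB reported)
    ## prcomp() centers/scales a copy of the input, and svd() duplicates it
    ## again for the LAPACK call, so several transient copies coexist; a
    ## handful of ~14GB copies reaches ~76GB.  Note that 'tol' is applied
    ## only after the full SVD is computed, so it does not reduce memory use.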