2016 Mar 24 | 3 replies | summary( prcomp(*, tol = .) ) -- and 'rank.'
Following on from the R-help thread of March 22 on "Memory usage in prcomp",
I've started looking into adding an optional 'rank.' argument
to prcomp, allowing one to get only a few PCs more efficiently,
instead of the full p PCs, say when p = 1000 and you know you
only want 5 PCs.
(https://stat.ethz.ch/pipermail/r-help/2016-March/437228.html)
As was mentioned, we already
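A minimal sketch of the proposed usage, assuming the 'rank.' argument behaves as described above (the exact interface was still under discussion in this thread):

## Hypothetical usage of the proposed 'rank.' argument: with p = 1000
## columns, keep only the first 5 PCs instead of all 1000.
set.seed(1)
X  <- matrix(rnorm(1500 * 1000), nrow = 1500)  # n = 1500, p = 1000
pc <- prcomp(X, rank. = 5)                     # truncate to the first 5 PCs
dim(pc$rotation)                               # 1000 x 5 instead of 1000 x 1000
dim(pc$x)                                      # 1500 x 5 matrix of scores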
2016 Mar 24 | 3 replies | summary( prcomp(*, tol = .) ) -- and 'rank.'
I agree with Kasper; this is a 'big' issue. Does your method of taking only
n PCs reduce the load on memory?
The new addition to the summary looks like a good idea, but Proportion of
Variance as you describe it may be confusing to new users. Am I correct in
saying that Proportion of Variance describes the amount of variance with
respect to the number of components the user chooses to show? So
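To make the two readings concrete, here is a small sketch with hypothetical data, contrasting the proportion of variance relative to all p PCs against the proportion relative only to the k PCs shown (the latter always sums to 100%):

## Two candidate meanings of "Proportion of Variance" for the first k PCs:
## (a) relative to the total variance of all p PCs, versus
## (b) relative only to the k PCs shown -- (b) always sums to 100%.
set.seed(1)
X      <- matrix(rnorm(100 * 10), nrow = 100)   # n = 100, p = 10
sd_all <- prcomp(X)$sdev                        # sdev of all 10 PCs
k      <- 3
sd_all[1:k]^2 / sum(sd_all^2)        # (a) share of the total variance
sd_all[1:k]^2 / sum(sd_all[1:k]^2)   # (b) renormalised over the k shown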
2016 Mar 25 | 2 replies | summary( prcomp(*, tol = .) ) -- and 'rank.'
> On 25 Mar 2016, at 10:41 am, peter dalgaard <pdalgd at gmail.com> wrote:
>
> As I see it, the display showing the first p << n PCs adding up to 100% of the variance is plainly wrong.
>
> I suspect it comes about via a mental short-circuit: If we try to control p using a tolerance, then that amounts to saying that the remaining PCs are effectively zero-variance, but
2016 Mar 24 | 0 replies | summary( prcomp(*, tol = .) ) -- and 'rank.'
Martin, I fully agree. This becomes an issue when you have big matrices.
(Note that there are awesome methods for actually computing only a small
number of PCs, unlike your code, which uses svd and therefore gets all of
them; these are available in various CRAN packages.)
Best,
Kasper
On Thu, Mar 24, 2016 at 1:09 PM, Martin Maechler <maechler at stat.math.ethz.ch> wrote:
> Following from
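One such approach is a truncated SVD. The sketch below uses the 'irlba' CRAN package; naming it here is an assumption, since the message above does not say which package it has in mind:

## Sketch: compute only the leading 5 PCs via a truncated SVD, using the
## 'irlba' CRAN package (one of several such packages).
library(irlba)
set.seed(1)
X  <- matrix(rnorm(1000 * 200), nrow = 1000)
Xc <- scale(X, center = TRUE, scale = FALSE)  # center, as prcomp does
s  <- irlba(Xc, nv = 5)                       # only 5 singular triplets
sdev   <- s$d / sqrt(nrow(X) - 1)             # PC standard deviations
scores <- s$u %*% diag(s$d)                   # scores on the first 5 PCs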
2016 Mar 25 | 0 replies | summary( prcomp(*, tol = .) ) -- and 'rank.'
As I see it, the display showing the first p << n PCs adding up to 100% of the variance is plainly wrong.
I suspect it comes about via a mental short-circuit: If we try to control p using a tolerance, then that amounts to saying that the remaining PCs are effectively zero-variance, but that is (usually) not the intention at all.
The common case is that the remainder terms have a roughly
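The point is easy to verify numerically. A small sketch with hypothetical data, assuming the documented behaviour that 'tol' drops components whose sdev is below tol times the largest sdev:

## The PCs that 'tol' drops are not zero-variance: here tol = 0.9 would
## retain only the components with sdev > 0.9 * sdev[1], yet the dropped
## ones still carry a substantial share of the total variance.
set.seed(1)
X      <- matrix(rnorm(100 * 10), nrow = 100)
sd_all <- prcomp(X)$sdev
keep   <- sd_all > 0.9 * sd_all[1]
sum(sd_all[keep]^2)  / sum(sd_all^2)   # share retained
sum(sd_all[!keep]^2) / sum(sd_all^2)   # share discarded: far from zero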
2016 Mar 25 | 0 replies | summary( prcomp(*, tol = .) ) -- and 'rank.'
> On 25 Mar 2016, at 10:08 , Jari Oksanen <jari.oksanen at oulu.fi> wrote:
>
>>
>> On 25 Mar 2016, at 10:41 am, peter dalgaard <pdalgd at gmail.com> wrote:
>>
>> As I see it, the display showing the first p << n PCs adding up to 100% of the variance is plainly wrong.
>>
>> I suspect it comes about via a mental short-circuit: If we
2016 Mar 22 | 3 replies | Memory usage in prcomp
Hi All:
I am running prcomp on a very large array, roughly [500000, 3650]. The array itself is 16GB. I am running on a Unix machine, running 'top' at the same time, and am quite surprised to see that the application memory usage is 76GB. I have 'tol' set very high (0.8) so that it should only pull out a few components. I am surprised at this memory usage because prcomp uses the SVD
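For this shape (n = 500000 >> p = 3650), one memory-leaner alternative is to eigen-decompose the p x p covariance matrix rather than run an SVD of the full n x p matrix. A sketch under that assumption; this is not what prcomp does internally, and it assumes centering without scaling:

## Sketch: PCA via the p x p covariance matrix, for n >> p. Needs one
## centered n x p copy plus a p x p matrix, instead of the several n x p
## working copies an SVD of the full matrix can require.
pca_crossprod <- function(X, k = 5) {
  mu <- colMeans(X)
  Xc <- sweep(X, 2L, mu)                # centered copy of X
  C  <- crossprod(Xc) / (nrow(X) - 1)   # p x p covariance matrix
  e  <- eigen(C, symmetric = TRUE)
  list(sdev     = sqrt(pmax(e$values[1:k], 0)),           # PC std. deviations
       rotation = e$vectors[, 1:k, drop = FALSE],         # loadings
       x        = Xc %*% e$vectors[, 1:k, drop = FALSE])  # scores
}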