thr3ads.net - R help - [R] cor() on sets of vectors [Feb 2012]

If this information is useful, please help other people find it:
Share via:

Sam Steingold

2012-Feb-23 22:10 UTC

[R] cor() on sets of vectors

suppose I have two sets of vectors: x1,x2,...,xN and y1,y2,...,yN.
I want N correlations: cor(x1,y1), cor(x2,y2), ..., cor(xN,yN).
my sets of vectors are arranged as data frames x & y (vector=column):

 x <- data.frame(a=rnorm(10),b=rnorm(10),c=rnorm(10))
 y <- data.frame(d=rnorm(10),e=rnorm(10),f=rnorm(10))

cor(x,y) returns a _matrix_ of all pairwise correlations:

 cor(x,y)
          d          e            f
a 0.2763696 -0.3523757 -0.373518870
b 0.5892742 -0.1969161 -0.007159589
c 0.3094301  0.1111997 -0.094970748

which is _not_ what I want.

I want diag(cor(x,y)) but without the N^2 calculations.

thanks.

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 11.10 (oneiric) X 11.0.11004000
http://www.childpsy.net/ http://iris.org.il http://americancensorship.org
http://dhimmi.com http://www.PetitionOnline.com/tap12009/ http://jihadwatch.org
Never argue with an idiot: he has more experience with idiotic arguments.

R. Michael Weylandt

2012-Feb-23 22:18 UTC

head link

[R] cor() on sets of vectors

sapply(1:NCOL(x), function(n) cor(x[n], y[n])) is a quick and dirty
way, though probably not optimal.

Michael

On Thu, Feb 23, 2012 at 5:10 PM, Sam Steingold <sds at gnu.org>
wrote:> suppose I have two sets of vectors: x1,x2,...,xN and y1,y2,...,yN.
> I want N correlations: cor(x1,y1), cor(x2,y2), ..., cor(xN,yN).
> my sets of vectors are arranged as data frames x & y (vector=column):
>
> ?x <- data.frame(a=rnorm(10),b=rnorm(10),c=rnorm(10))
> ?y <- data.frame(d=rnorm(10),e=rnorm(10),f=rnorm(10))
>
> cor(x,y) returns a _matrix_ of all pairwise correlations:
>
> ?cor(x,y)
> ? ? ? ? ?d ? ? ? ? ?e ? ? ? ? ? ?f
> a 0.2763696 -0.3523757 -0.373518870
> b 0.5892742 -0.1969161 -0.007159589
> c 0.3094301 ?0.1111997 -0.094970748
>
> which is _not_ what I want.
>
> I want diag(cor(x,y)) but without the N^2 calculations.
>
> thanks.
>
> --
> Sam Steingold (http://sds.podval.org/) on Ubuntu 11.10 (oneiric) X
11.0.11004000
> http://www.childpsy.net/ http://iris.org.il http://americancensorship.org
> http://dhimmi.com http://www.PetitionOnline.com/tap12009/
http://jihadwatch.org
> Never argue with an idiot: he has more experience with idiotic arguments.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Bert Gunter

2012-Feb-23 22:24 UTC

head link

[R] cor() on sets of vectors

Use 1:n as an index.

e.g.
sapply(1:n, function(i) cor(x[,i],y[,i]))

-- Bert



On Thu, Feb 23, 2012 at 2:10 PM, Sam Steingold <sds at gnu.org>
wrote:> suppose I have two sets of vectors: x1,x2,...,xN and y1,y2,...,yN.
> I want N correlations: cor(x1,y1), cor(x2,y2), ..., cor(xN,yN).
> my sets of vectors are arranged as data frames x & y (vector=column):
>
> ?x <- data.frame(a=rnorm(10),b=rnorm(10),c=rnorm(10))
> ?y <- data.frame(d=rnorm(10),e=rnorm(10),f=rnorm(10))
>
> cor(x,y) returns a _matrix_ of all pairwise correlations:
>
> ?cor(x,y)
> ? ? ? ? ?d ? ? ? ? ?e ? ? ? ? ? ?f
> a 0.2763696 -0.3523757 -0.373518870
> b 0.5892742 -0.1969161 -0.007159589
> c 0.3094301 ?0.1111997 -0.094970748
>
> which is _not_ what I want.
>
> I want diag(cor(x,y)) but without the N^2 calculations.
>
> thanks.
>
> --
> Sam Steingold (http://sds.podval.org/) on Ubuntu 11.10 (oneiric) X
11.0.11004000
> http://www.childpsy.net/ http://iris.org.il http://americancensorship.org
> http://dhimmi.com http://www.PetitionOnline.com/tap12009/
http://jihadwatch.org
> Never argue with an idiot: he has more experience with idiotic arguments.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

ilai

2012-Feb-23 22:52 UTC

head link

[R] cor() on sets of vectors

On Thu, Feb 23, 2012 at 3:24 PM, Bert Gunter <gunter.berton at gene.com>
wrote:> Use 1:n as an index.
>
> e.g.
> sapply(1:n, function(i) cor(x[,i],y[,i]))
## sapply is a good solution (the only one I could think of too), but
not always worth it:

# for 100 x 1000
 x <- data.frame(matrix(rnorm(100000),nc=1000))
 y <- data.frame(matrix(rnorm(100000),nc=1000))
 system.time(diag(cor(x,y)))
#   user  system elapsed
#  0.592   0.008   0.623
system.time(sapply(1:1000,function(i) cor(x[,i],y[,i])))
#   user  system elapsed
#  0.384   0.000   0.412

# Great. but for 10 x 1000
x <- data.frame(matrix(rnorm(10000),nc=1000))
y <- data.frame(matrix(rnorm(10000),nc=1000))
system.time(diag(cor(x,y)))
#   user  system elapsed
#  0.256   0.008   0.279
system.time(sapply(1:1000,function(i) cor(x[,i],y[,i])))
#   user  system elapsed
#  0.376   0.000   0.388

# or 100 x 100
 system.time(diag(cor(x,y)))
#   user  system elapsed
#  0.016   0.000   0.014
 system.time(sapply(1:100,function(i) cor(x[,i],y[,i])))
#   user  system elapsed
#  0.036   0.000   0.036

# Not so great.

Bottom line, as always, it depends.

Cheers
Elai



>
> -- Bert
>
>
>
> On Thu, Feb 23, 2012 at 2:10 PM, Sam Steingold <sds at gnu.org>
wrote:
>> suppose I have two sets of vectors: x1,x2,...,xN and y1,y2,...,yN.
>> I want N correlations: cor(x1,y1), cor(x2,y2), ..., cor(xN,yN).
>> my sets of vectors are arranged as data frames x & y
(vector=column):
>>
>> ?x <- data.frame(a=rnorm(10),b=rnorm(10),c=rnorm(10))
>> ?y <- data.frame(d=rnorm(10),e=rnorm(10),f=rnorm(10))
>>
>> cor(x,y) returns a _matrix_ of all pairwise correlations:
>>
>> ?cor(x,y)
>> ? ? ? ? ?d ? ? ? ? ?e ? ? ? ? ? ?f
>> a 0.2763696 -0.3523757 -0.373518870
>> b 0.5892742 -0.1969161 -0.007159589
>> c 0.3094301 ?0.1111997 -0.094970748
>>
>> which is _not_ what I want.
>>
>> I want diag(cor(x,y)) but without the N^2 calculations.
>>
>> thanks.
>>
>> --
>> Sam Steingold (http://sds.podval.org/) on Ubuntu 11.10 (oneiric) X
11.0.11004000
>> http://www.childpsy.net/ http://iris.org.il
http://americancensorship.org
>> http://dhimmi.com http://www.PetitionOnline.com/tap12009/
http://jihadwatch.org
>> Never argue with an idiot: he has more experience with idiotic
arguments.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
>
> Internal Contact Info:
> Phone: 467-7374
> Website:
>
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Bert Gunter

2012-Feb-23 23:29 UTC

head link

[R] cor() on sets of vectors

Elai:

Thank you.You make an excellent point. cor() is implemented at the C
level (via a .internal call) whereas sapply implements an interpreted
loop that has to issue the call each time through the loop (with some
shortcuts/tricks to reduce overhead). So the operations count of the
original poster is completely bogus.

As you say, "it depends..." . For this reason, it is generally a bad
idea to waste much time on code efficiency unless you really need to,
which these days is not often (and there are certainly arenas where
this statement is false). More important is to focus on code clarity,
flexibility, debuggabuility, etc.

Best,
Bert
On Thu, Feb 23, 2012 at 2:52 PM, ilai <keren at math.montana.edu>
wrote:> On Thu, Feb 23, 2012 at 3:24 PM, Bert Gunter <gunter.berton at
gene.com> wrote:
>> Use 1:n as an index.
>>
>> e.g.
>> sapply(1:n, function(i) cor(x[,i],y[,i]))
>
> ## sapply is a good solution (the only one I could think of too), but
> not always worth it:
>
> # for 100 x 1000
> ?x <- data.frame(matrix(rnorm(100000),nc=1000))
> ?y <- data.frame(matrix(rnorm(100000),nc=1000))
> ?system.time(diag(cor(x,y)))
> # ? user ?system elapsed
> # ?0.592 ? 0.008 ? 0.623
> system.time(sapply(1:1000,function(i) cor(x[,i],y[,i])))
> # ? user ?system elapsed
> # ?0.384 ? 0.000 ? 0.412
>
> # Great. but for 10 x 1000
> x <- data.frame(matrix(rnorm(10000),nc=1000))
> y <- data.frame(matrix(rnorm(10000),nc=1000))
> system.time(diag(cor(x,y)))
> # ? user ?system elapsed
> # ?0.256 ? 0.008 ? 0.279
> system.time(sapply(1:1000,function(i) cor(x[,i],y[,i])))
> # ? user ?system elapsed
> # ?0.376 ? 0.000 ? 0.388
>
> # or 100 x 100
> ?system.time(diag(cor(x,y)))
> # ? user ?system elapsed
> # ?0.016 ? 0.000 ? 0.014
> ?system.time(sapply(1:100,function(i) cor(x[,i],y[,i])))
> # ? user ?system elapsed
> # ?0.036 ? 0.000 ? 0.036
>
> # Not so great.
>
> Bottom line, as always, it depends.
>
> Cheers
> Elai
>
>
>
>
>>
>> -- Bert
>>
>>
>>
>> On Thu, Feb 23, 2012 at 2:10 PM, Sam Steingold <sds at gnu.org>
wrote:
>>> suppose I have two sets of vectors: x1,x2,...,xN and y1,y2,...,yN.
>>> I want N correlations: cor(x1,y1), cor(x2,y2), ..., cor(xN,yN).
>>> my sets of vectors are arranged as data frames x & y
(vector=column):
>>>
>>> ?x <- data.frame(a=rnorm(10),b=rnorm(10),c=rnorm(10))
>>> ?y <- data.frame(d=rnorm(10),e=rnorm(10),f=rnorm(10))
>>>
>>> cor(x,y) returns a _matrix_ of all pairwise correlations:
>>>
>>> ?cor(x,y)
>>> ? ? ? ? ?d ? ? ? ? ?e ? ? ? ? ? ?f
>>> a 0.2763696 -0.3523757 -0.373518870
>>> b 0.5892742 -0.1969161 -0.007159589
>>> c 0.3094301 ?0.1111997 -0.094970748
>>>
>>> which is _not_ what I want.
>>>
>>> I want diag(cor(x,y)) but without the N^2 calculations.
>>>
>>> thanks.
>>>
>>> --
>>> Sam Steingold (http://sds.podval.org/) on Ubuntu 11.10 (oneiric) X
11.0.11004000
>>> http://www.childpsy.net/ http://iris.org.il
http://americancensorship.org
>>> http://dhimmi.com http://www.PetitionOnline.com/tap12009/
http://jihadwatch.org
>>> Never argue with an idiot: he has more experience with idiotic
arguments.
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>>
>> --
>>
>> Bert Gunter
>> Genentech Nonclinical Biostatistics
>>
>> Internal Contact Info:
>> Phone: 467-7374
>> Website:
>>
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.


-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

William Dunlap

2012-Feb-24 01:14 UTC

head link

[R] cor() on sets of vectors

> ## sapply is a good solution (the only one I could think of too), but
> not always worth it:
Also look at vapply().  It is like sapply() but you have to tell
it what type and size of output you expect FUN to return.  E.g.,
change
   sapply(1:n, FUN=function(i) cor(x[,i],y[,i]))
to
   vapply(1:n, FUN=function(i) cor(x[,i],y[,i]), FUN.VALUE=0.0)
It can save time and space in formatting the outputs of FUN because
it knows what it will be ahead of time, but its biggest advantage
is that it makes it clear (to the code writer and readers) what it
will return.  It throws an error if FUN does not return what you
say it will.

Also, vapply works nicely for zero-length inputs.  E.g.,
   vapply(list(), range, FUN.VALUE=numeric(2))
returns a 2 by 0 matrix while
   sapply(list(), range)
returns list() and you have to add if(length(x)==0) statements to your
code to make it work in the edge cases, obscuring the basic algorithm.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at
r-project.org] On Behalf Of ilai
> Sent: Thursday, February 23, 2012 2:52 PM
> To: Bert Gunter
> Cc: r-help at r-project.org; sds at gnu.org
> Subject: Re: [R] cor() on sets of vectors
> 
> On Thu, Feb 23, 2012 at 3:24 PM, Bert Gunter <gunter.berton at
gene.com> wrote:
> > Use 1:n as an index.
> >
> > e.g.
> > sapply(1:n, function(i) cor(x[,i],y[,i]))
> 
> ## sapply is a good solution (the only one I could think of too), but
> not always worth it:
> 
> # for 100 x 1000
>  x <- data.frame(matrix(rnorm(100000),nc=1000))
>  y <- data.frame(matrix(rnorm(100000),nc=1000))
>  system.time(diag(cor(x,y)))
> #   user  system elapsed
> #  0.592   0.008   0.623
> system.time(sapply(1:1000,function(i) cor(x[,i],y[,i])))
> #   user  system elapsed
> #  0.384   0.000   0.412
> 
> # Great. but for 10 x 1000
> x <- data.frame(matrix(rnorm(10000),nc=1000))
> y <- data.frame(matrix(rnorm(10000),nc=1000))
> system.time(diag(cor(x,y)))
> #   user  system elapsed
> #  0.256   0.008   0.279
> system.time(sapply(1:1000,function(i) cor(x[,i],y[,i])))
> #   user  system elapsed
> #  0.376   0.000   0.388
> 
> # or 100 x 100
>  system.time(diag(cor(x,y)))
> #   user  system elapsed
> #  0.016   0.000   0.014
>  system.time(sapply(1:100,function(i) cor(x[,i],y[,i])))
> #   user  system elapsed
> #  0.036   0.000   0.036
> 
> # Not so great.
> 
> Bottom line, as always, it depends.
> 
> Cheers
> Elai
> 
> 
> 
> 
> >
> > -- Bert
> >
> >
> >
> > On Thu, Feb 23, 2012 at 2:10 PM, Sam Steingold <sds at gnu.org>
wrote:
> >> suppose I have two sets of vectors: x1,x2,...,xN and y1,y2,...,yN.
> >> I want N correlations: cor(x1,y1), cor(x2,y2), ..., cor(xN,yN).
> >> my sets of vectors are arranged as data frames x & y
(vector=column):
> >>
> >> ?x <- data.frame(a=rnorm(10),b=rnorm(10),c=rnorm(10))
> >> ?y <- data.frame(d=rnorm(10),e=rnorm(10),f=rnorm(10))
> >>
> >> cor(x,y) returns a _matrix_ of all pairwise correlations:
> >>
> >> ?cor(x,y)
> >> ? ? ? ? ?d ? ? ? ? ?e ? ? ? ? ? ?f
> >> a 0.2763696 -0.3523757 -0.373518870
> >> b 0.5892742 -0.1969161 -0.007159589
> >> c 0.3094301 ?0.1111997 -0.094970748
> >>
> >> which is _not_ what I want.
> >>
> >> I want diag(cor(x,y)) but without the N^2 calculations.
> >>
> >> thanks.
> >>
> >> --
> >> Sam Steingold (http://sds.podval.org/) on Ubuntu 11.10 (oneiric) X
11.0.11004000
> >> http://www.childpsy.net/ http://iris.org.il
http://americancensorship.org
> >> http://dhimmi.com http://www.PetitionOnline.com/tap12009/
http://jihadwatch.org
> >> Never argue with an idiot: he has more experience with idiotic
arguments.
> >>
> >> ______________________________________________
> >> R-help at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >
> >
> >
> > --
> >
> > Bert Gunter
> > Genentech Nonclinical Biostatistics
> >
> > Internal Contact Info:
> > Phone: 467-7374
> > Website:
> >
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-
> home.htm
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Reasonably Related Threads

Search for more possibly parallel threads

R help - Feb 2012 - cor() on sets of vectors

[R] cor() on sets of vectors

[R] cor() on sets of vectors

[R] cor() on sets of vectors

[R] cor() on sets of vectors

[R] cor() on sets of vectors

[R] cor() on sets of vectors

Reasonably Related Threads