Hello,

I'm using a very large data set (n > 100,000 for 7 columns), for which I'm
pretty happy dealing with pairwise-deleted correlations to populate my
correlation table, e.g.:

a <- cor(cbind(col1, col2, col3), use = "pairwise.complete.obs")

However, I am interested in the number of cases used to compute each cell
of the correlation table. I was unable to find such a function via Google
searches, so I wrote one of my own. It turns out to be highly inefficient
(it takes much, MUCH longer than the correlations do). Any hints, regarding
other functions to use or ways to make this speedier, would be much
appreciated!

pairwise.n <- function(df = stop("Must provide data frame!")) {
  if (!is.data.frame(df)) {
    df <- as.data.frame(df)
  }
  colNum <- ncol(df)
  result <- matrix(data = NA, nrow = colNum, ncol = colNum,
                   dimnames = list(colnames(df), colnames(df)))
  for (i in 1:colNum) {
    for (j in i:colNum) {
      # count rows in which both column i and column j are non-missing
      result[i, j] <- result[j, i] <- sum(!is.na(df[[i]]) & !is.na(df[[j]]))
    }
  }
  result
}

--
Adam D. I. Kramer
University of Oregon
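P.S. To make the expected output concrete, here is a toy example (made-up
data, just to show the shape of the result I'm after):

x <- cbind(a = c(1, 2, NA, 4), b = c(1, NA, 3, 4), c = 1:4)
pairwise.n(x)
# each [i, j] entry is the number of rows where columns i and j are both
# non-missing, e.g. 2 for ["a", "b"] (rows 1 and 4) and 4 for ["c", "c"]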
On 8/7/06, Adam D. I. Kramer <adik at ilovebacon.org> wrote:
> ...I am interested in the number of cases used to compute each
> cell of the correlation table.

Try this:

# mat is a test matrix with a couple of NAs
mat <- matrix(1:25, 5)
mat[2, 2] <- mat[3, 4] <- NA

crossprod(!is.na(mat))
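Why this works: !is.na(mat) is a logical indicator matrix, and matrix
products coerce TRUE/FALSE to 1/0, so entry [i, j] of crossprod(!is.na(mat))
counts the rows in which columns i and j are both observed. A quick check
on the toy matrix above:

Z <- !is.na(mat)   # TRUE where a value is present
crossprod(Z)       # the same as t(Z) %*% Z, computed in one step

# spot-check one cell against a direct count:
sum(!is.na(mat[, 2]) & !is.na(mat[, 4]))   # 3, matching crossprod(Z)[2, 4]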
Hi,

You can use complete.cases(); it should run faster than the code you
suggested. See the following example:

x <- matrix(runif(30), 10, 3)

# introduce missing values
x[sample(1:10, 3), 1] <- NA
x[sample(1:10, 3), 2] <- NA
x[sample(1:10, 3), 3] <- NA

cor(x, use = "pairwise.complete.obs")

n <- ncol(x)
n.na <- matrix(0, n, n)
for (i in seq(1, n)) {
  # diagonal: number of non-missing values in column i
  n.na[i, i] <- sum(complete.cases(x[, i]))
  # off-diagonal: rows where both column i and column j are observed
  for (j in seq(i + 1, length.out = n - i)) {
    ok <- sum(complete.cases(x[, i], x[, j]))
    n.na[i, j] <- n.na[j, i] <- ok
  }
}
n.na

HTH,
-Christos
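P.S. If you also want to compare this against counting via
crossprod(!is.na(x)), system.time makes that easy. A rough sketch, with
random data made up to match the size you describe (timings will of course
vary by machine):

set.seed(1)
big <- matrix(rnorm(1e5 * 7), ncol = 7)
big[sample(length(big), 5e4)] <- NA      # sprinkle in missing values

# vectorized pairwise counts
system.time(crossprod(!is.na(big)))

# double loop over column pairs using complete.cases
system.time({
  p <- ncol(big)
  counts <- matrix(0, p, p)
  for (i in 1:p)
    for (j in i:p)
      counts[i, j] <- counts[j, i] <- sum(complete.cases(big[, i], big[, j]))
  counts
})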
On Tue, 8 Aug 2006, ggrothendieck at gmail.com wrote:

> Try this:
>
> # mat is test matrix
> mat <- matrix(1:25, 5)
> mat[2,2] <- mat[3,4] <- NA
> crossprod(!is.na(mat))

Exactly what I was looking for! Thanks.

--Adam