Hi all, I'm looking for an analytic method to obtain the PDF & CDF of the sampling distribution of a given correlation (rho) at a given sample size (N). I've attached code describing a monte carlo method of achieving this, and while it is relatively fast, an analytic solution would obviously be optimal. get.cors <- function(i, x, y, N){ end=i*N .Internal(cor(x[(end-N+1):end] ,y[(end-N+1):end] ,TRUE ,FALSE )) } get.r.dist <- function(N, rho, it){ Sigma=matrix(c(1,rho,rho,1),2,2) eS = eigen(Sigma, symmetric = TRUE, EISPACK = TRUE) ev = eS$values fact = eS$vectors %*% diag(sqrt(pmax(ev, 0)), 2) Z = rnorm(2 * N * it) dim(Z) = c(2, N * it) Z = t(fact %*% Z) x = Z[, 1] y = Z[, 2] r = sapply(1:it ,get.cors,x, y, N) return(r) } #Run 1e3 monte carlo iterations, where each obtains the correlation # of 10 pairs of observations from a bivariate normal distribution with # a true correlation of .5. Returns 1e3 values for the observed correlation mc.rs = get.r.dist( N=10 , rho=.5 , it=1e3 ) #plot the PDF & CDF par(mfrow=c(1,2)) hist(mc.rs,prob=T,xlab='Observed correlation') probs = seq(0,1,.01) plot(quantile(mc.rs,probs=probs),probs,type='l',xlab='Observed correlation',ylab='Cumulative probability') -- Mike Lawrence Graduate Student, Department of Psychology, Dalhousie University www.memetic.ca "The road to wisdom? Well, it's plain and simple to express: Err and err and err again, but less and less and less." - Piet Hein
See Chapter 22 in N.L. Johnson, S. Kotz, and N. Balakrishnan, "Continuous Univariate Distributions", Volume 2, Second Edition, 1995. Maria On Thu, Jul 17, 2008 at 11:29 AM, Mike Lawrence <Mike.Lawrence at dal.ca> wrote:> Hi all, > > I'm looking for an analytic method to obtain the PDF & CDF of the sampling > distribution of a given correlation (rho) at a given sample size (N). > > I've attached code describing a monte carlo method of achieving this, and > while it is relatively fast, an analytic solution would obviously be > optimal. > > get.cors <- function(i, x, y, N){ > end=i*N > .Internal(cor(x[(end-N+1):end] ,y[(end-N+1):end] ,TRUE ,FALSE )) > } > get.r.dist <- function(N, rho, it){ > Sigma=matrix(c(1,rho,rho,1),2,2) > eS = eigen(Sigma, symmetric = TRUE, EISPACK = TRUE) > ev = eS$values > fact = eS$vectors %*% diag(sqrt(pmax(ev, 0)), 2) > Z = rnorm(2 * N * it) > dim(Z) = c(2, N * it) > Z = t(fact %*% Z) > x = Z[, 1] > y = Z[, 2] > r = sapply(1:it ,get.cors,x, y, N) > return(r) > } > > #Run 1e3 monte carlo iterations, where each obtains the correlation > # of 10 pairs of observations from a bivariate normal distribution with > # a true correlation of .5. Returns 1e3 values for the observed correlation > mc.rs = get.r.dist( N=10 , rho=.5 , it=1e3 ) > > #plot the PDF & CDF > par(mfrow=c(1,2)) > hist(mc.rs,prob=T,xlab='Observed correlation') > probs = seq(0,1,.01) > plot(quantile(mc.rs,probs=probs),probs,type='l',xlab='Observed > correlation',ylab='Cumulative probability') > > -- > Mike Lawrence > Graduate Student, Department of Psychology, Dalhousie University > > www.memetic.ca > > "The road to wisdom? Well, it's plain and simple to express: > Err and err and err again, but less and less and less." > - Piet Hein > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >
Charles C. Berry
2008-Jul-17 19:57 UTC
[R] Sampling distribution (PDF & CDF) of correlation
On Thu, 17 Jul 2008, Mike Lawrence wrote:> Hi all, > > I'm looking for an analytic method to obtain the PDF & CDF of the sampling > distribution of a given correlation (rho) at a given sample size (N).See Fisher (1915) Biometrika. It is non-central t up to a monotone transformation. Finding the value of the non-centrality parameter and that transformation is left as an exercise for the reader. Chuck> > I've attached code describing a monte carlo method of achieving this, and > while it is relatively fast, an analytic solution would obviously be optimal. > > get.cors <- function(i, x, y, N){ > end=i*N > .Internal(cor(x[(end-N+1):end] ,y[(end-N+1):end] ,TRUE ,FALSE )) > } > get.r.dist <- function(N, rho, it){ > Sigma=matrix(c(1,rho,rho,1),2,2) > eS = eigen(Sigma, symmetric = TRUE, EISPACK = TRUE) > ev = eS$values > fact = eS$vectors %*% diag(sqrt(pmax(ev, 0)), 2) > Z = rnorm(2 * N * it) > dim(Z) = c(2, N * it) > Z = t(fact %*% Z) > x = Z[, 1] > y = Z[, 2] > r = sapply(1:it ,get.cors,x, y, N) > return(r) > } > > # Run 1e3 monte carlo iterations, where each obtains the correlation > # of 10 pairs of observations from a bivariate normal distribution with > # a true correlation of .5. Returns 1e3 values for the observed correlation > mc.rs = get.r.dist( N=10 , rho=.5 , it=1e3 ) > > #plot the PDF & CDF > par(mfrow=c(1,2)) > hist(mc.rs,prob=T,xlab='Observed correlation') > probs = seq(0,1,.01) > plot(quantile(mc.rs,probs=probs),probs,type='l',xlab='Observed > correlation',ylab='Cumulative probability') > > -- > Mike Lawrence > Graduate Student, Department of Psychology, Dalhousie University > > www.memetic.ca > > "The road to wisdom? Well, it's plain and simple to express: > Err and err and err again, but less and less and less." > - Piet Hein > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >Charles C. Berry (858) 534-2098 Dept of Family/Preventive Medicine E mailto:cberry at tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901