Ralf B
2010-Jul-29 07:00 UTC
[R] Spearman's Correlation Coefficient to compare distributions?
Hi,

I have distributions from two different data sets, and I would like to measure how similar their distributions are (in terms of their bin frequencies). In other words, I am not interested in the exact sequence of data points but rather in their distributional properties and how similar those are.

Spearman's rank correlation coefficient is used to compare data without assuming normality. I wonder whether this measure can also be used to compare distributional data (the bin counts of a histogram) rather than the data points that are summarized in the distribution. Here is example code that illustrates what I would like to check:

aNorm <- rnorm(1000000)
bNorm <- rnorm(1000000)
cUni <- runif(1000000)

ha <- hist(aNorm)
hb <- hist(bNorm)
hc <- hist(cUni)

print(ha$counts)
print(hb$counts)
print(hc$counts)

# relatively similar
# (compare the first n bin counts, n = length of the shorter count vector)
n <- min(c(NROW(ha$counts), NROW(hb$counts)))
cor.test(ha$counts[1:n], hb$counts[1:n], method="spearman")

# quite different
n <- min(c(NROW(ha$counts), NROW(hc$counts)))
cor.test(ha$counts[1:n], hc$counts[1:n], method="spearman")

Does this make sense, or am I violating some assumptions of the coefficient?

Thanks,
R.
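A minimal sketch of one way to make the bin counts line up, assuming the same R setup as above. By default each hist() call chooses its own breaks (Sturges' rule applied to that sample's range), so ha$counts[i] and hc$counts[i] need not refer to the same interval, and truncating to the first n counts does not fix that. The sketch passes one shared breaks vector to all three calls and uses plot = FALSE so no histograms are drawn. The seed and the choice of 50 equal-width bins are arbitrary assumptions, not part of the original question.

```r
# Sketch: compare histogram bin counts on a shared set of breaks.
# Assumptions (not from the original post): seed 42 and 50 equal-width bins.
set.seed(42)
aNorm <- rnorm(1000000)
bNorm <- rnorm(1000000)
cUni  <- runif(1000000)

# One break vector spanning all three samples, so that counts[i]
# refers to the same interval in every histogram.
rng    <- range(aNorm, bNorm, cUni)
breaks <- seq(rng[1], rng[2], length.out = 51)

ha <- hist(aNorm, breaks = breaks, plot = FALSE)
hb <- hist(bNorm, breaks = breaks, plot = FALSE)
hc <- hist(cUni,  breaks = breaks, plot = FALSE)

# No truncation needed: all count vectors have the same length.
# exact = FALSE because empty bins create tied ranks.
rAB <- cor.test(ha$counts, hb$counts, method = "spearman", exact = FALSE)
rAC <- cor.test(ha$counts, hc$counts, method = "spearman", exact = FALSE)
print(rAB)
print(rAC)
```

With aligned bins, Spearman's coefficient still uses only the rank order of the counts across bins, not their magnitudes. exact = FALSE is set because the many empty bins produce tied ranks, for which cor.test() cannot compute an exact p-value.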