Hi all,

I am trying to figure out the formula used by R's Spearman rho (using cor(method="spearman")) because I can't seem to get the same value as I do when calculating "by hand". Perhaps I'm using cor() wrong, but I don't know where. Basically, I am running these commands:

> y=read.table(file="tmp",header=TRUE,sep="\t")
> y
   IQ Hours
1 106     7
2  86     0
3  97    20
4 113    12
5 120    12
6 110    17
> cor(y[1],y[2],method="spearman")
       Hours
IQ 0.2319084

[It's an abbreviated version of an example I took from Wikipedia.] I calculated by hand (apologies if the table looks strange when pasted into e-mail):

     IQ  Hours  rank(IQ)  rank(Hours)  diff  diff^2
1   106      7         3            2     1       1
2    86      0         1            1     0       0
3    97     20         2            6    -4      16
4   113     12         5          3.5   1.5    2.25
5   120     12         6          3.5   2.5    6.25
6   110     17         4            5    -1       1
                                          sum:  26.5

rho = 0.242857

where rho = 1 - (6 * 26.5) / (6 * (6^2 - 1)). I kept modifying the table and realized that the difference in result comes from ties, i.e., if I remove the tie in rows 4 and 5, I get the same result from both cor() and calculating by hand. Perhaps I'm handling ties wrong... does anyone know how R does it, or perhaps I need to change how I'm using it?

Thank you!

Ray
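A minimal sketch that reproduces both numbers from the post, assuming the data are typed in directly as two vectors (the names iq and hours are hypothetical; the values are copied from the table above). It compares the textbook shortcut 1 - 6*sum(d^2)/(n*(n^2 - 1)) with the Pearson correlation of the average ranks:

```r
# Values copied from the post; vector names are illustrative only
iq    <- c(106, 86, 97, 113, 120, 110)
hours <- c(7, 0, 20, 12, 12, 17)

# rank() uses ties.method = "average" by default,
# so the two 12s in hours both get rank 3.5
rx <- rank(iq)
ry <- rank(hours)
d  <- rx - ry
n  <- length(iq)

# Textbook shortcut: exact only when there are no ties
rho_shortcut <- 1 - 6 * sum(d^2) / (n * (n^2 - 1))

# Pearson correlation computed on the (average) ranks
rho_ranks <- cor(rx, ry)

# Value reported by cor(method = "spearman")
rho_r <- cor(iq, hours, method = "spearman")

print(c(shortcut = rho_shortcut, pearson_on_ranks = rho_ranks, R = rho_r))
```

The shortcut gives the hand result (0.2428571), while the Pearson correlation of the ranks and cor(method = "spearman") agree at 0.2319084. The two approaches coincide only when neither variable has ties.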
Hi,

You can try with

cor.test(rank(y[,1]), rank(y[,2]))

On 5/29/07, Raymond Wan <rwan at kuicr.kyoto-u.ac.jp> wrote:
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
"The scientists of today think deeply instead of clearly. One must be sane to think clearly, but one can think deeply and be quite insane."
Nikola Tesla
http://www.macgrass.com
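A small sketch of the suggestion above, assuming the same two columns are held in plain numeric vectors (iq and hours are illustrative names). It checks that a Pearson test on the ranks and a Spearman test on the raw data report the same estimate. With tied data, R may warn that it cannot compute an exact p-value for the Spearman test, so that warning is suppressed here:

```r
# Values copied from the original post; vector names are illustrative only
iq    <- c(106, 86, 97, 113, 120, 110)
hours <- c(7, 0, 20, 12, 12, 17)

# Pearson test on the average ranks; its estimate is named "cor"
ct_ranks <- cor.test(rank(iq), rank(hours))

# Spearman test on the raw data; its estimate is named "rho".
# The ties in hours can trigger a warning about the exact p-value.
ct_spear <- suppressWarnings(cor.test(iq, hours, method = "spearman"))

# Drop the names so the two estimates can be compared directly
est_ranks <- unname(ct_ranks$estimate)
est_spear <- unname(ct_spear$estimate)
print(c(est_ranks, est_spear))
```

Both estimates equal the 0.2319084 returned by cor(method = "spearman"). This sketch does not claim that the two tests' p-values agree.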
Dear Ray,

The Spearman correlation calculated by R is correct with or without ties; what is not correct is the probability in the case of ties. I am sending you the formula for the correlation with ties, which gives the same value as R.

Regards,

Felipe de Mendiburu
Statistician

# Spearman correlation "rs" with ties or no ties
rs <- function(x, y) {
  d  <- rank(x) - rank(y)          # rank differences (ties get average ranks)
  tx <- as.numeric(table(x))       # count of each distinct x value
  ty <- as.numeric(table(y))       # count of each distinct y value
  Lx <- sum((tx^3 - tx) / 12)      # tie correction for x (0 if no ties)
  Ly <- sum((ty^3 - ty) / 12)      # tie correction for y (0 if no ties)
  N  <- length(x)
  SX2 <- (N^3 - N) / 12 - Lx       # sum of squared deviations of rank(x)
  SY2 <- (N^3 - N) / 12 - Ly       # sum of squared deviations of rank(y)
  rs <- (SX2 + SY2 - sum(d^2)) / (2 * sqrt(SX2 * SY2))
  return(rs)
}

# Application
> cor(y[,1], y[,2], method="spearman")
[1] 0.2319084
> rs(y[,1], y[,2])
[1] 0.2319084

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Raymond Wan
Sent: Monday, May 28, 2007 10:29 PM
To: r-help at stat.math.ethz.ch
Subject: [R] R's Spearman
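Written out as equations, the tie-corrected formula in rs() looks like the block below, with Ray's numbers plugged in. The symbols are my own labels for the quantities in the code: S_x and S_y correspond to SX2 and SY2, t_g is the size of the g-th group of equal IQ values, u_h the size of the h-th group of equal Hours values, and d_i the rank difference for row i. The last line shows that without ties the formula reduces to the shortcut Ray used.

```latex
\begin{aligned}
S_x &= \frac{N^3 - N}{12} - \sum_g \frac{t_g^3 - t_g}{12}, \qquad
S_y = \frac{N^3 - N}{12} - \sum_h \frac{u_h^3 - u_h}{12},\\
r_s &= \frac{S_x + S_y - \sum_{i=1}^{N} d_i^2}{2\sqrt{S_x S_y}}.\\[4pt]
\text{Ray's data } (N = 6):\quad
S_x &= \frac{6^3 - 6}{12} - 0 = 17.5, \qquad
S_y = 17.5 - \frac{2^3 - 2}{12} = 17,\\
r_s &= \frac{17.5 + 17 - 26.5}{2\sqrt{17.5 \times 17}} = \frac{8}{2\sqrt{297.5}} \approx 0.2319.\\[4pt]
\text{No ties } \left(S_x = S_y = \tfrac{N^3 - N}{12}\right):\quad
r_s &= 1 - \frac{6\sum_i d_i^2}{N(N^2 - 1)}.
\end{aligned}
```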