Hi all, I am trying to figure out the formula used by R's Spearman rho (using cor(method="spearman")) because I can't seem to get the same value as by calculating "by hand". Perhaps I'm using "cor" wrong, but I don't know where. Basically, I am running these commands: > y=read.table(file="tmp",header=TRUE,sep="\t") > y IQ Hours 1 106 7 2 86 0 3 97 20 4 113 12 5 120 12 6 110 17 > cor(y[1],y[2],method="spearman") Hours IQ 0.2319084 [it's an abbreviated example of one I took from Wikipedia]. I calculated by hand (apologies if the table looks strange when pasted into e-mail): IQ Hours rank(IQ) rank(hours) diff diff^2 1 106 7 3 2 1 1 2 86 0 1 1 0 0 3 97 20 2 6 -4 16 4 113 12 5 3.5 1.5 2.25 5 120 12 6 3.5 2.5 6.25 6 110 17 4 5 -1 1 26.5 rho= 0.242857 where rho = (1 - ((6 * 26.5) / 6 * (6^2 - 1))). I kept modifying the table and realized that the difference in result comes from ties. i.e., if I remove the tie in rows 4 and 5, I get the same result from both cor and calculating by hand. Perhaps I'm handling ties wrong...does anyone know how R does it or perhaps I need to change how I'm using it? Thank you! Ray
Hi, You can try with cor.test(rank(y[1]),rank(y[2])) On 5/29/07, Raymond Wan <rwan at kuicr.kyoto-u.ac.jp> wrote:> > Hi all, > > I am trying to figure out the formula used by R's Spearman rho (using > cor(method="spearman")) because I can't seem to get the same value as by > calculating "by hand". Perhaps I'm using "cor" wrong, but I don't know > where. Basically, I am running these commands: > > > y=read.table(file="tmp",header=TRUE,sep="\t") > > y > IQ Hours > 1 106 7 > 2 86 0 > 3 97 20 > 4 113 12 > 5 120 12 > 6 110 17 > > cor(y[1],y[2],method="spearman") > Hours > IQ 0.2319084 > > [it's an abbreviated example of one I took from Wikipedia]. I > calculated by hand (apologies if the table looks strange when pasted > into e-mail): > > IQ Hours rank(IQ) rank(hours) diff diff^2 > 1 106 7 3 2 1 1 > 2 86 0 1 1 0 0 > 3 97 20 2 6 -4 16 > 4 113 12 5 3.5 1.5 2.25 > 5 120 12 6 3.5 2.5 6.25 > 6 110 17 4 5 -1 1 > 26.5 > > rho= 0.242857 > > where rho = (1 - ((6 * 26.5) / 6 * (6^2 - 1))). I kept modifying the > table and realized that the difference in result comes from ties. i.e., > if I remove the tie in rows 4 and 5, I get the same result from both cor > and calculating by hand. Perhaps I'm handling ties wrong...does anyone > know how R does it or perhaps I need to change how I'm using it? > > Thank you! > > Ray > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- "The scientists of today think deeply instead of clearly. One must be sane to think clearly, but one can think deeply and be quite insane." Nikola Tesla macgrass.com
Dear Ray, The R's Spearman calculated by R is correct for ties or nonties, which is not correct is the probability for the case of ties. I send to you formulates it for the correlation with ties, that is equal to R. Regards, Felipe de Mendiburu Statistician # Spearman correlation "rs" with ties or no ties rs<-function(x,y) { d<-rank(x)-rank(y) tx<-as.numeric(table(x)) ty<-as.numeric(table(y)) Lx<-sum((tx^3-tx)/12) Ly<-sum((ty^3-ty)/12) N<-length(x) SX2<- (N^3-N)/12 - Lx SY2<- (N^3-N)/12 - Ly rs<- (SX2+SY2-sum(d^2))/(2*sqrt(SX2*SY2)) return(rs) } # Aplicacion> cor(y[,1],y[,2],method="spearman")[1] 0.2319084> rs(y[,1],y[,2])[1] 0.2319084 -----Original Message----- From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch]On Behalf Of Raymond Wan Sent: Monday, May 28, 2007 10:29 PM To: r-help at stat.math.ethz.ch Subject: [R] R's Spearman Hi all, I am trying to figure out the formula used by R's Spearman rho (using cor(method="spearman")) because I can't seem to get the same value as by calculating "by hand". Perhaps I'm using "cor" wrong, but I don't know where. Basically, I am running these commands: > y=read.table(file="tmp",header=TRUE,sep="\t") > y IQ Hours 1 106 7 2 86 0 3 97 20 4 113 12 5 120 12 6 110 17 > cor(y[1],y[2],method="spearman") Hours IQ 0.2319084 [it's an abbreviated example of one I took from Wikipedia]. I calculated by hand (apologies if the table looks strange when pasted into e-mail): IQ Hours rank(IQ) rank(hours) diff diff^2 1 106 7 3 2 1 1 2 86 0 1 1 0 0 3 97 20 2 6 -4 16 4 113 12 5 3.5 1.5 2.25 5 120 12 6 3.5 2.5 6.25 6 110 17 4 5 -1 1 26.5 rho= 0.242857 where rho = (1 - ((6 * 26.5) / 6 * (6^2 - 1))). I kept modifying the table and realized that the difference in result comes from ties. i.e., if I remove the tie in rows 4 and 5, I get the same result from both cor and calculating by hand. Perhaps I'm handling ties wrong...does anyone know how R does it or perhaps I need to change how I'm using it? Thank you! Ray ______________________________________________ R-help at stat.math.ethz.ch mailing list stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.