msa at biostat.mgh.harvard.edu
2010-Feb-08 04:45 UTC
[Rd] Incorrect Kendall's tau for ordered variables (PR#14207)
Full_Name: Marek Ancukiewicz
Version: 2.10.1
OS: Linux
Submission from: (NULL) (74.0.49.2)

Both cor() and cor.test() incorrectly handle ordered variables with
method="kendall", and cor() incorrectly handles ordered variables with
method="spearman" (method="pearson" always works correctly, while
method="spearman" works for cor.test() but not for cor()).

In the erroneous calculations these functions ignore the inherent
ordering of the ordered variable (e.g., '9'<'10'<'11') and instead seem
to assume an alphabetic ordering ('10'<'11'<'9').

> cor(9:11,1:3,method="k")
[1] 1
> cor(as.ordered(9:11),1:3,method="k")
[1] -0.3333333
> cor.test(as.ordered(9:11),1:3,method="k")

        Kendall's rank correlation tau

data:  as.ordered(9:11) and 1:3
T = 1, p-value = 1
alternative hypothesis: true tau is not equal to 0
sample estimates:
       tau
-0.3333333

> cor(9:11,1:3,method="s")
[1] 1
> cor(as.ordered(9:11),1:3,method="s")
[1] -0.5
> cor.test(as.ordered(9:11),1:3,method="s")

        Spearman's rank correlation rho

data:  as.ordered(9:11) and 1:3
S = 0, p-value = 0.3333
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
  1
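The reported values are exactly what one gets by correlating the alphabetic ranks of the labels with 1:3. A minimal R sketch that checks this on plain numeric vectors, so it does not depend on how cor() treats factors in any given R version; kendall_tau() is a hypothetical helper written only for illustration and assumes there are no ties:

```r
# String ranks of the labels: "10" < "11" < "9", so the ranks are 3 1 2
alpha_rank <- rank(c("9", "10", "11"))
y <- 1:3

# Correlating these alphabetic ranks with 1:3 gives the values in the report
tau_alpha <- cor(alpha_rank, y, method = "kendall")   # -1/3
rho_alpha <- cor(alpha_rank, y, method = "spearman")  # -0.5

# Hypothetical helper: Kendall's tau for untied data,
# (concordant pairs - discordant pairs) / number of pairs
kendall_tau <- function(x, y) {
  n <- length(x)
  s <- 0
  for (i in 1:(n - 1)) {
    for (j in (i + 1):n) {
      s <- s + sign(x[i] - x[j]) * sign(y[i] - y[j])
    }
  }
  s / choose(n, 2)
}

# One concordant pair and two discordant pairs: (1 - 2) / 3
tau_hand <- kendall_tau(alpha_rank, y)

# Spearman's rho for untied ranks: 1 - 6 * sum(d^2) / (n * (n^2 - 1))
n <- length(y)
rho_formula <- 1 - 6 * sum((alpha_rank - y)^2) / (n * (n^2 - 1))

print(c(tau_alpha, rho_alpha, tau_hand, rho_formula))
```

Both hand computations agree with cor(), which supports the alphabetic-ordering explanation given above.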
Peter Dalgaard
2010-Feb-08 13:23 UTC
[Rd] Incorrect Kendall's tau for ordered variables (PR#14207)
msa at biostat.mgh.harvard.edu wrote:
> Both cor() and cor.test() incorrectly handle ordered variables with
> method="kendall", cor() incorrectly handles ordered variables for
> method="spearman" (method="pearson" always works correctly, while
> method="spearman" works for cor.test, but not for cor()).
>
> In erroneous calculations these functions ignore the inherent ordering
> of the ordered variable (e.g., '9'<'10'<'11') and instead seem to assume
> an alphabetic ordering ('10'<'11'<'9').

Strictly speaking, not a bug, since the documentation has

       x: a numeric vector, matrix or data frame.

respectively

    x, y: numeric vectors of data values.  'x' and 'y' must have the
          same length.

so no one ever claimed that class "ordered" variables should work.

However, the root cause is that as.vector on a factor variable (ordered
or not) converts it to a character vector, hence

> rank(as.vector(as.ordered(9:11)))
[1] 3 1 2

Looks like a simple fix would be to use as.vector(x, "numeric") inside
the definition of cor().

> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907
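Peter's diagnosis can be checked step by step. A minimal sketch, assuming an R version in which as.vector() on a factor behaves as described in the message (level labels as character for the default mode, integer codes for mode "numeric"); cor() is only ever called on the resulting plain numeric vector:

```r
f <- as.ordered(9:11)

# as.vector() on a factor returns the level labels as a character vector
lab <- as.vector(f)                 # "9" "10" "11"
lab_ranks <- rank(lab)              # ranked as strings: 3 1 2

# Asking for mode "numeric" instead returns the internal integer codes,
# which follow the level order 9 < 10 < 11
codes <- as.vector(f, "numeric")    # 1 2 3

# With the codes, Kendall's tau against 1:3 is 1, as for the raw numbers
tau_codes <- cor(codes, 1:3, method = "kendall")
print(tau_codes)
```

Note that this works only because the codes of an ordered factor happen to be stored in level order.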
ripley at stats.ox.ac.uk
2010-Feb-08 17:11 UTC
[Rd] Incorrect Kendall's tau for ordered variables (PR#14207)
On Mon, 8 Feb 2010, Peter Dalgaard wrote:

> msa at biostat.mgh.harvard.edu wrote:
>> Both cor() and cor.test() incorrectly handle ordered variables with
>> method="kendall", cor() incorrectly handles ordered variables for
>> method="spearman" (method="pearson" always works correctly, while
>> method="spearman" works for cor.test, but not for cor()).
>>
>> In erroneous calculations these functions ignore the inherent ordering
>> of the ordered variable (e.g., '9'<'10'<'11') and instead seem to assume
>> an alphabetic ordering ('10'<'11'<'9').
>
> Strictly speaking, not a bug, since the documentation has
>
>        x: a numeric vector, matrix or data frame.
>
> respectively
>
>     x, y: numeric vectors of data values.  'x' and 'y' must have the
>           same length.
>
> so no one ever claimed that class "ordered" variables should work.
>
> However, the root cause is that as.vector on a factor variable (ordered
> or not) converts it to a character vector, hence
>
>> rank(as.vector(as.ordered(9:11)))
> [1] 3 1 2
>
> Looks like a simple fix would be to use as.vector(x, "numeric") inside
> the definition of cor().

A fix for that particular case, but the problem is that it relies on the
underlying representation.
I think a better fix would be to do either of

- test for numeric and throw an error otherwise, or
- use xtfrm, which has the advantage of being more general and allowing
  methods to be written (S3 or S4 methods in R-devel).

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
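Both of the options above can be sketched in a few lines of R. This is an illustration under stated assumptions, not the change actually made to cor(): check_numeric() and the class "severity" with its xtfrm.severity() method are hypothetical names invented for this example. The last part illustrates the generality argument: a class that is not stored as integer codes can still be ranked correctly once it supplies an xtfrm() method.

```r
f <- as.ordered(9:11)

# Option 1 (hypothetical helper): reject non-numeric input outright.
# is.numeric() is FALSE for factors, ordered or not.
check_numeric <- function(x) {
  if (!is.numeric(x)) stop("'x' must be numeric")
  x
}
msg <- tryCatch(check_numeric(f), error = function(e) conditionMessage(e))
print(msg)

# Option 2: xtfrm() maps x to numbers that sort the same way as x;
# for a factor these are the integer codes, i.e. the level order
keys <- xtfrm(f)
tau_keys <- cor(keys, 1:3, method = "kendall")

# Because xtfrm() is generic, a class that is NOT stored as integer codes
# can supply its own method. "severity" is a hypothetical class whose
# values are kept as character labels with a fixed ordering.
xtfrm.severity <- function(x) {
  match(unclass(x), c("mild", "moderate", "severe"))
}
sev <- structure(c("mild", "severe", "moderate"), class = "severity")
sev_keys <- xtfrm(sev)   # 1 3 2
tau_sev <- cor(sev_keys, c(1, 3, 2), method = "kendall")
print(c(tau_keys, tau_sev))
```

The design trade-off is the one described in the message: the numeric check is simple but refuses ordered data, while xtfrm() keeps cor() agnostic about how a class stores its values.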