maxbre
2012-May-29 15:55 UTC
[R] Wilcoxon-Mann-Whitney U value: outcomes from different stat packages
Given this example

#start code

a <- c(0,70,50,100,70,650,1300,6900,1780,4930,1120,700,190,940,
       760,100,300,36270,5610,249680,1760,4040,164890,17230,75140,1870,22380,5890,2430)

b <- c(0,0,10,30,50,440,1000,140,70,90,60,60,20,90,180,30,90,
       3220,490,20790,290,740,5350,940,3910,0,640,850,260)

wilcox.test(a, b, paired=FALSE)

#sum of ranks for first sample
sum.rank.a <- sum(rank(c(a,b))[1:29]) #sum of ranks assigned to the group a
W1 <- sum.rank.a - (length(a)*(length(a)+1)) / 2
W1

U1 <- length(a)*length(b)/2 - W1
U1

#sum of ranks for second sample
sum.rank.b <- sum(rank(c(a,b))[30:58]) #sum of ranks assigned to the group b
W2 <- sum.rank.b - (length(b)*(length(b)+1)) / 2
W2

U2 <- length(a)*length(b)/2 - W2
U2

#end code

And given the facts that:

- the note in R's wilcox.test help page clearly states: "The literature is not unanimous about the definitions of the Wilcoxon rank sum and Mann-Whitney tests. The two most common definitions correspond to the sum of the ranks of the first sample with the minimum value subtracted or not. R subtracts [...], giving a value which is larger by m(m+1)/2 for a first sample of size m"

- as a result of the same test performed with different stat packages (i.e. STATISTICA and PAST) I got a U value of 200.5, as in W2 (see my script), with the same p-value

what can I conclude regarding the STATISTICA and PAST packages? Are they giving W2 (see my script) instead of U?

A crucial point is that the variant of the algorithm a package uses for the computation is very rarely indicated in the output or documented in the help facility and the manuals.

See also this link (which I found after a long meander around the web) about the comparison of "Wilcoxon-Mann-Whitney" U test outcomes from different stat packages:
http://www.jstor.org/discover/10.2307/2685616?uid=3738296&uid=2129&uid=2&uid=70&uid=4&sid=47699045750617

Have any of you faced the same type of issue? Or am I completely wrong?
maxbre

--
View this message in context: http://r.789695.n4.nabble.com/Wilcoxon-Mann-Whitney-U-value-outcomes-from-different-stat-packages-tp4631703.html
Sent from the R help mailing list archive at Nabble.com.
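The poster's W1 and W2 can be related to what wilcox.test() reports with a few lines of base R. This is a minimal sketch on the thread's data: it assumes the default rank() tie handling (mid-ranks, which is what wilcox.test uses) and wraps wilcox.test() in suppressWarnings() only to hide the warning R may emit about exact p-values with ties. It also shows that the script's U1 and U2 are not two different U statistics but the distance of W1 and W2 from the null mean m*n/2, so they always have equal size and opposite sign.

```r
# Data from the thread
a <- c(0,70,50,100,70,650,1300,6900,1780,4930,1120,700,190,940,
       760,100,300,36270,5610,249680,1760,4040,164890,17230,75140,1870,22380,5890,2430)
b <- c(0,0,10,30,50,440,1000,140,70,90,60,60,20,90,180,30,90,
       3220,490,20790,290,740,5350,940,3910,0,640,850,260)
m <- length(a)
n <- length(b)
r <- rank(c(a, b))  # tied values get mid-ranks (default ties.method = "average")

W1 <- sum(r[seq_len(m)]) - m * (m + 1) / 2      # Mann-Whitney U counted from a
W2 <- sum(r[m + seq_len(n)]) - n * (n + 1) / 2  # Mann-Whitney U counted from b

# R's reported W is the first-sample version
W_R <- unname(suppressWarnings(wilcox.test(a, b))$statistic)
stopifnot(isTRUE(all.equal(W_R, W1)))

# The two U's always add up to m * n
stopifnot(isTRUE(all.equal(W1 + W2, m * n)))

# The script's U1, U2 are deviations from the null mean m*n/2,
# so they are equal in size and opposite in sign
U1 <- m * n / 2 - W1
U2 <- m * n / 2 - W2
stopifnot(isTRUE(all.equal(U1, -U2)))

print(c(W1 = W1, W2 = W2, U1 = U1, U2 = U2))
```

Because W1 + W2 = m*n, a package that reports W2 carries exactly the same information as R's W1; only the reference group differs.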
peter dalgaard
2012-May-30 07:33 UTC
[R] Wilcoxon-Mann-Whitney U value: outcomes from different stat packages
On May 29, 2012, at 17:55, maxbre wrote:

> Given this example
>
> #start code
>
> a<-c(0,70,50,100,70,650,1300,6900,1780,4930,1120,700,190,940,
> 760,100,300,36270,5610,249680,1760,4040,164890,17230,75140,1870,22380,5890,2430)
>
> b<-c(0,0,10,30,50,440,1000,140,70,90,60,60,20,90,180,30,90,
> 3220,490,20790,290,740,5350,940,3910,0,640,850,260)
>
> wilcox.test(a, b, paired=FALSE)
>
> #sum of rank for first sample
> sum.rank.a <- sum(rank(c(a,b))[1:29]) #sum of ranks assigned to the group a
> W1<- sum.rank.a - (length(a)*(length(a)+1)) / 2
> W1
>
> U1 <- length(a)*length(b)/2-W1
> U1
>
> #sum of ranks for second sample
> sum.rank.b <-sum(rank(c(a,b))[30:58]) #sum of ranks assigned to the group b
> W2 <- sum.rank.b - (length(b)*(length(b)+1)) / 2
> W2
>
> U2 <- length(a)*length(b)/2-W2
> U2
>
> #end code
>
> And given the fact that:
>
> - in the note of R Wilcox.test is clearly stated: "The literature is not
> unanimous about the definitions of the Wilcoxon rank sum and Mann-Whitney
> tests. The two most common definitions correspond to the sum of the ranks of
> the first sample with the minimum value subtracted or not. R subtracts [...],
> giving a value which is larger by m(m+1)/2 for a first sample of size m"

NB: You are quoting like the Devil reads the Bible: The bit in [...] is "and S-PLUS does not". So R's value is _smaller_ by m(m+1)/2.

>
> - as result of the same test performed with different stat packages (i.e.
> STATISTICA and PAST) I've got an U value of 200.5 as in W2 (see my script)
> with the same p-value
>
> What can I conclude regarding STATISTICA and PAST packages?... are they
> giving W2 (see my script) instead of U?

Most likely. Or, equivalently, they are basing U on the 2nd group instead of the first. This varies between software, as do the conventions for which way you subtract in a two-sample t test.
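Both of the points above can be checked on the thread's data with base R. A minimal sketch: the plain rank sum of the first sample (the S-PLUS convention mentioned above) exceeds R's W by m(m+1)/2, and calling wilcox.test() with the samples in the opposite order gives the statistic based on the second group, m*n - W, with an unchanged two-sided p-value. The variable names are illustrative; whether STATISTICA or PAST compute exactly this second-group statistic is not something this sketch can establish.

```r
# Data from the thread
a <- c(0,70,50,100,70,650,1300,6900,1780,4930,1120,700,190,940,
       760,100,300,36270,5610,249680,1760,4040,164890,17230,75140,1870,22380,5890,2430)
b <- c(0,0,10,30,50,440,1000,140,70,90,60,60,20,90,180,30,90,
       3220,490,20790,290,740,5350,940,3910,0,640,850,260)
m <- length(a)
n <- length(b)

# suppressWarnings() only hides the possible ties warning about exact p-values
fit_ab <- suppressWarnings(wilcox.test(a, b))  # W based on the first sample (a)
fit_ba <- suppressWarnings(wilcox.test(b, a))  # W based on the second sample (b)
W_ab <- unname(fit_ab$statistic)
W_ba <- unname(fit_ba$statistic)

# S-PLUS-style statistic: rank sum of the first sample, nothing subtracted
rank_sum_a <- sum(rank(c(a, b))[seq_len(m)])
stopifnot(isTRUE(all.equal(rank_sum_a - m * (m + 1) / 2, W_ab)))

# Swapping the samples gives m*n - W ...
stopifnot(isTRUE(all.equal(W_ab + W_ba, m * n)))
# ... with the same two-sided p-value
stopifnot(isTRUE(all.equal(fit_ab$p.value, fit_ba$p.value)))

print(c(W_first = W_ab, W_second = W_ba, p = fit_ab$p.value))
```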
Some textbooks say that you use the _smaller_ group, and tabulate critical regions only for those cases, to save paper.

>
> A crucial point is that the variant of the algorithm used for computation by
> the packages is very rarely indicated in the output or documented in the
> help facility and the manuals.
> See also this link (I've found after a long meandering on the web) about the
> comparison of "wilcoxon mann whitney" u test outcomes from different stat
> packages:
> http://www.jstor.org/discover/10.2307/2685616?uid=3738296&uid=2129&uid=2&uid=70&uid=4&sid=47699045750617
>
> Any of you have faced the same type of issues? Or am I completely wrong?
>
> maxbre
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
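The textbook shortcut of tabulating only one case presumably relies on two symmetries of the null distribution of U, both of which can be checked with base R's dwilcox(). In this sketch the group sizes 4 and 7 are arbitrary illustrative values, not taken from the thread: the null distribution is the same whichever group is counted as "first", and it is symmetric about m*n/2, so a statistic U and its complement m*n - U are equally extreme.

```r
# Illustrative group sizes (hypothetical, not from the thread)
m <- 4
n <- 7
k <- 0:(m * n)  # every possible value of U

p_mn <- dwilcox(k, m, n)  # null distribution of U for sizes (m, n)
p_nm <- dwilcox(k, n, m)  # same, with the group sizes swapped

# Swapping which group is "first" does not change the null distribution ...
stopifnot(isTRUE(all.equal(p_mn, p_nm)))
# ... and the distribution is symmetric about m*n/2 (rev() maps k to m*n - k),
# so one table of lower-tail critical values with m <= n covers every case
stopifnot(isTRUE(all.equal(p_mn, rev(p_mn))))
stopifnot(isTRUE(all.equal(sum(p_mn), 1)))
```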