John Fox
2021-Jan-20 23:10 UTC
[R] Different results on running Wilcoxon Rank Sum test in R and SPSS
Dear Bharat Rawlley,

On 2021-01-20 1:45 p.m., bharat rawlley via R-help wrote:
> Dear Professor John,
> Thank you very much for your reply!
> I agree with you that the non-parametric tests I mentioned in my previous email (Moods median test and Median test) do not make sense in this situation as they treat PFD_n and drug_code as different groups. As you correctly said, I want to use PFD_n as a vector of scores and drug_code to make two groups out of it. This is exactly what the Independent samples median test does in SPSS. I wish to perform the same test in R and am unable to do so.
> Simply put, I am asking how to perform the Independent samples median test in R just like it is performed in SPSS?

I'm afraid that I'm the wrong person to ask, since I haven't used SPSS in perhaps 30 years and have no idea what it does to test for differences in medians. A Google search for "independent samples median test in R" turns up a number of hits.

> Secondly, for the question you are asking about the test statistic, I have not performed the Wilcoxon Rank sum test in SPSS for the PFD_n and drug_code data. I have said something to the contrary in my first email, I apologize for that.

For continuous data, the Wilcoxon test is, I believe, a reasonable choice, but not when there are so many ties. If SPSS doesn't perform a Wilcoxon test for a difference in medians, then there's of course no reason to expect that the p-values would be the same.

Best,
 John

> Thank you very much for your time!
> Yours sincerely
> Bharat Rawlley
>
> On Wednesday, 20 January, 2021, 04:47:21 am IST, John Fox <jfox at mcmaster.ca> wrote:
>
> Dear Bharat Rawlley,
>
> What you tried to do appears to be nonsense. That is, you're treating
> PFD_n and drug_code as if they were scores for two different groups.
>
> I assume that what you really want to do is to treat PFD_n as a vector
> of scores and drug_code as defining two groups.
> If that's correct, and with your data into Data, you can try the following:
>
> ------snip ------
>
> > wilcox.test(PFD_n ~ drug_code, data=Data, conf.int=TRUE)
>
>     Wilcoxon rank sum test with continuity correction
>
> data:  PFD_n by drug_code
> W = 197, p-value = 0.05563
> alternative hypothesis: true location shift is not equal to 0
> 95 percent confidence interval:
>  -2.000014e+00  5.037654e-05
> sample estimates:
> difference in location
>              -1.000019
>
> Warning messages:
> 1: In wilcox.test.default(x = c(27, 26, 20, 24, 28, 28, 27, 27, 26,  :
>   cannot compute exact p-value with ties
> 2: In wilcox.test.default(x = c(27, 26, 20, 24, 28, 28, 27, 27, 26,  :
>   cannot compute exact confidence intervals with ties
>
> ------snip ------
>
> You can get an approximate confidence interval by specifying exact=FALSE:
>
> ------snip ------
>
> > wilcox.test(PFD_n ~ drug_code, data=Data, conf.int=TRUE, exact=FALSE)
>
>     Wilcoxon rank sum test with continuity correction
>
> data:  PFD_n by drug_code
> W = 197, p-value = 0.05563
> alternative hypothesis: true location shift is not equal to 0
> 95 percent confidence interval:
>  -2.000014e+00  5.037654e-05
> sample estimates:
> difference in location
>              -1.000019
>
> ------snip ------
>
> As it turns out, your data are highly discrete and have a lot of ties
> (see in particular PFD_n = 28):
>
> ------snip ------
>
> > xtabs(~ PFD_n + drug_code, data=Data)
>
>      drug_code
> PFD_n  0  1
>    0   2  0
>    16  1  1
>    18  0  1
>    19  0  1
>    20  2  0
>    22  0  1
>    24  2  0
>    25  1  2
>    26  5  2
>    27  4  2
>    28  5 13
>    30  1  2
>
> ------snip ------
>
> I'm no expert in nonparametric inference, but I doubt whether the
> approximate p-value will be very accurate for data like these.
>
> I don't know why wilcox.test() (correctly used) and SPSS are giving you
> slightly different results -- assuming that you're actually doing the
> same thing in both cases. I couldn't help but notice that most of your
> data are missing. Are you getting the same value of the test statistic
> and different p-values, or is the test statistic different as well?
>
> I hope this helps,
>  John
>
> John Fox, Professor Emeritus
> McMaster University
> Hamilton, Ontario, Canada
> web: https://socialsciences.mcmaster.ca/jfox/
>
> On 2021-01-19 5:46 a.m., bharat rawlley via R-help wrote:
>> Thank you for the reply and suggestion, Michael!
>> I used dput() and this is the output I can share with you. Simply explained, I have 3 columns namely, drug_code, freq4w_n and PFD_n. Each column has 132 values (including NA). The problem with the Wilcoxon Rank Sum test has been described in my first email.
>> Please do let me know if you need any further clarification from my side! Thanks a lot for your time!
>> structure(list(drug_code = c(0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0,
>> 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1,
>> 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1,
>> 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1,
>> 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0,
>> 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1,
>> 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0), freq4w_n = c(1,
>> NA, NA, 0, NA, 4, NA, 10, NA, 0, 6, NA, NA, NA, NA, NA, 10, NA,
>> 0, NA, NA, NA, NA, 0, NA, 0, NA, NA, NA, 0, NA, 0, NA, NA, NA,
>> NA, NA, NA, NA, NA, 0, 0, 12, 0, NA, 1, 2, 1, 2, 2, NA, 28, 0,
>> NA, 4, NA, 1, NA, NA, NA, NA, NA, 0, 3, 1, NA, NA, NA, NA, 4,
>> 28, NA, NA, 0, 2, 12, 0, NA, NA, NA, 0, NA, 0, NA, NA, NA, NA,
>> NA, NA, NA, NA, NA, 3, NA, NA, NA, NA, NA, NA, 6, 1, NA, NA,
>> NA, 0, NA, NA, NA, 0, 0, NA, 0, NA, 2, 8, 3, NA, NA, NA, 0, NA,
>> NA, NA, 9, NA, NA, NA, NA, NA, NA, NA, NA), PFD_n = c(27, NA,
>> NA, 28, NA, 26, NA, 20, NA, 30, 24, NA, NA, NA, NA, NA, 18, NA,
>> 28, NA, NA, NA, NA, 28, NA, 28, NA, NA, NA, 28, NA, 28, NA, NA,
>> NA, NA, NA, NA, NA, NA, 28, 28, 16, 28, NA, 27, 26, 27, 26, 26,
>> NA, 0, 30, NA, 24, NA, 27, NA, NA, NA, NA, NA, 28, 25, 27, NA,
>> NA, NA, NA, 26, 0, NA, NA, 28, 26, 16, 28, NA, NA, NA, 28, NA,
>> 28, NA, NA, NA, NA, NA, NA, NA, NA, NA, 25, NA, NA, NA, NA, NA,
>> NA, 22, 27, NA, NA, NA, 28, NA, NA, NA, 28, 28, NA, 28, NA, 26,
>> 20, 25, NA, NA, NA, 30, NA, NA, NA, 19, NA, NA, NA, NA, NA, NA,
>> NA, NA)), row.names = c(NA, -132L), class = c("tbl_df", "tbl",
>> "data.frame"))
>>
>> Yours sincerely
>> Bharat Rawlley
>>
>> On Tuesday, 19 January, 2021, 03:53:27 pm IST, Michael Dewey <lists at dewey.myzen.co.uk> wrote:
>>
>> Unfortunately your data did not come through. Try using dput() and then
>> pasting that into the body of your e-mail message.
>>
>> On 18/01/2021 17:26, bharat rawlley via R-help wrote:
>>> Hello,
>>> On running the Wilcoxon Rank Sum test in R and SPSS, I am getting the following discrepancies which I am unable to explain.
>>> Q1 In the attached data set, I was trying to compare freq4w_n in those with drug_code 0 vs 1. SPSS gives a P value 0.031 vs R gives a P value 0.001779.
>>> The code I used in R is as follows -
>>> wilcox.test(freq4w_n, drug_code, conf.int = T)
>>>
>>> Q2 Similarly, in the same data set, when trying to compare PFD_n in those with drug_code 0 vs 1, SPSS gives a P value 0.038 vs R gives a P value < 2.2e-16.
>>> The code I used in R is as follows -
>>> wilcox.test(PFD_n, drug_code, mu = 0, alternative = "two.sided", correct = TRUE, paired = FALSE, conf.int = TRUE)
>>>
>>> I have tried searching on Google and watching some Youtube tutorials, I cannot find an answer, Any help will be really appreciated, Thank you!
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.

--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/
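
The difference between the two calling conventions discussed above can be illustrated with a small sketch. The data frame here is a hypothetical toy example (the column names `score` and `group` and all values are invented, not the poster's data); only `wilcox.test()` from base R is used. It is meant to show why passing the 0/1 group code as the second sample gives a tiny p-value, not to reproduce any number quoted in the thread.

```r
# Hypothetical toy data (NOT the data from the thread):
# a numeric score and a 0/1 code that defines two groups.
Data <- data.frame(
  score = c(27, 28, 26, 20, 30, 24, 18, 28, 25, 22),
  group = c(0, 1, 0, 0, 1, 0, 1, 1, 0, 1)
)

# Misuse: the 0/1 codes are treated as a second sample of scores,
# so the test compares the scores against the values 0 and 1.
wrong <- wilcox.test(Data$score, Data$group, exact = FALSE)

# Intended use: `group` splits `score` into two samples.
right <- wilcox.test(score ~ group, data = Data, exact = FALSE)

# The formula call is equivalent to passing the two subsets explicitly.
by_hand <- wilcox.test(Data$score[Data$group == 0],
                       Data$score[Data$group == 1],
                       exact = FALSE)

# Compare the three p-values side by side.
print(c(wrong = wrong$p.value,
        right = right$p.value,
        by_hand = by_hand$p.value))
```

Because every score is larger than every group code, the misused call reports a very small p-value regardless of whether the two groups differ, which is consistent with the pattern described in the original question.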
bharat rawlley
2021-Jan-21 04:19 UTC
[R] Different results on running Wilcoxon Rank Sum test in R and SPSS
Thank you for your time, Professor John! Much appreciated!

Yours sincerely,
Bharat Rawlley
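
On the question raised earlier in the thread of how to run an independent-samples median test in R, one possible approach is Mood's median test built from base R functions. This is only a sketch under stated assumptions: `median_test()` is a hypothetical helper written for illustration (it is not part of R or any package), the toy data are invented, and the thread does not establish that this matches what SPSS computes, for example in how values tied with the median are classified.

```r
# Hypothetical helper (not from the thread or any package):
# Mood's median test for independent groups, using base R only.
median_test <- function(score, group) {
  keep <- !is.na(score) & !is.na(group)   # drop incomplete pairs
  score <- score[keep]
  group <- factor(group[keep])
  grand_median <- median(score)           # median of the pooled scores
  # Classify each score as above the pooled median or not
  # (assumption: values equal to the median count as "not above").
  above <- factor(score > grand_median, levels = c(FALSE, TRUE))
  tab <- table(above = above, group = group)
  # Fisher's exact test on the resulting table; chisq.test(tab) is the
  # usual large-sample alternative when the counts are big enough.
  list(grand_median = grand_median,
       table = tab,
       fisher = fisher.test(tab))
}

# Hypothetical toy data (NOT the data from the thread), with one missing score.
Data <- data.frame(
  score = c(27, 28, 26, 20, 30, 24, 18, 28, 25, 22, NA),
  group = c(0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1)
)

res <- median_test(Data$score, Data$group)
print(res$table)
print(res$fisher$p.value)
```

Fisher's exact test is used here because, with heavily tied and sparse data like those in the thread, a chi-squared approximation may be unreliable; that choice is a judgment call rather than something the thread prescribes.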