thr3ads.net - R help - [R] Different results on running Wilcoxon Rank Sum test in R and SPSS [Jan 2021]


John Fox

2021-Jan-19 23:17 UTC

[R] Different results on running Wilcoxon Rank Sum test in R and SPSS

Dear Bharat Rawlley,

What you tried to do appears to be nonsense. That is, you're treating 
PFD_n and drug_code as if they were scores for two different groups.

I assume that what you really want to do is to treat PFD_n as a vector 
of scores and drug_code as defining two groups. If that's correct, and 
with your data in a data frame named Data, you can try the following:

------snip ------

 > wilcox.test(PFD_n ~ drug_code, data=Data, conf.int=TRUE)

	Wilcoxon rank sum test with continuity correction

data:  PFD_n by drug_code
W = 197, p-value = 0.05563
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
  -2.000014e+00  5.037654e-05
sample estimates:
difference in location
              -1.000019

Warning messages:
1: In wilcox.test.default(x = c(27, 26, 20, 24, 28, 28, 27, 27, 26,  :
   cannot compute exact p-value with ties
2: In wilcox.test.default(x = c(27, 26, 20, 24, 28, 28, 27, 27, 26,  :
   cannot compute exact confidence intervals with ties

------snip ------
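
The difference between the two calling conventions can be sketched on a small made-up data set (the data frame toy and its columns score and group are hypothetical, not the poster's data). With two vectors, wilcox.test() treats the second argument as a second sample of scores; with a formula, the right-hand side defines the groups.

```r
# Made-up illustration data (hypothetical, not the data from this thread)
toy <- data.frame(
  score = c(27, 26, 20, 24, 28, 30, 28, 25, 28, 22),
  group = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1)
)

# Two-vector interface: x and y are two samples of scores.
# Here the 0/1 group codes are (wrongly) treated as a second sample.
wrong <- wilcox.test(toy$score, toy$group, exact = FALSE)

# Formula interface: score is the response, group defines the two samples.
right <- wilcox.test(score ~ group, data = toy, exact = FALSE)

# The same comparison written as an explicit split into two samples.
split_call <- wilcox.test(toy$score[toy$group == 0],
                          toy$score[toy$group == 1],
                          exact = FALSE)

# Every score exceeds every 0/1 code, so the "wrong" W is the maximum
# possible value, n1 * n2 = 10 * 10.
wrong$statistic
right$statistic
split_call$statistic
```

The tiny p-values reported in the original question are what one would expect from the first form, since essentially all scores are larger than the group codes.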

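You can request the approximate p-value and confidence interval explicitly, 
and so avoid the warnings, by specifying exact=FALSE. The results are 
unchanged because, with ties, R had already fallen back on the normal 
approximation: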

------snip ------

 > wilcox.test(PFD_n ~ drug_code, data=Data, conf.int=TRUE, exact=FALSE)

	Wilcoxon rank sum test with continuity correction

data:  PFD_n by drug_code
W = 197, p-value = 0.05563
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
  -2.000014e+00  5.037654e-05
sample estimates:
difference in location
              -1.000019

------snip ------

As it turns out, your data are highly discrete and have a lot of ties 
(see in particular PFD_n = 28):

------snip ------

 > xtabs(~ PFD_n + drug_code, data=Data)

      drug_code
PFD_n  0  1
    0   2  0
    16  1  1
    18  0  1
    19  0  1
    20  2  0
    22  0  1
    24  2  0
    25  1  2
    26  5  2
    27  4  2
    28  5 13
    30  1  2

------snip ------

I'm no expert in nonparametric inference, but I doubt whether the 
approximate p-value will be very accurate for data like these.
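
One way to check the normal approximation against something that does not rely on it is a Monte Carlo permutation test on the midranks. The following is only a sketch: perm_ranksum() is a hypothetical helper written for this illustration, the toy data are made up, and the number of permutations B is an arbitrary choice.

```r
# Hypothetical helper: Monte Carlo permutation p-value for the rank-sum
# statistic, using midranks so that ties are handled directly.
perm_ranksum <- function(y, g, B = 10000) {
  keep <- !is.na(y) & !is.na(g)        # drop incomplete cases
  y <- y[keep]
  g <- g[keep]
  lev <- sort(unique(g))
  stopifnot(length(lev) == 2)
  r <- rank(y)                         # midranks for tied values
  n1 <- sum(g == lev[1])
  obs <- sum(r[g == lev[1]])           # observed rank sum of first group
  expected <- n1 * (length(r) + 1) / 2 # its mean under the null
  # Re-draw the first group's ranks at random B times
  perm <- replicate(B, sum(sample(r, n1)))
  # Two-sided: at least as far from the null mean as the observed value
  (sum(abs(perm - expected) >= abs(obs - expected) - 1e-12) + 1) / (B + 1)
}

# Made-up data with heavy ties at 28 (not the data from this thread)
toy_y <- c(27, 26, 20, 24, 28, 28, 30, 28, 25, 28, 22, 28)
toy_g <- rep(c(0, 1), each = 6)

set.seed(123)
p_toy <- perm_ranksum(toy_y, toy_g, B = 2000)
p_toy
```

With the data frame used above, the call would be perm_ranksum(Data$PFD_n, Data$drug_code). The result varies slightly from run to run unless the seed is fixed.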

I don't know why wilcox.test() (correctly used) and SPSS are giving you 
slightly different results -- assuming that you're actually doing the 
same thing in both cases. I couldn't help but notice that most of your 
data are missing. Are you getting the same value of the test statistic 
and different p-values, or is the test statistic different as well?
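
When comparing output across programs it may help to know how R's W is defined: it is the rank sum of the first group minus n1*(n1+1)/2. Other software may report the raw rank sum or the Mann-Whitney U of the other group instead, and may or may not apply a continuity correction. The sketch below uses made-up data; which convention SPSS follows is not established in this thread and should be checked against its documentation.

```r
# Made-up data (hypothetical, not the data from this thread)
toy <- data.frame(
  score = c(27, 26, 20, 24, 28, 28, 30, 28, 25, 28, 22, 28),
  group = rep(c(0, 1), each = 6)
)
x <- toy$score[toy$group == 0]
y <- toy$score[toy$group == 1]
n1 <- length(x)
n2 <- length(y)

# Rank sum of the first group (midranks for ties) and R's W statistic
r <- rank(c(x, y))
rank_sum_x <- sum(r[seq_len(n1)])
W_by_hand <- rank_sum_x - n1 * (n1 + 1) / 2

# Normal approximation with and without the continuity correction
with_cc <- wilcox.test(x, y, exact = FALSE, correct = TRUE)
without_cc <- wilcox.test(x, y, exact = FALSE, correct = FALSE)

c(rank_sum_x = rank_sum_x,
  W = W_by_hand,
  U_other_group = n1 * n2 - W_by_hand,
  p_with_cc = with_cc$p.value,
  p_without_cc = without_cc$p.value)
```

If two programs agree on the statistic (after converting between these forms) but not on the p-value, the continuity correction or the handling of ties in the variance is a natural place to look.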

I hope this helps,
  John

John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

On 2021-01-19 5:46 a.m., bharat rawlley via R-help wrote:
> Thank you for the reply and suggestion, Michael!
> I used dput() and this is the output I can share with you. Simply
> explained, I have 3 columns namely, drug_code, freq4w_n and PFD_n. Each column
> has 132 values (including NA). The problem with the Wilcoxon Rank Sum test has
> been described in my first email.
> Please do let me know if you need any further clarification from my side!
> Thanks a lot for your time!
> structure(list(drug_code = c(0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0,
> 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1,
> 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1,
> 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1,
> 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0,
> 1, 0, 0, 1, 1, 1, 1, 0, 0), freq4w_n = c(1, NA, NA, 0, NA, 4, NA, 10, NA, 0, 6,
> NA, NA, NA, NA, NA, 10, NA, 0, NA, NA, NA, NA, 0, NA, 0, NA, NA, NA, 0, NA, 0,
> NA, NA, NA, NA, NA, NA, NA, NA, 0, 0, 12, 0, NA, 1, 2, 1, 2, 2, NA, 28, 0, NA,
> 4, NA, 1, NA, NA, NA, NA, NA, 0, 3, 1, NA, NA, NA, NA, 4, 28, NA, NA, 0, 2, 12,
> 0, NA, NA, NA, 0, NA, 0, NA, NA, NA, NA, NA, NA, NA, NA, NA, 3, NA, NA, NA, NA,
> NA, NA, 6, 1, NA, NA, NA, 0, NA, NA, NA, 0, 0, NA, 0, NA, 2, 8, 3, NA, NA, NA,
> 0, NA, NA, NA, 9, NA, NA, NA, NA, NA, NA, NA, NA), PFD_n = c(27, NA, NA, 28, NA,
> 26, NA, 20, NA, 30, 24, NA, NA, NA, NA, NA, 18, NA, 28, NA, NA, NA, NA, 28, NA,
> 28, NA, NA, NA, 28, NA, 28, NA, NA, NA, NA, NA, NA, NA, NA, 28, 28, 16, 28, NA,
> 27, 26, 27, 26, 26, NA, 0, 30, NA, 24, NA, 27, NA, NA, NA, NA, NA, 28, 25, 27,
> NA, NA, NA, NA, 26, 0, NA, NA, 28, 26, 16, 28, NA, NA, NA, 28, NA, 28, NA, NA,
> NA, NA, NA, NA, NA, NA, NA, 25, NA, NA, NA, NA, NA, NA, 22, 27, NA, NA, NA, 28,
> NA, NA, NA, 28, 28, NA, 28, NA, 26, 20, 25, NA, NA, NA, 30, NA, NA, NA, 19, NA,
> NA, NA, NA, NA, NA, NA, NA)), row.names = c(NA, -132L), class =
> c("tbl_df", "tbl", "data.frame"))
> 
> Yours sincerely,
> Bharat Rawlley
>
> On Tuesday, 19 January, 2021, 03:53:27 pm IST, Michael Dewey
> <lists at dewey.myzen.co.uk> wrote:
>
> Unfortunately your data did not come through. Try using dput() and then
> pasting that into the body of your e-mail message.
> 
> On 18/01/2021 17:26, bharat rawlley via R-help wrote:
>> Hello,
>> On running the Wilcoxon Rank Sum test in R and SPSS, I am getting the
>> following discrepancies, which I am unable to explain.
>> Q1 In the attached data set, I was trying to compare freq4w_n in those
>> with drug_code 0 vs 1. SPSS gives a P value of 0.031 vs R gives a P value
>> of 0.001779.
>> The code I used in R is as follows -
>> wilcox.test(freq4w_n, drug_code, conf.int = T)
>>
>>
>> Q2 Similarly, in the same data set, when trying to compare PFD_n in
>> those with drug_code 0 vs 1, SPSS gives a P value of 0.038 vs R gives a P
>> value < 2.2e-16.
>> The code I used in R is as follows -
>> wilcox.test(PFD_n, drug_code, mu = 0, alternative = "two.sided",
>>             correct = TRUE, paired = FALSE, conf.int = TRUE)
>>
>>
>> I have tried searching on Google and watching some YouTube tutorials, but I
>> cannot find an answer. Any help will be really appreciated. Thank you!
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>

bharat rawlley

2021-Jan-20 18:45 UTC


[R] Different results on running Wilcoxon Rank Sum test in R and SPSS

Dear Professor John,

Thank you very much for your reply!

I agree with you that the non-parametric tests I mentioned in my previous email
(Mood's median test and Median test) do not make sense in this situation, as they
treat PFD_n and drug_code as different groups. As you correctly said, I want to
use PFD_n as a vector of scores and drug_code to make two groups out of it. This
is exactly what the Independent samples median test does in SPSS. I wish to
perform the same test in R and am unable to do so.

Simply put, I am asking how to perform the Independent samples median test in R
just as it is performed in SPSS.
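
For reference, a two-sample median test of the kind described (Mood's median test) can be sketched in base R: classify each non-missing score as above the pooled median or not, cross-tabulate against the grouping variable, and test the resulting 2 x 2 table. The function name median_test() is hypothetical, and the choices made here (values equal to the median counted as "not above", no continuity correction) are assumptions, so the result need not match SPSS's Independent-Samples Median Test exactly.

```r
# Hypothetical helper: two-sample median test on a 2 x 2 table
median_test <- function(y, g, exact = FALSE) {
  keep <- !is.na(y) & !is.na(g)          # drop incomplete cases
  y <- y[keep]
  g <- factor(g[keep])
  stopifnot(nlevels(g) == 2)
  # Assumption: scores equal to the pooled median count as "not above"
  above <- factor(y > median(y), levels = c(FALSE, TRUE))
  tab <- table(above = above, group = g)
  if (exact) fisher.test(tab) else chisq.test(tab, correct = FALSE)
}

# Made-up data (hypothetical, not the data from this thread)
toy_y <- c(27, 26, 20, 24, 28, 28, 30, 28, 25, 28, 22, 28)
toy_g <- rep(c(0, 1), each = 6)

# Small expected counts trigger a warning from chisq.test(); exact = TRUE
# switches to Fisher's exact test on the same table.
res_chisq <- suppressWarnings(median_test(toy_y, toy_g))
res_exact <- median_test(toy_y, toy_g, exact = TRUE)
res_chisq$statistic
res_exact$p.value
```

With the data in this thread the call would be median_test(Data$PFD_n, Data$drug_code). Because so many scores are tied at 28, the treatment of values equal to the median is likely to matter a good deal.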

Secondly, regarding your question about the test statistic, I have not
performed the Wilcoxon Rank sum test in SPSS for the PFD_n and drug_code data. I
said something to the contrary in my first email; I apologize for that.

Thank you very much for your time!
Yours sincerely,
Bharat Rawlley
