Hi,

Could someone please tell me how to perform a Mann-Whitney U test on a dataset with 2 groups where one group has more data values than the other?

I have split my 2 groups into 2 columns in the .txt file I'm using with R. Here is the code I have so far:

    group1 <- c(LeafArea2)
    group2 <- c(LeafArea1)
    wilcox.test(group1, group2)

This code works for datasets with the same number of data values in each column, but not when one column has a different number of data values than the other.

Is the solution to put a null value in the column with fewer data values?

I'm testing for significant differences between the 2 groups, and the result I'm getting in R with the uneven values is different from what I'm getting in SPSS.

Help please!

Nat

------------------------------------------------------------------------------------------------------------------------
This communication is intended for the use of the recipient to which it is addressed, and may contain confidential, personal, and or privileged information. Please contact the sender immediately if you are not the intended recipient of this communication, and do not copy, distribute, or take action relying on it. Any communication received in error, or subsequent reply, should be deleted or destroyed.
------------------------------------------------------------------------------------------------------------------------
On Tue, 2007-08-14 at 14:45 -0600, Natalie O'Toole wrote:
> Hi,
>
> Could someone please tell me how to perform a Mann-Whitney U test on a
> dataset with 2 groups where one group has more data values than another?
>
> I have split up my 2 groups into 2 columns in my .txt file I'm using with
> R. Here is the code I have so far...
>
> group1 <- c(LeafArea2)
> group2 <- c(LeafArea1)
> wilcox.test(group1, group2)
>
> This code works for datasets with the same number of data values in each
> column, but not when there is a different number of data values in one
> column than another column of data.
>
> Is the solution that I have to have a null value in the data column with
> the fewer data values?
>
> I'm testing for significant differences between the 2 groups, and the
> result I'm getting in R with the uneven values is different from what I'm
> getting in SPSS.
>
> Help please!
>
> Nat

You will need to provide any error messages that you are getting.

There is a two-sample example in ?wilcox.test that shows that the function can handle two vectors of differing sizes. The output of str(group1) and str(group2) may also prove useful.

You may also wish to pay attention to the "Note" in ?wilcox.test, which, if you are getting differing results between SPSS and R, may provide some insight into why, presuming that you can obtain the equivalent information for SPSS.

HTH,

Marc Schwartz
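Marc's point that wilcox.test() copes with vectors of different lengths can be checked directly. The sketch below uses the leaf-area values Nat posts later in the thread; the names group2_obs, res_unequal and res_padded are illustrative, not from the thread. It assumes the unpaired (default) test; with paired = TRUE, R requires the two vectors to have the same length.

```r
# Leaf-area values as posted later in this thread; group2 was padded
# with NA so that the two columns had the same length.
group1 <- c(1.34, 1.47, 1.48, 1.49, 1.62, 1.67, 1.7, 1.7, 1.7, 1.73,
            1.81, 1.84, 1.9, 1.96, 2, 2, 2.19, 2.29, 2.29, 2.41,
            2.41, 2.46, 2.5, 2.6, 2.8, 2.8, 3.07, 3.3)
group2 <- c(0.98, 1.18, 1.25, 1.33, 1.38, 1.4, 1.49, 1.57, 1.72, 1.75,
            1.8, 1.82, 1.86, 1.9, 1.97, 2.04, 2.14, 2.18, 2.49, 2.5,
            2.55, 2.57, 2.64, 2.73, 2.77, 2.9, 2.94, NA)

# str() reports the type and length of each vector
str(group1)
str(group2)

# Drop the padding so the two samples genuinely differ in length
group2_obs <- group2[!is.na(group2)]
length(group1)      # 28
length(group2_obs)  # 27

# The unpaired test accepts samples of different sizes directly.
# exact = FALSE requests the normal approximation, which R falls back
# to anyway when there are ties (and then warns).
res_unequal <- wilcox.test(group1, group2_obs, exact = FALSE)

# Passing the NA-padded vector gives the same result, because the
# default method discards non-finite values before ranking.
res_padded <- wilcox.test(group1, group2, exact = FALSE)

res_unequal
```

So padding the shorter column is not needed for wilcox.test(); if the padding comes from reading a two-column text file, the NA is simply ignored.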
On Tue, 14 Aug 2007, Natalie O'Toole wrote:
> Hi,
>
> Could someone please tell me how to perform a Mann-Whitney U test on a
> dataset with 2 groups where one group has more data values than another?
>
> I have split up my 2 groups into 2 columns in my .txt file I'm using with
> R. Here is the code I have so far...
>
> group1 <- c(LeafArea2)
> group2 <- c(LeafArea1)
> wilcox.test(group1, group2)
>
> This code works for datasets with the same number of data values in each
> column, but not when there is a different number of data values in one
> column than another column of data.

There is an example of that scenario on the help page for wilcox.test, so it does 'work'. What exactly went wrong for you?

> Is the solution that I have to have a null value in the data column with
> the fewer data values?
>
> I'm testing for significant differences between the 2 groups, and the
> result I'm getting in R with the uneven values is different from what I'm
> getting in SPSS.

We need a worked example. As the help page says, definitions do differ. If you can provide a reproducible example in R and the output from SPSS, we may be able to tell you how to relate that to what you see in R.

[...]
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

As it says, we really need such code (and the output you get) to be able to help you.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
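Ripley asks for minimal, self-contained, reproducible code. One way to write that for this problem, offered here as a suggestion rather than anything proposed in the thread, is to store the data in long format: one row per leaf plus a column naming its group. Long format has no "shorter column" to pad, and wilcox.test() has a formula method for it. The data frame name leaf and its column names are hypothetical; the values are Nat's, from later in the thread.

```r
# Hypothetical long-format layout of Nat's data: one row per leaf.
group1 <- c(1.34, 1.47, 1.48, 1.49, 1.62, 1.67, 1.7, 1.7, 1.7, 1.73,
            1.81, 1.84, 1.9, 1.96, 2, 2, 2.19, 2.29, 2.29, 2.41,
            2.41, 2.46, 2.5, 2.6, 2.8, 2.8, 3.07, 3.3)
group2_obs <- c(0.98, 1.18, 1.25, 1.33, 1.38, 1.4, 1.49, 1.57, 1.72, 1.75,
                1.8, 1.82, 1.86, 1.9, 1.97, 2.04, 2.14, 2.18, 2.49, 2.5,
                2.55, 2.57, 2.64, 2.73, 2.77, 2.9, 2.94)

leaf <- data.frame(
  LeafArea = c(group1, group2_obs),
  group    = factor(rep(c("group1", "group2"),
                        times = c(length(group1), length(group2_obs))))
)
str(leaf)  # 55 rows; the group sizes are 28 and 27

# Formula interface: the first factor level ("group1") plays the role
# of x, so the statistic matches wilcox.test(group1, group2_obs).
res_formula <- wilcox.test(LeafArea ~ group, data = leaf, exact = FALSE)
res_formula

# For a post to the list, dput(leaf) prints the data in a form that
# others can paste straight into R.
```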
Natalie,

It's best to provide at least a sample of your data. Your field names suggest that your data might be collected in units of mm^2 or some similar measure of area. Why do you want to use Mann-Whitney, which will rank your data and then use those ranks rather than your actual data? Unless your sample is quite small, why not use a two-sample t-test?

Also, are your samples paired? If they aren't, did you use the "paired = FALSE" option?

JWDougherty
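For comparison, the two-sample t-test JWDougherty mentions takes one line in R. This is a sketch on Nat's posted values (which appear later in the thread), not a judgement on which test suits the data. Note that paired = FALSE is already the default for both t.test() and wilcox.test(), so writing it out only documents intent; t.test() also defaults to Welch's unequal-variance version, with var.equal = TRUE giving the classical pooled test.

```r
group1 <- c(1.34, 1.47, 1.48, 1.49, 1.62, 1.67, 1.7, 1.7, 1.7, 1.73,
            1.81, 1.84, 1.9, 1.96, 2, 2, 2.19, 2.29, 2.29, 2.41,
            2.41, 2.46, 2.5, 2.6, 2.8, 2.8, 3.07, 3.3)
group2 <- c(0.98, 1.18, 1.25, 1.33, 1.38, 1.4, 1.49, 1.57, 1.72, 1.75,
            1.8, 1.82, 1.86, 1.9, 1.97, 2.04, 2.14, 2.18, 2.49, 2.5,
            2.55, 2.57, 2.64, 2.73, 2.77, 2.9, 2.94, NA)

# Welch two-sample t-test (var.equal = FALSE is the default).
# NA values in either sample are dropped before the test.
tt_welch <- t.test(group1, group2, paired = FALSE)

# Classical Student t-test with a pooled variance estimate;
# its degrees of freedom are n1 + n2 - 2 = 28 + 27 - 2.
tt_pooled <- t.test(group1, group2, paired = FALSE, var.equal = TRUE)

# The rank-based test on the same data, for side-by-side comparison
wt <- wilcox.test(group1, group2, paired = FALSE, exact = FALSE)

c(welch = tt_welch$p.value, pooled = tt_pooled$p.value,
  wilcoxon = wt$p.value)
```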
Hi,

I do want to use the Mann-Whitney test, which ranks my data and then uses those ranks rather than the actual data. Here is the R code I am using:

    group1 <- c(1.34, 1.47, 1.48, 1.49, 1.62, 1.67, 1.7, 1.7, 1.7, 1.73,
                1.81, 1.84, 1.9, 1.96, 2, 2, 2.19, 2.29, 2.29, 2.41,
                2.41, 2.46, 2.5, 2.6, 2.8, 2.8, 3.07, 3.3)
    group2 <- c(0.98, 1.18, 1.25, 1.33, 1.38, 1.4, 1.49, 1.57, 1.72, 1.75,
                1.8, 1.82, 1.86, 1.9, 1.97, 2.04, 2.14, 2.18, 2.49, 2.5,
                2.55, 2.57, 2.64, 2.73, 2.77, 2.9, 2.94, NA)
    result <- wilcox.test(group1, group2, paired = FALSE, conf.level = 0.95, na.action)

paired = FALSE so that the Wilcoxon rank sum test, which is equivalent to the Mann-Whitney test, is used (my samples are NOT paired).
conf.level = 0.95 to specify the confidence level.
na.action is used because I have an NA value (I suspect I am not using na.action in the correct manner).

When I use this code I get the following error message:

    Error in arg == choices : comparison (1) is possible only for atomic and list types

When I use the same group1 and group2 with this call:

    result <- wilcox.test(group1, group2, paired = FALSE, conf.level = 0.95)

I get the following result:

        Wilcoxon rank sum test with continuity correction

    data:  group1 and group2
    W = 405.5, p-value = 0.6494
    alternative hypothesis: true location shift is not equal to 0

    Warning message:
    cannot compute exact p-value with ties in: wilcox.test.default(group1, group2, paired = FALSE, conf.level = 0.95)

The W value here is 405.5, with a p-value of 0.6494.

In SPSS, I am ranking my data and then performing a Mann-Whitney U test by selecting Analyze > Nonparametric Tests > 2 Independent Samples and then checking the Mann-Whitney U test.
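The error in the first call is most likely an argument-matching problem rather than anything to do with the unequal group sizes. paired and conf.level are matched by name, so the remaining unnamed arguments fill x, y and then alternative, in that order. R finds na.action as the stats function of that name and passes it as the alternative hypothesis, which must be a character string. The sketch below reproduces this and shows a corrected call; the exact error wording depends on the R version, so the code does not rely on it. Two further points from ?wilcox.test: missing values in x and y are omitted automatically, and conf.level only has an effect when conf.int = TRUE.

```r
group1 <- c(1.34, 1.47, 1.48, 1.49, 1.62, 1.67, 1.7, 1.7, 1.7, 1.73,
            1.81, 1.84, 1.9, 1.96, 2, 2, 2.19, 2.29, 2.29, 2.41,
            2.41, 2.46, 2.5, 2.6, 2.8, 2.8, 3.07, 3.3)
group2 <- c(0.98, 1.18, 1.25, 1.33, 1.38, 1.4, 1.49, 1.57, 1.72, 1.75,
            1.8, 1.82, 1.86, 1.9, 1.97, 2.04, 2.14, 2.18, 2.49, 2.5,
            2.55, 2.57, 2.64, 2.73, 2.77, 2.9, 2.94, NA)

# Reproduce the failing call: the unnamed `na.action` fills the third
# positional formal of wilcox.test.default(), which is `alternative`.
bad <- tryCatch(
  wilcox.test(group1, group2, paired = FALSE, conf.level = 0.95, na.action),
  error = function(e) e
)
conditionMessage(bad)  # wording varies between R versions

# Corrected call: no NA argument, since non-finite values are omitted
# anyway. conf.int = TRUE is what makes conf.level have an effect.
# exact = FALSE uses the normal approximation up front, which avoids
# the "cannot compute exact p-value with ties" warning.
result <- wilcox.test(group1, group2, paired = FALSE,
                      conf.int = TRUE, conf.level = 0.95, exact = FALSE)
result
```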
For the Mann-Whitney test in SPSS I am getting the following results:

    Mann-Whitney U = 350.5
    2-tailed p-value = 0.643

I think the discrepancy may have to do with how the NA value is specified in R, but I'm not sure. If anyone has any suggestions, please let me know! I hope I have provided enough information to convey my problem.

Thank you,
Nat
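The R and SPSS numbers are consistent once the differing definitions mentioned in the Note of ?wilcox.test are taken into account, and the NA plays no part (R drops it, leaving 28 and 27 observations). R's W is the Mann-Whitney U for the first sample: the number of pairs (x, y) with x > y, a tie counting one half. The two possible U values always add up to n1 * n2 = 28 * 27 = 756, and SPSS appears to report the smaller one: 756 - 405.5 = 350.5. The small p-value difference (0.6494 vs 0.643) looks like R's default continuity correction; by hand calculation, correct = FALSE should give about 0.643. Treat the SPSS side of this as an assumption, since SPSS's documentation has not been checked here. The variable names below are illustrative.

```r
group1 <- c(1.34, 1.47, 1.48, 1.49, 1.62, 1.67, 1.7, 1.7, 1.7, 1.73,
            1.81, 1.84, 1.9, 1.96, 2, 2, 2.19, 2.29, 2.29, 2.41,
            2.41, 2.46, 2.5, 2.6, 2.8, 2.8, 3.07, 3.3)
group2 <- c(0.98, 1.18, 1.25, 1.33, 1.38, 1.4, 1.49, 1.57, 1.72, 1.75,
            1.8, 1.82, 1.86, 1.9, 1.97, 2.04, 2.14, 2.18, 2.49, 2.5,
            2.55, 2.57, 2.64, 2.73, 2.77, 2.9, 2.94, NA)

# Same NA handling as wilcox.test(): drop non-finite values
x <- group1[is.finite(group1)]
y <- group2[is.finite(group2)]
n1 <- length(x)  # 28
n2 <- length(y)  # 27

# R's W: the Mann-Whitney U for the first sample
w <- unname(wilcox.test(x, y, exact = FALSE)$statistic)

# Direct pair count (x > y, ties counting 1/2) to confirm that reading
w_pairs <- sum(outer(x, y, ">")) + 0.5 * sum(outer(x, y, "=="))

# The two U statistics sum to n1 * n2; SPSS is assumed to report the
# smaller of the two.
u_spss <- min(w, n1 * n2 - w)

# Default (continuity-corrected) p-value vs. the uncorrected one
p_corrected   <- wilcox.test(x, y, exact = FALSE)$p.value
p_uncorrected <- wilcox.test(x, y, exact = FALSE, correct = FALSE)$p.value

c(W = w, W_pairs = w_pairs, U_spss = u_spss,
  p_corrected = p_corrected, p_uncorrected = p_uncorrected)
```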