On 31-Jul-09 13:38:10, tedzzx wrote:
> Dear R users,
> I have got two samples:
> sample A with 223 observations:
> sample A has five categories: 1,2,3,4,5 (I use the numbers
> 1,2,3,4,5 to define the five different categories)
> there are 5 observations in category 1; 81 observations in
> category 2; 110 observations in category 3; 27 observations
> in category 4; 0 observations in category 5;
> To represent the sample in R: a<-rep(1:5, c(5,81,110,27,0))
>
> sample B with 504 observations:
> sample B also has the same five categories: 1,2,3,4,5
> there are 6 observations in category 1; 127 observations in
> category 2; 297 observations in category 3; 72 observations
> in category 4; 2 observations in category 5;
> To represent the sample in R: b<-rep(1:5, c(6,127,297,72,2))
>
> I want to test whether these two samples differ significantly
> in distribution (or Tests for Two Independent Samples).
>
> I found a website at:
> http://faculty.chass.ncsu.edu/garson/PA765/mann.htm
>
> This page shows four nonparametric tests. But I can only find the
> Kolmogorov-Smirnov Z Test.
> res<-ks.test(a,b)
>
> Can anyone tell me which package has the other 3 tests? Or is there
> any other test for my question?
> Thanks in advance
> Ted
If your "1,2,3,4,5" are simply nominal codes for the categories,
then you may be satisfied with a Fisher test or simply a chi-squared
test (using simulated P-values in view of the low frequencies in
some cells).
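Taking your data:
A <- c(5,81,110,27,0)    # counts per category, sample A
B <- c(6,127,297,72,2)   # counts per category, sample B
M <- cbind(A,B)          # 5 x 2 table of counts
D <- colSums(M)          # sample sizes: 223 and 504
P <- M %*% diag(1/D)     # proportion in each category, by sample
P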
# [,1] [,2]
# [1,] 0.02242152 0.011904762
# [2,] 0.36322870 0.251984127 ## So the main differences between
# [3,] 0.49327354 0.589285714 ## A and B are in these two categories
# [4,] 0.12107623 0.142857143
# [5,] 0.00000000 0.003968254
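As an aside, the same column proportions can also be obtained with prop.table() from base R. The following is only a sketch of an equivalent route; the name P2 is introduced here purely for illustration:

```r
# Counts per category for the two samples (taken from the post above)
A <- c(5, 81, 110, 27, 0)
B <- c(6, 127, 297, 72, 2)
M <- cbind(A, B)

# Column proportions: each column of M divided by its own total.
# margin = 2 means "normalise within columns".
P2 <- prop.table(M, margin = 2)   # P2 is an illustrative name
round(P2, 4)

# Sanity check: each column of proportions should sum to 1
colSums(P2)
```

This keeps the column names A and B on the result, which the matrix product with diag(1/D) drops.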
fisher.test(M,simulate.p.value = TRUE,B=100000)
# Fisher's Exact Test for Count Data with simulated p-value
# (based on 1e+05 replicates)
# data: M
# p-value = 0.01594
chisq.test(M,simulate.p.value=TRUE,B=100000)
# Pearson's Chi-squared test with simulated p-value
# (based on 1e+05 replicates)
# data: M
# X-squared = 11.7862, df = NA, p-value = 0.01501
So the P-values are similar in both tests.
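To see which categories drive the result, one possible follow-up (a sketch only, not part of the tests above) is to look at the expected counts and standardized residuals that chisq.test() returns. The row labels "cat1" to "cat5" are made up here for readability, and treating residuals beyond about 2 in absolute value as noteworthy is just the usual rule of thumb:

```r
# Counts per category for the two samples (taken from the post above)
A <- c(5, 81, 110, 27, 0)
B <- c(6, 127, 297, 72, 2)
M <- cbind(A, B)
rownames(M) <- paste0("cat", 1:5)   # illustrative row labels, not in the post

# The asymptotic test warns about small expected counts (category 5);
# it is used here only to extract residuals, not for its p-value.
res <- suppressWarnings(chisq.test(M))

round(res$expected, 2)   # expected counts if A and B had the same distribution
round(res$stdres, 2)     # standardized residuals; |value| > 2 is a rough flag
```

With only two columns, the residuals for A and B in each row are equal in size and opposite in sign, so it is enough to read one column.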
(Another) Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 31-Jul-09 Time: 17:53:58
------------------------------ XFMail ------------------------------