Hi guys,

I have two data sets of prices: endprice0, endprice1

I use the Wilcox test:

wilcox.test(endprice0, endprice1, paired = TRUE, alternative = "two.sided", conf.int = T, conf.level = 0.9)

The result is with V = 1819, p-value = 0.8812.

Then I calculated the z-value of the test: z-value = -2.661263. The corresponding p-value is: p-value = 0.003892, which is different from the p-value computed in the Wilcox test. I am using the following steps to compute the z-value:

diff = c(endprice0 - endprice1)
diffNew = diff[diff != 0]
diffNew.rank = rank(abs(diffNew))
diffNew.rank.sign <- diffNew.rank * sign(diffNew)
ranks.pos <- sum(diffNew.rank.sign[diffNew.rank.sign > 0])   # 1819
ranks.neg <- -sum(diffNew.rank.sign[diffNew.rank.sign < 0])  # 1751

v = ranks.neg
n = 100
z = (v - n*(n+1)/4)/sqrt(n*(n+1)*(2*n+1)/24)   # -2.661263

Which p-value should I take for the Wilcox test then?

Hix

The data sets used in my test are:

endprice0 = c(136.3800, 134.8500, 350.7500, 18.8400, 0.0000, 0.0600,
159.1900, 242.5600, 0.0400, 289.9000, 0.0000, 42.6100, 275.9500,
76.6200, 36.6400, 0.0000, 81.5900, 179.3600, 86.2200, 210.8000,
118.7200, 45.5800, 98.1900, 137.0300, 47.7900, 123.7700, 23.2400,
0.0400, 130.2300, 0.0400, 0.0000, 130.3800, 150.7600, 0.5900,
277.3000, 166.0100, 0.0400, 71.9400, 80.1300, 162.8800, 85.0500,
125.4400, 138.0600, 0.0600, 140.6300, 100.9700, 0.0000, 0.0400,
213.7300, 86.9200, 294.8200, 0.0400, 0.0000, 239.2100, 0.0000,
13.7700, 95.5300, 0.0400, 146.7200, 0.0000, 0.00, 121.57, 68.23,
5.31, 0.04, 96.31, 206.02, 313.39, 92.34, 31.64, 118.71, 499.6, 0,
129.04, 106.88, 183.92, 50.42, 0, 0.04, 0.04, 1.57, 355.56, 81.19,
327.17, 151.18, 0, 0, 125.03, 0, 0.04, 132.01, 0, 0, 11.49, 23,
13.46, 326.64, 198.19, 114.22, 79.53)

endprice1 = c(138.9300, 131.9700, 300.4700, 0.0000, 0.0000, 0.2200,
159.6300, 277.9100, 0.0000, 328.9700, 0.0000, 40.5100, 270.1000,
52.8000, 39.3800, 0.0400, 79.7100, 110.5600, 41.1600, 224.6600,
123.8800, 53.2700, 96.1500, 67.2800, 40.7300, 99.4900, 20.4900,
0.0400, 126.1000, 0.0000, 1.3700, 140.6500, 165.7200, 0.0000,
314.4200, 207.7400, 0.0400, 76.9300, 75.8000, 184.9100, 83.3700,
139.5300, 157.0500, 0.0000, 147.5900, 105.2800, 0.0000, 0.0000,
207.3000, 74.1100, 288.3900, 0.0400, 0.0000, 213.7200, 0.0400,
14.8300, 53.7000, 0.0400, 150.0800, 0.0000, 0, 123.73, 68.01, 9.52,
0, 111.86, 249.69, 354.18, 98, 31.3, 117.54, 455.32, 1.06, 127.92,
114.51, 173.85, 53.22, 0, 0, 0, 0.31, 376.69, 69.43, 278.8, 147.11,
0.04, 0, 120.05, 0, 0.04, 132.97, 0, 0, 9.98, 28.85, 13.77, 295.17,
191.54, 126.44, 84.83)

On Apr 5, 2010, at 8:06 AM, hix li wrote:

> I use the Wilcox test:
>
> wilcox.test(endprice0, endprice1, paired = TRUE, alternative =
> "two.sided", conf.int = T, conf.level = 0.9)
>
> The result is with V = 1819, p-value = 0.8812.
>
> Then I calculated the z-value of the test: z-value = -2.661263. The
> corresponding p-value is: p-value = 0.003892, which is different
> from the p-value computed in the Wilcox test, I am using the
> following steps to compute the z-value:

If you are trying to invent a new test, then you should provide a theoretical justification. If you are doing this as a homework exercise, then consult with your instructor. If you are looking for alternative methods of looking at the data, then either do a paired t.test or try:

plot(density(endprice0))
lines(density(endprice1), col="red")

-- 
David.

> Which p-value should I take for the Wilcox test then?
>
> Hix
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT

The problem is that your data contain ties, which mess up the nice theory and result in different people using different approximations. I don't know where your z-statistic formula comes from, but you can find the one R uses by looking at the source code in stats:::wilcox.test.default.

To see that R's z-statistic approximation is better than yours, try breaking the ties randomly and using exact=TRUE:

wilcox.test(endprice0 + rnorm(length(endprice0), sd = 1e-10), endprice1, paired = TRUE, exact = TRUE)

You will find that the p-values agree fairly well with R's 0.88.

    -thomas

On Mon, 5 Apr 2010, hix li wrote:

> The result is with V = 1819, p-value = 0.8812.
>
> Then I calculated the z-value of the test: z-value = -2.661263. The corresponding p-value is: p-value = 0.003892, which is different from the p-value computed in the Wilcox test, I am using the following steps to compute the z-value:
>
> v = ranks.neg
> n = 100
> z= (v - n *(n+1)/4)/sqrt(n*(n+1)*(2*n+1)/24) = -2.661263
>
> Which p-value should I take for the Wilcox test then?

Thomas Lumley
Assoc. Professor, Biostatistics
tlumley at u.washington.edu
University of Washington, Seattle

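For readers who want to see what a tie-corrected normal approximation looks like without opening the source, here is a minimal sketch. It follows the paired branch of stats:::wilcox.test.default as I read it (zero differences dropped, midranks, variance reduced for ties, continuity correction), so treat it as an illustration rather than the definitive implementation. The vectors x and y are made-up toy data with ties and zero differences, not the poster's prices.

```r
# Hypothetical toy data with ties and zero differences (NOT the poster's prices)
x <- c(10, 12,  9, 15, 0, 7, 7, 20, 3, 11)
y <- c( 8, 12, 11, 12, 0, 9, 4, 18, 5, 11)

d <- x - y
d <- d[d != 0]            # zero differences are dropped
n <- length(d)            # n counts only the nonzero differences
r <- rank(abs(d))         # midranks for tied absolute differences
V <- sum(r[d > 0])        # signed-rank statistic (sum of positive ranks)

ties  <- table(r)         # size of each group of tied ranks
sigma <- sqrt(n * (n + 1) * (2 * n + 1) / 24 - sum(ties^3 - ties) / 48)
z0    <- V - n * (n + 1) / 4
z     <- (z0 - sign(z0) * 0.5) / sigma          # continuity correction, two-sided
p.manual <- 2 * min(pnorm(z), pnorm(z, lower.tail = FALSE))

# Compare with the built-in normal approximation
p.builtin <- wilcox.test(x, y, paired = TRUE, exact = FALSE, correct = TRUE)$p.value
print(c(manual = p.manual, builtin = p.builtin))
```

Note that n here is the number of nonzero differences, and that the tie term only ever shrinks the variance relative to the textbook formula.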
Since this may be homework, I'll confine myself to a hint (which may or may not be the problem; I haven't checked): The formula you use for z is strongly dependent on the value of 'n'.

 -Peter Ehlers

On 2010-04-05 6:06, hix li wrote:

> diff = c(endprice0 - endprice1)
> diffNew = diff[diff !=0]
>
> v = ranks.neg
> n = 100
> z= (v - n *(n+1)/4)/sqrt(n*(n+1)*(2*n+1)/24) = -2.661263
>
> Which p-value should I take for the Wilcox test then?

-- 
Peter Ehlers
University of Calgary
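
The hint about 'n' can be checked from the numbers in the original post alone. The ranks of n nonzero differences always sum to n(n+1)/2, and the two reported rank sums (1819 and 1751) add up to 3570, which corresponds to n = 84, not 100. The sketch below plugs both values of n into the poster's formula; the helper name z.for.n is hypothetical, and the tie and continuity corrections are deliberately left out, so the result should only be expected to land near R's p-value, not on it.

```r
# Rank sums reported in the original post
ranks.pos <- 1819
ranks.neg <- 1751

# Ranks 1..n sum to n * (n + 1) / 2; solve that for n
total <- ranks.pos + ranks.neg
n.eff <- (sqrt(1 + 8 * total) - 1) / 2   # 84 nonzero differences

# The poster's z formula, with n as an explicit argument (hypothetical helper)
z.for.n <- function(v, n) {
  (v - n * (n + 1) / 4) / sqrt(n * (n + 1) * (2 * n + 1) / 24)
}

z.100 <- z.for.n(ranks.neg, 100)     # reproduces the posted -2.661263
z.eff <- z.for.n(ranks.neg, n.eff)   # roughly -0.15

p.100.one.sided <- pnorm(z.100)              # the posted 0.003892 (one-sided)
p.eff.two.sided <- 2 * pnorm(-abs(z.eff))    # close to R's 0.88

print(c(n.eff = n.eff, z.100 = z.100, z.eff = z.eff,
        p.100 = p.100.one.sided, p.eff = p.eff.two.sided))
```

With the count of nonzero differences in place of 100, and a two-sided p-value, the hand calculation and wilcox.test no longer disagree in any meaningful way; the small remaining gap is what the tie and continuity corrections account for.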