Hi,

please could you recommend an R package that computes a two-sample z-test?

thanks,

Bogdan
Hi Bogdan,

Look at ?pnorm

Josh

On Fri, Jul 15, 2011 at 9:10 PM, Bogdan Tanasa <tanasa at gmail.com> wrote:
> Hi,
>
> please could you recommend an R package that computes a two-sample z-test?
>
> thanks,
>
> Bogdan
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
https://joshuawiley.com/
Hi,

The Z is basically:

    (mean(x) - mean(y))/sqrt(var(x)/length(x) + var(y)/length(y))

and pnorm() will give you a p-value, if you desire it.

If the n - 1 divisor used in var() is a problem for you, it is trivial to work around (this version assumes x and y have the same length, since it uses cbind()):

    X <- cbind(x, y)
    # center each column, then divide the cross-product by n (not n - 1)
    XX <- crossprod(X - tcrossprod(matrix(1, nrow(X))) %*% X * (1/nrow(X))) * 1/nrow(X)
    # diff() returns mean(y) - mean(x), so the sign is flipped relative to the formula above
    diff(colMeans(X))/sqrt(sum(diag(XX)/nrow(X)))

where the last line gives the Z and again, pnorm() will give you a p-value if desired.

In most cases a t-test is preferred (and is available using the t.test() function).

HTH,

Josh

On Fri, Jul 15, 2011 at 9:56 PM, Bogdan Tanasa <tanasa at gmail.com> wrote:
> Hi Josh,
>
> thanks for your email. I have been looking into pnorm, but hmmm ... it does
> not seem to assess the difference between 2 populations, it says
> that it works on a vector of quantiles, and sd=1, mean = 0. please let me
> know if you have any suggestions. thanks,
>
> bogdan
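A minimal sketch of the calculation described above, wrapped in a function. The name z_two_sample and the simulated data are illustrative assumptions, not part of any package. Because the variances are estimated from the samples with var(), this is only a large-sample approximation to a Z-test.

```r
# Hypothetical helper (not from any package): two-sample Z statistic
# using sample variances, with a two-sided p-value from pnorm().
z_two_sample <- function(x, y) {
  se <- sqrt(var(x) / length(x) + var(y) / length(y))  # standard error of the difference
  z <- (mean(x) - mean(y)) / se
  p <- 2 * pnorm(-abs(z))                              # two-sided p-value
  list(z = z, p.value = p)
}

set.seed(1)                    # simulated data, for illustration only
x <- rnorm(200, mean = 0.3)
y <- rnorm(150, mean = 0)
res <- z_two_sample(x, y)
res
```

pnorm() only converts the Z statistic into a tail probability; the comparison of the two samples happens in the line that computes z.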
On 16/07/11 17:49, Joshua Wiley wrote:
> Hi,
>
> The Z is basically:
>
> (mean(x) - mean(y))/sqrt(var(x)/length(x) + var(y)/length(y))
>
> and pnorm will give you a p-value, if you desire it.
>
> If the n - 1 divisor used in var() is a problem for you, it is
> trivial to work around:

<SNIP>

(1) Why on earth should the n - 1 divisor be a problem? This divisor yields an unbiased estimator of sigma^2. Generally speaking, unbiasedness is a Good Thing.

(2) If a Z-test (rather than a t-test) is being done then this issue simply does not arise. In the context of a Z-test, the variances are ***known*** quantities and so do not get estimated. So there is no divisor to *be* a problem.

cheers,

Rolf
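Point (2) can be sketched as follows, under the assumption that the population standard deviations are supplied by the user rather than computed from the data. The function name z_known_var, the arguments sigma_x and sigma_y, and all the numbers are made up for illustration.

```r
# Hypothetical helper: two-sample Z-test with *known* population
# standard deviations, so nothing is estimated and no divisor is involved.
z_known_var <- function(x, y, sigma_x, sigma_y) {
  se <- sqrt(sigma_x^2 / length(x) + sigma_y^2 / length(y))
  z <- (mean(x) - mean(y)) / se
  c(z = z, p.value = 2 * pnorm(-abs(z)))   # two-sided p-value
}

x <- c(5.1, 4.9, 5.6, 5.8, 5.0)   # invented data
y <- c(4.4, 4.7, 5.0, 4.2, 4.6)
out <- z_known_var(x, y, sigma_x = 0.4, sigma_y = 0.3)
out
```

When the variances really are known, the samples contribute only their means and sizes to the statistic.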
On Sat, Jul 16, 2011 at 12:05 AM, Rolf Turner <rolf.turner at xtra.co.nz> wrote:
> (1) Why on earth should the n - 1 divisor be a problem? This divisor
> yields an unbiased estimator of sigma^2. Generally speaking,
> unbiasedness is a Good Thing.

Yes, it is an unbiased estimator of sigma^2 for a sample. My response was predicated on the assumption the OP was choosing a Z-test because data on the entire population was available, hence no need to use an estimate.

> (2) If a Z-test (rather than a t-test) is being done then this issue simply
> does not arise. In the context of a Z-test, the variances are ***known***
> quantities and so do not get estimated. So there is no divisor to *be* a
> problem.

But if raw data from the entire population is all that is available, then the "known" variances still need to be calculated, though not estimated. var() does not do this directly, which is why I offered an alternative.

Josh
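For the whole-population case described here, a shorter route than the crossprod() construction might be to compute the n-divisor variance directly, which also works when x and y differ in length. This is only a sketch: pop_var is a hypothetical helper name, and the data are invented.

```r
# Hypothetical helper: variance with divisor n (population variance),
# as opposed to var(), which divides by n - 1.
pop_var <- function(v) mean((v - mean(v))^2)

x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # invented "entire population" data
y <- c(1, 3, 3, 5, 6, 6)         # lengths need not match

pop_var(x)                             # 4
var(x) * (length(x) - 1) / length(x)   # same value, rescaled from var()

# Z statistic using the n-divisor variances
z <- (mean(x) - mean(y)) / sqrt(pop_var(x) / length(x) + pop_var(y) / length(y))
p <- 2 * pnorm(-abs(z))                # two-sided p-value
```

The rescaling by (n - 1)/n is the only difference between the two variance calculations, so either line can be used.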