Dear R-users,

I want to check whether certain values come from a random distribution whose values lie between 0 and 1. So it is not really normal, even though shapiro.test says it is highly normal... Can I do something like the following and trust that the values it returns are right? z.test is from the package TeachingDemos.

-------------------------------------------------------------------------------
library(TeachingDemos)  # provides z.test()

# Collect grid values in [0, 1] whose z.test p-value is <= 0.05
SelectedVals=c()
for(i in seq(0,1,by=0.001))
{
  if((z.test(i, mu=mean(Distribution), stdev=sd(Distribution))$p.value)<=0.05) SelectedVals=c(SelectedVals,i)
}
-------------------------------------------------------------------------------

I have marked the border values given by this script on the histogram of the original random distribution:

http://www.ag.fimug.fi/~Atte/62Hist100410.pdf

Atte Tenkanen
University of Turku, Finland
Department of Musicology
+35823335278
http://users.utu.fi/attenka/
On Apr 16, 2010, at 12:11 PM, Atte Tenkanen wrote:

> if((z.test(i, mu=mean(Distribution), stdev=sd(Distribution))$p.value)<=0.05) SelectedVals=c(SelectedVals,i)

You are attempting to do statistics on a single number at a time. If you do not immediately appreciate the absurdity of this effort, then you should consult a real statistician without delay. There are many fine statisticians at your university.

--
David Winsemius, MD
West Hartford, CT
Several points:

1. The Shapiro test does not tell you that something is normal or highly normal, only that you don't have enough evidence to disprove that the data came from a normal population (powered for a certain type of deviation from normality).

2. The z.test function is intended as a stepping stone for students: a simple test with unrealistic assumptions to get the ideas across, after which they relax the assumptions and learn about t tests and others.

3. The z test is only used when the population standard deviation is known. You calculate the sd from the data, which is what t tests are for.

4. Calculating the hypothesized mean from the data is backwards.

5. Using a sample size of 1 is questionable; doing this 1,000 times without correction is even more questionable.

6. Your code is equivalent to:

tmp <- seq(0,1, by=0.001)
tmp2 <- tmp[ abs(tmp-mean(Distribution))/sd(Distribution) > 1.96 ]

just slower and less memory efficient. (For a single value, z = (i - mean)/sd, and a two-sided p-value of 0.05 or less corresponds to |z| of about 1.96 or more, i.e. qnorm(0.975).)

7. None of this establishes what is from an unknown distribution.

If you can tell us what your real question is, then maybe we can help with a real solution.

So, to answer your question of whether it is OK to use z.test in that way: legally, the license says you can use it any way you want; ethically, morally, aesthetically, or following the intent of the author, no!

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Atte Tenkanen
> Sent: Friday, April 16, 2010 10:11 AM
> To: r-help at r-project.org
> Subject: [R] Is it ok to apply the z.test this way?
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
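To make Greg's point 6 concrete, here is a minimal base-R sketch. It assumes a hypothetical stand-in for the poster's data (a Beta(5, 5) sample, which stays inside [0, 1]; the real `Distribution` is not shown in the thread). Instead of calling z.test, it computes the two-sided normal p-value for a single value directly as 2 * pnorm(-|z|). That should be what z.test reports when n = 1, but treat this as an assumption. It then compares the loop result with the vectorized cutoff, using qnorm(0.975) in place of the rounded 1.96.

```r
# Hypothetical stand-in for the poster's data (not the real 'Distribution')
set.seed(1)
Distribution <- rbeta(500, 5, 5)

vals <- seq(0, 1, by = 0.001)

# Loop version: for one value, z = (x - mu) / sd and the
# two-sided p-value is 2 * pnorm(-|z|)
SelectedVals <- numeric(0)
for (i in vals) {
  z <- (i - mean(Distribution)) / sd(Distribution)
  if (2 * pnorm(-abs(z)) <= 0.05) SelectedVals <- c(SelectedVals, i)
}

# Vectorized equivalent: p <= 0.05 exactly when |z| >= qnorm(0.975) (about 1.96)
tmp2 <- vals[abs(vals - mean(Distribution)) / sd(Distribution) >= qnorm(0.975)]

# Compare the two selections
identical(SelectedVals, tmp2)
```

Either way, the "test" reduces to a fixed cutoff of mean +/- 1.96 sd applied to each grid value. That is why it says nothing about which distribution the values came from.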
So... are you trying to figure out whether your data have a substantial number of outliers that call into question the adequacy of the normal distribution for your data? If so, note that you cannot check the values individually (as you are doing) without taking into account the "Bonferroni" fallacy, i.e. small p-values will turn up with respectable frequency as the size of the dataset grows (C. Robert discusses this in an arXiv preprint, see http://arxiv.org/PS_cache/arxiv/pdf/1002/1002.2080v1.pdf).

So even though you could check each individual point against the normal distribution, testing the whole dataset requires that you apply a Bonferroni correction to your z.tests, or use outlier.test from package "car" (named outlierTest in later versions) to reduce the amount of code you have to write.

Regards,

Christos

> Date: Fri, 16 Apr 2010 19:11:19 +0300
> From: attenka@utu.fi
> To: r-help@r-project.org
> Subject: [R] Is it ok to apply the z.test this way?
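The Bonferroni route suggested above can be sketched in base R with p.adjust. This sketch rests on two assumptions. First, the data are a hypothetical Beta(5, 5) sample standing in for `Distribution`. Second, each observation is standardized with the sample mean and sd, so Greg's point 3 still applies and this only illustrates the correction, not a recommended outlier test. It does not call anything from car.

```r
# Hypothetical data: a Beta(5, 5) sample standing in for 'Distribution'
set.seed(2)
x <- rbeta(200, 5, 5)

# One z-score and one two-sided normal p-value per observation
z <- (x - mean(x)) / sd(x)
p_raw <- 2 * pnorm(-abs(z))

# Bonferroni: multiply each p-value by the number of tests, capped at 1
p_bonf <- p.adjust(p_raw, method = "bonferroni")

n_raw  <- sum(p_raw <= 0.05)   # flagged without correction
n_bonf <- sum(p_bonf <= 0.05)  # flagged after correcting for 200 tests
cat("uncorrected:", n_raw, " Bonferroni:", n_bonf, "\n")
x[p_bonf <= 0.05]              # observations still flagged, if any
```

Because the corrected p-values can never be smaller than the raw ones, the corrected count is at most the uncorrected count. This is the multiplicity problem Christos describes.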