thr3ads.net - R help - [R] Is it ok to apply the z.test this way? [Apr 2010]


Atte Tenkanen

2010-Apr-16 16:11 UTC

[R] Is it ok to apply the z.test this way?

Dear R-users,

I want to check whether certain values come from a random distribution whose
values lie between 0 and 1. So it is not really normal, even though shapiro.test
says it is highly normal... Can I do something like this and trust that the
values it gives are right? z.test is from the package TeachingDemos.
-------------------------------------------------------------------------------
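library(TeachingDemos)  # provides z.test

# Distribution is the numeric vector of observed values (not shown in this post)
SelectedVals <- c()
for (i in seq(0, 1, by = 0.001)) {
  # one-observation z-test of each grid value against the sample mean and sd
  if (z.test(i, mu = mean(Distribution), stdev = sd(Distribution))$p.value <= 0.05)
    SelectedVals <- c(SelectedVals, i)
}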

-------------------------------------------------------------------------------
I have marked the border values given by this script on the histogram of the
original random distribution:

http://www.ag.fimug.fi/~Atte/62Hist100410.pdf

Atte Tenkanen
University of Turku, Finland
Department of Musicology
+35823335278
http://users.utu.fi/attenka/

David Winsemius

2010-Apr-16 18:56 UTC


On Apr 16, 2010, at 12:11 PM, Atte Tenkanen wrote:
> Dear R-users,
>
> I want to check if certain values are from random distribution, that  
> includes values between 0-1. So, it is not really normal even though  
> shapiro.test says it is highly normal... Can I do something like  
> this and think that the values given are right. z.test is from  
> package TeachingDemos.
>
> -------------------------------------------------------------------------------
> SelectedVals=c()
> for(i in seq(0,1,by=0.001))
> {
> 	if((z.test(i, mu=mean(Distribution), stdev=sd(Distribution))$p.value)<=0.05)
> SelectedVals=c(SelectedVals,i)
> }
>
You are attempting to do statistics on a single number at a time. If  
you do not immediately appreciate the absurdity of this effort, then  
you should consult a real statistician without delay. There are many  
fine statisticians at your university.

-- 

David Winsemius, MD
West Hartford, CT

Greg Snow

2010-Apr-16 19:07 UTC


Several points:

1. The Shapiro test does not tell you that something is normal or highly normal,
only that you don't have enough evidence to disprove that the data came from
a normal population (powered for a certain type of deviation from normality).

2. The z.test function is intended to be used as a stepping stone in learning
for students: a simple test with unrealistic assumptions to get the ideas, after
which you relax the assumptions and learn about t tests and others.

3. The z test is only used when the population standard deviation is known. You
calculate the sd from the data; that is what t tests are for.

4. Calculating the hypothesized mean from the data is backwards.

5. Using a sample size of 1 is questionable; doing this 1,000 times without
correction is even more questionable.

6.  Your code is equivalent to:

tmp <- seq(0,1, by=0.001)
tmp2 <- tmp[ abs(tmp-mean(Distribution))/sd(Distribution) > 1.96 ]

just slower and less memory efficient.

7. None of this establishes what is from an unknown distribution.
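
The equivalence claimed in point 6 can be checked with a small sketch in base R. It assumes that z.test reports the usual two-sided normal p-value, so that for a single observation the test reduces to a threshold on the z-score. The Distribution vector below is a hypothetical stand-in (simulated Beta values), since the original data are not given in the thread, and qnorm(0.975) is used in place of the rounded 1.96.

```r
# Hypothetical stand-in for the poster's 'Distribution' vector
# (assumption: the real data are not shown anywhere in the thread)
set.seed(1)
Distribution <- rbeta(200, 5, 5)   # values in (0, 1), roughly bell-shaped

m <- mean(Distribution)
s <- sd(Distribution)
grid <- seq(0, 1, by = 0.001)

# Two-sided p-value of a one-observation z-test, computed directly with pnorm()
p <- 2 * pnorm(-abs(grid - m) / s)
by_pvalue <- grid[p <= 0.05]

# The same selection written as a threshold on the absolute z-score
by_zscore <- grid[abs(grid - m) / s >= qnorm(0.975)]

# TRUE when both rules pick exactly the same grid points
same_selection <- identical(by_pvalue, by_zscore)
```

Under these assumptions the loop selects nothing more than the grid points lying further than about 1.96 sample standard deviations from the sample mean, which is why it cannot say anything about where a value "came from".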

If you can tell us what your real question is, then maybe we can help with a
real solution.

So to answer your question of whether it is OK to use z.test in that way:
legally, the license says you can use it any way you want;
ethically, morally, aesthetically, or following the intent of the author, no!

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111


Christos Argyropoulos

2010-Apr-16 19:22 UTC


So...

are you trying to figure out whether your data has a substantial number of
outliers that call into question the adequacy of the normal distribution for
your data?

If this is the case, note that you cannot check the values individually (as you
are doing) without taking into account the "Bonferroni" fallacy, i.e.
small p-values will be found with a respectable frequency as the size of the
dataset grows (C. Robert discusses this in a preprint on arXiv; see
http://arxiv.org/PS_cache/arxiv/pdf/1002/1002.2080v1.pdf ). So even though you
could check each individual point for normality, testing the whole dataset
requires that you apply a Bonferroni correction to your z.tests, or use
outlier.test from package "car" to reduce the amount of code you have
to write.
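
A minimal sketch of the Bonferroni correction mentioned above, using p.adjust from base R's stats package. The vector x is hypothetical simulated data standing in for the real observations, and standardizing by the sample mean and sd is a simplification (a studentized statistic would be more careful). As far as I know, later versions of car provide this kind of test under the name outlierTest, applied to a fitted model.

```r
# Hypothetical data: 'x' stands in for the poster's observations
set.seed(42)
x <- rnorm(500)

# Unadjusted two-sided p-value for each standardized observation
z <- (x - mean(x)) / sd(x)
p_raw <- 2 * pnorm(-abs(z))

# Bonferroni adjustment: multiply each p-value by the number of tests, cap at 1
p_bonf <- p.adjust(p_raw, method = "bonferroni")

# Number of points flagged at the 5% level before and after the correction
n_raw  <- sum(p_raw  <= 0.05)
n_bonf <- sum(p_bonf <= 0.05)
```

Comparing n_raw with n_bonf shows how many of the individually "significant" points survive once the number of tests is taken into account.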

 

Regards, 

Christos
