thr3ads.net - R help - [R] Permutation or Bootstrap to obtain p-value for one sample [Oct 2011]

francy

2011-Oct-08 14:04 UTC

[R] Permutation or Bootstrap to obtain p-value for one sample

Hi, 

I am having trouble understanding how to approach a simulation:

I have a sample of n=250 from a population of N=2,000 individuals, and I
would like to use either a permutation test or the bootstrap to test whether
this particular sample differs significantly from other random samples drawn
from the same population. I thought I needed to draw random samples of n=250
from the N=2,000 population (I am not sure how many simulations I need) and
perhaps run a one-sample t-test comparing the mean score of each simulated
sample, plus the one sample I am trying to show is different, to the mean
value of the population. But I don't know:
(1) whether a one-sample t-test would be the right way to do it, and how
to go about doing this in R
(2) whether a permutation test or a bootstrap method is more appropriate

This is the data frame that I have, which is to be sampled:
df<-
i.e.
x y
1 2
3 4
5 6
7 8
. .
. .
. .
2,000

I have this sample from df, and would like to test whether it has extreme
values of y.
sample1<-
i.e.
x y
3 4
7 8
. .
. .
. .
250

For now I only have this: 

R=999 #Number of simulations, but I don't know how many...
t.values = numeric(R)  # numeric vector with 999 elements, which will hold the result of each simulation
for (i in 1:R) {
sample1 <- df[sample(nrow(df), 250, replace=TRUE),] 

But I don't know how to continue the loop: do I calculate the mean for each
simulation and compare it to the population mean? 
Any help you could give me would be very appreciated,
Thank you. 


--
View this message in context:
http://r.789695.n4.nabble.com/Permutation-or-Bootstrap-to-obtain-p-value-for-one-sample-tp3885118p3885118.html
Sent from the R help mailing list archive at Nabble.com.

Ken Hutchison

2011-Oct-08 23:27 UTC

Hi Francy,
  A bootstrap test would likely be sufficient for this problem, but a
one-sample t-test isn't advisable or necessary in my opinion. If you run a
t-test many times you are making assumptions about the distribution of
your data; more importantly, your probability of a Type I error increases
with each test. So a valid approach is to sample repeatedly (the computation
is cheap for this problem, so do a lot of reps) and compare your sample mean
to the null distribution of means, i.e.:

nreps=10000
mean.dist=rep(NA,nreps)
population=df$y #the 2,000 population values of y from the question

for(replication in 1:nreps)
{
#sample size is 250, to match the observed sample
my.sample=sample(population, 250, replace=TRUE)
#replace could be FALSE, depends on preference
mean.for.rep=mean(my.sample) #mean for this replication
mean.dist[replication]=mean.for.rep #store the mean
}

hist(mean.dist,main="Null Dist of Means", col="chartreuse")
 #Show the means in a nifty color

You can then perform various tests against the null distribution, or simply
see where your sample mean lies within it. Also, if the distribution is
normal, which is fairly likely since it is a distribution of means
(shapiro.test, or ad.test after require(nortest), will let you know), you
should be able to make inference from it using parametric methods (once),
which will fit the truth a bit better than a t.test.
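
As a rough sketch of the normality check and the single parametric comparison described above: the data below are simulated stand-ins (the real df and sample1 are not shown in the thread), and the names norm.check, z and p.normal are made up for illustration. One practical detail is that shapiro.test() only accepts between 3 and 5000 values, so with 10000 replications it has to be run on a subset.

```r
set.seed(1)
# Hypothetical stand-ins for the thread's objects: df is the population of
# 2,000 rows, sample1 is the observed sample of 250 rows.
df <- data.frame(x = 1:2000, y = rnorm(2000, mean = 50, sd = 10))
sample1 <- df[sample(nrow(df), 250), ]

# Null distribution of means, resampling with replacement as in the loop above
nreps <- 10000
mean.dist <- replicate(nreps, mean(sample(df$y, 250, replace = TRUE)))

# shapiro.test() accepts at most 5000 values, so check a random subset
norm.check <- shapiro.test(sample(mean.dist, 5000))

# Normal approximation: standardise the observed mean against the null
# distribution and compute one two-sided p-value
z <- (mean(sample1$y) - mean(mean.dist)) / sd(mean.dist)
p.normal <- 2 * pnorm(-abs(z))
p.normal
```

A qqnorm(mean.dist) plot is a reasonable visual alternative to the formal test, since with thousands of values even trivial departures from normality can come out as significant.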
        Hope that's helpful,
           Ken Hutchison



peter dalgaard

2011-Oct-09 07:52 UTC

The straightforward way would be a permutation test, something like this:

msamp <- mean(sample1$y)
mpop <- mean(df$y)
msim <- replicate(10000, mean(sample(df$y, 250)))

sum(abs(msim-mpop) >= abs(msamp-mpop))/10000

I don't really see a reason to do bootstrapping here. You say you want to
test whether your sample could be a random sample from the population, so just
simulate that sampling (which should be without replacement, like your sample
is). Bootstrapping might come in if you want a confidence interval for the mean
difference between your sample and the rest.
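
For the confidence-interval case mentioned above, a minimal percentile-bootstrap sketch might look like the following. It assumes simulated stand-in data and a hypothetical index vector in.sample that marks which rows of df make up the observed sample; neither appears in the thread, and the percentile interval is only one of several bootstrap interval types.

```r
set.seed(2)
# Hypothetical stand-in population; in.sample holds the row indices of the
# observed sample of 250 (an assumption made for illustration)
df <- data.frame(x = 1:2000, y = rnorm(2000, mean = 50, sd = 10))
in.sample <- sample(nrow(df), 250)

y.samp <- df$y[in.sample]   # values in the observed sample
y.rest <- df$y[-in.sample]  # values in the remaining 1,750 rows

# Resample each group with replacement and record the difference in means
boot.diff <- replicate(10000, {
  mean(sample(y.samp, replace = TRUE)) - mean(sample(y.rest, replace = TRUE))
})

# 95% percentile interval for the mean difference (sample minus rest)
ci <- quantile(boot.diff, c(0.025, 0.975))
ci
```

If the interval excludes zero, that points the same way as a small permutation p-value, but the two procedures answer slightly different questions and need not agree exactly.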

Instead of sampling means, you could put a full-blown t-test inside the
replicate expression, like:

psim <- replicate(10000, {
  s <- sample(1:2000, 250)
  t.test(df$y[s], df$y[-s])$p.value
})

and then check whether the p value for your sample is small compared to the
distribution of values in psim.

That'll take quite a bit longer, though; t.test() is a more complex beast
than mean(). It is not obvious that it has any benefits either, unless you
specifically wanted to investigate the behavior of the t test.
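
One way to carry out that comparison, sketched with simulated stand-in data: in.sample is a hypothetical vector of the row indices of the observed sample, p.obs and p.sim are illustrative names, and the number of simulations is reduced to 2000 only to keep the run time short.

```r
set.seed(3)
# Hypothetical stand-in population; in.sample holds the row indices of the
# observed sample of 250 (an assumption made for illustration)
df <- data.frame(x = 1:2000, y = rnorm(2000, mean = 50, sd = 10))
in.sample <- sample(nrow(df), 250)

# t-test p value for the observed sample against the rest of the population
p.obs <- t.test(df$y[in.sample], df$y[-in.sample])$p.value

# The same t-test on random splits of the population
nsim <- 2000
psim <- replicate(nsim, {
  s <- sample(nrow(df), 250)
  t.test(df$y[s], df$y[-s])$p.value
})

# Proportion of random splits with a p value at least as small as observed
p.sim <- mean(psim <= p.obs)
p.sim
```

A small p.sim means that few random samples of 250 look as different from the rest of the population as the observed one does.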

(All code untested. Caveat emptor.)


-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
