Aldi Kraja
2006-Dec-30 09:04 UTC
[R] Error: cannot take a sample larger than the population
Hi,
In Splus7 this statement
xlrmN1 <- sample(c(0,1,2),400 ,prob=c(0.02 ,0.93 ,0.05 ))
worked fine, but in R the interpreter reports that the vector to
choose from, c(0,1,2), is shorter than the number of values I want
to draw from it.
Any good reason?
See below the error.
> xlrmN1 <- sample(c(0,1,2),400 ,prob=c(0.02 ,0.93 ,0.05 ))
Error in sample(length(x), size, replace, prob) :
cannot take a sample larger than the population
when 'replace = FALSE'
Execution halted
TIA,
Aldi
--
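For reference, the error above can be reproduced and caught in a short R session. This is only a sketch: the names vals, p, and res and the seed are illustrative and not part of the original post.

```r
vals <- c(0, 1, 2)
p    <- c(0.02, 0.93, 0.05)

# Without replacement, at most length(vals) = 3 values can be drawn,
# so asking for 400 raises an error; tryCatch() captures its message.
res <- tryCatch(
  sample(vals, 400, prob = p),
  error = function(e) conditionMessage(e)
)
print(res)

# With replacement, each of the 400 draws is taken independently from vals.
set.seed(1)  # arbitrary seed, only for reproducibility
xlrmN1 <- sample(vals, 400, replace = TRUE, prob = p)
length(xlrmN1)
```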
Chuck Cleland
2006-Dec-30 10:45 UTC
[R] Error: cannot take a sample larger than the population
Aldi Kraja wrote:
> Hi,
> In Splus7 this statement
> xlrmN1 <- sample(c(0,1,2),400 ,prob=c(0.02 ,0.93 ,0.05 ))
> worked fine, but in R the interpreter reports that the length of the
> vector to chose c(0,1,2) is shorter than the size of many times I want
> to be selected from the vector c(0,1,2).
> Any good reason?
> See below the error.
>
> > xlrmN1 <- sample(c(0,1,2),400 ,prob=c(0.02 ,0.93 ,0.05 ))
> Error in sample(length(x), size, replace, prob) :
> cannot take a sample larger than the population
> when 'replace = FALSE'
> Execution halted

So why not use replace = TRUE?

xlrmN1 <- sample(c(0,1,2),400 ,prob=c(0.02 ,0.93 ,0.05 ), replace=TRUE)
table(xlrmN1)
xlrmN1
  0   1   2
  5 373  22
prop.table(table(xlrmN1))
xlrmN1
     0      1      2
0.0125 0.9325 0.0550

> TIA,
>
> Aldi

--
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894
Aldi Kraja
2006-Dec-30 15:55 UTC
[R] Error: cannot take a sample larger than the population
Partial Summary and discussion:
====================
Thank you to Chao Gai, Chuck Cleland, and Jim Lemon for
their suggestion to use replace=T in R.
There is a problem, though (see below).
In Splus7, sample is defined as
-------------
sample(x, size = n, replace = F, prob = NULL, n = NULL, ...)
where replace=F is the default.
In Splus7,
xlrmN1 <- sample(c(0,1,2),400 ,prob=c(0.02 ,0.93 ,0.05 ))
followed by
table(xlrmN1)/400
0 1 2
0.02 0.93 0.05
shows that "sample" is working exactly as expected based on the prob
vector.
When "sample" is used in Splus7 with replacement we see the following
result:
> xlrmN1 <- sample(c(0,1,2),400 ,replace=T,prob=c(0.02 ,0.93 ,0.05 ))
> table(xlrmN1)/400
0 1 2
0.0125 0.925 0.0625
which I think is working again as expected.
In R, sample is defined as
---------
sample(x, size, replace = FALSE, prob = NULL)
So the above statement with replace=F did not work (reported error)
but with replace=T produced,
> table(xlrmN1)/400
xlrmN1
0 1 2
0.0200 0.9225 0.0575
which does not exactly match the probabilities provided (0.02, 0.93, 0.05).
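A gap like this between observed and requested proportions is what independent random draws would produce. The following is a rough sketch, assuming that draws with replacement behave as independent trials; n, p, props, the number of repetitions, and the seed are illustrative choices.

```r
set.seed(123)  # arbitrary seed, only for reproducibility
n <- 400
p <- c(0.02, 0.93, 0.05)

# Repeat the 400-draw experiment many times and record the observed proportions.
# Each column of props holds the proportions of 0, 1, and 2 from one run.
props <- replicate(2000, {
  x <- sample(c(0, 1, 2), n, replace = TRUE, prob = p)
  tabulate(x + 1, nbins = 3) / n
})

# Averaged over many runs, the proportions should be close to p,
rowMeans(props)
# while a single run typically deviates by roughly sqrt(p * (1 - p) / n).
apply(props, 1, sd)
sqrt(p * (1 - p) / n)
```

Under that assumption, a single run that returns 0.0200, 0.9225, 0.0575 for a requested 0.02, 0.93, 0.05 is well within the expected scatter.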
Now let's return to the concept of replace=F and replace=T.
When I ask "sample" to select a sample of 400 from a vector of 3 with
NO replacement, I would expect the following:
a). Create a very large sample from 0, 1, and 2.
b). From this large sample, select without replacement based on the
prob vector.
c). As a result, I expect the proportions in the selected sample to be
exactly the same as the prob vector (as in Splus7).
When I ask "sample" to select a sample of 400 from a vector of 3 with
replacement, I would expect the following:
a). Create a very large sample from 0, 1, and 2.
b). From this large sample, select with replacement based on the prob
vector, which means some of the previously selected 0, 1, 2 can be
selected again.
c). As a result, I expect the proportions in the selected sample to be
NOT exactly the same as the prob vector (as in Splus7 and R).
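If the goal is a vector whose proportions match prob exactly, one option in R is to fix the counts in advance and shuffle the result. This is only a sketch of that idea, not a description of how Splus7 implements sample; counts and exact are illustrative names.

```r
n <- 400
p <- c(0.02, 0.93, 0.05)

# Fixed counts per value: 8 zeros, 372 ones, 20 twos.
counts <- round(n * p)
stopifnot(sum(counts) == n)  # holds here; rounding may break it for other n or p

# sample() called with just a vector of length > 1 returns a random permutation of it.
exact <- sample(rep(c(0, 1, 2), times = counts))
table(exact) / n
```

The order of the values is random, but the proportions are 0.02, 0.93, and 0.05 by construction.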
So there are two possible conclusions: either "sample" in R is not
working correctly, or I am losing some precision to rounding error when
trying to reproduce prob=c(0.02 ,0.93 ,0.05 ).
Am I misunderstanding the "sample" function in R?
Any suggestions are appreciated.
TIA,
Aldi