On Thursday 18 June 2009, Jonathan Greenberg wrote:
> Rers:
>
> What is the preferred library/function for doing stratified random
> sampling from a dataset, given I want to control the number of samples
> (rather than the proportion of samples) per strata? Thanks!
>
> --j
Hi Jonathan!
Check out spsample() in the 'sp' package for spatially stratified random
sampling, among other sampling schemes.
For grouped data, there may be a dedicated function, but it should be as
simple as:
# some grouped data, with different means for clarity
d <- data.frame(x=rnorm(1000, mean=c(1,5,10,15)),
                g=rep(letters[1:4], times=250))
# sample 2 items (without replacement) from each group:
res <- by(d, d$g, function(i) {sample(i$x, size=2)} )
# print the result:
res
d$g: a
[1] 0.1931319 2.1858605
------------------------------------------------------------
d$g: b
[1] 6.020904 5.200289
------------------------------------------------------------
d$g: c
[1] 9.61317 11.14428
------------------------------------------------------------
d$g: d
[1] 15.26022 14.61383
# Then, parse the result with lapply or sapply. Or, use the plyr framework to
# extend this to multi-level stratification!
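For the first route (parsing the by() result), here is a minimal sketch. It treats the by() object as a list and binds the per-group samples with do.call(rbind, ...), which assumes every group returns the same number of values. The seed and the names res_mat and res_long are only for illustration:

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# same kind of grouped data as above
d <- data.frame(x = rnorm(1000, mean = c(1, 5, 10, 15)),
                g = rep(letters[1:4], times = 250))

# sample 2 items (without replacement) from each group
res <- by(d, d$g, function(i) sample(i$x, size = 2))

# one row per group, one column per sampled value
res_mat <- do.call(rbind, res)

# long format: one row per sampled value, labelled by group
# (as.vector() walks the matrix column by column, so the group
# labels are repeated once per column)
res_long <- data.frame(g = rep(rownames(res_mat), times = ncol(res_mat)),
                       x = as.vector(res_mat))
```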
library(lattice)
library(plyr)
# two levels of grouped data:
d <- data.frame(x=rnorm(1000, mean=c(1,5,100,150)),
                g=rep(letters[1:4], times=250),
                gg=rep(c('A','B'), each=2, times=250))
# check:
bwplot(x ~ g | gg, data=d)
# use ddply():
res <- ddply(d, .variables=c('gg','g'), .fun=function(i) {
  sample(i$x, size=2)
})
# result looks ok:
  gg g          V1         V2
1  A a   0.1555472   3.196626
2  A b   4.9836106   5.559472
3  B c 100.0587593 101.723630
4  B d 150.7257066 149.865093
# might need some more work to convert that back into 'long format' for
# modeling...
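One way to avoid the wide-to-long conversion, and to set a different number of samples for each stratum as in the original question, is to sample whole rows rather than values. This is a rough base-R sketch, not a tested recipe. The sizes in n_per_group and the helper sample_rows() are made up for illustration, and each size must not exceed the number of rows in its group:

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible

d <- data.frame(x = rnorm(1000, mean = c(1, 5, 10, 15)),
                g = rep(letters[1:4], times = 250))

# hypothetical per-stratum sample sizes, named by group
n_per_group <- c(a = 2, b = 3, c = 5, d = 1)

# hypothetical helper: draw n rows (without replacement) from one stratum
sample_rows <- function(stratum, n) {
  stratum[sample.int(nrow(stratum), size = n), , drop = FALSE]
}

# split by stratum, sample each piece with its own size, then re-stack
pieces <- split(d, d$g)
sampled <- do.call(rbind, Map(sample_rows, pieces, n_per_group[names(pieces)]))
rownames(sampled) <- NULL

# check the number of rows drawn per stratum
table(sampled$g)
```

The result keeps the original columns, so it is already in long format. For two grouping levels, the same idea should carry over by splitting on both factors, e.g. split(d, list(d$gg, d$g), drop=TRUE).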
Cheers,
Dylan
--
Dylan Beaudette
Soil Resource Laboratory
http://casoilresource.lawr.ucdavis.edu/
University of California at Davis
530.754.7341