thr3ads.net - R help - [R] Sample rows in data frame by subsets [Jan 2006]


Chris Stubben

2006-Jan-23 20:04 UTC

[R] Sample rows in data frame by subsets

Hi,

I need to resample rows in a data frame within subsets defined by a factor:

L3 <- LETTERS[1:3]
d <- data.frame(cbind(x=1, y=1:10), fac=sample(L3, 10, replace=TRUE))
d
    x  y fac
1  1  1   A
2  1  2   A
3  1  3   A
4  1  4   A
5  1  5   C
6  1  6   C
7  1  7   B
8  1  8   A
9  1  9   C
10 1 10   A

I have seen this used to sample rows with replacement:

d[sample(nrow(d), replace=TRUE), ]

     x  y fac
7   1  7   B
2   1  2   A
1   1  1   A
3   1  3   A
2.1 1  2   A
10  1 10   A
8   1  8   A
9   1  9   C
1.1 1  1   A
8.1 1  8   A


but I would like to resample within each level of fac, keeping the
original number of rows per level:

summary(d$fac)
A B C
6 1 3


rbind(subset(d, fac=="A")[sample(6, replace=TRUE), ],
       subset(d, fac=="B")[sample(1, replace=TRUE), ],
       subset(d, fac=="C")[sample(3, replace=TRUE), ] )

     x  y fac
2   1  2   A
3   1  3   A
3.1 1  3   A
1   1  1   A
10  1 10   A
1.1 1  1   A
7   1  7   B
5   1  5   C
6   1  6   C
5.1 1  5   C


Is there an easy way to do this in one step or with a short function?
I have lots of data frames to resample.

Thanks,

Chris


-- 
-----------------
Chris Stubben

Los Alamos National Lab
BioScience Division
MS M888
Los Alamos, NM 87545

Liaw, Andy

2006-Jan-23 20:48 UTC

Here's one way, if you want to do it in one command:

# Resample rows with replacement within each level of d$fac, then recombine
do.call("rbind", lapply(split(d, d$fac),
    function(x) x[sample(nrow(x), nrow(x), replace=TRUE), ]))

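split() splits the data frame into a list of data frames, one per level of
d$fac.  The lapply() call returns a list of the same shape in which each
component is replaced by a with-replacement resample of its own rows, so
each group keeps its original size.  do.call("rbind", ...) then stacks the
components back into a single data frame.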

Andy
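
Since the question also asks for a short function to reuse across many data
frames, here is a minimal sketch of the same split/resample/recombine idea
wrapped as a helper. The name resample_by_group and its arguments are
illustrative assumptions, not something from the thread, and it works on row
indices rather than on split data frames, which is a variation on the
one-liner above rather than the method posted.

```r
# Hypothetical helper (name and arguments are illustrative, not from the thread).
# Resamples rows with replacement within each level of a grouping column,
# so every group keeps its original number of rows.
resample_by_group <- function(df, group) {
  # Row indices of df, split by the values of the grouping column
  idx <- split(seq_len(nrow(df)), df[[group]])
  # Draw positions with sample.int() so a one-row group is handled safely
  picked <- lapply(idx, function(i) i[sample.int(length(i), replace = TRUE)])
  out <- df[unlist(picked, use.names = FALSE), , drop = FALSE]
  rownames(out) <- NULL  # discard the duplicated row labels (2.1, 8.1, ...)
  out
}

set.seed(1)  # only to make the illustration repeatable
d <- data.frame(x = 1, y = 1:10,
                fac = c("A", "A", "A", "A", "C", "C", "B", "A", "C", "A"))
r <- resample_by_group(d, "fac")
table(r$fac)  # same counts per level as table(d$fac)

# For many data frames, apply the helper over a list of them
dfs <- list(first = d, second = d)
resampled <- lapply(dfs, resample_by_group, group = "fac")
```

Sampling positions within each index vector, instead of calling sample() on
the indices themselves, avoids the case where sample() treats a single number
n as the range 1:n.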

