thr3ads.net - R help - [R] sampling from data.frame [Dec 2008]

axionator

2008-Dec-02 23:27 UTC

[R] sampling from data.frame

Hi all,
I have a data frame with "clustered" rows as follows:
Cu1  x1 y1 z1 ...
Cu1  x2 y2 z2 ...
Cu1  x3 y3 z3 ... # end of first cluster Cu1
Cu2  x4 y4 z4 ...
Cu2  x5 y5 z5
Cu2  ...               # end of second cluster Cu2
Cu3 ...
...
The "cluster" size is 3 in the example above (the rows making up a cluster
are always consecutive). Is there a faster way to sample n clusters (with
replacement) from this data frame and build a new data frame out of the
sampled clusters? At the moment I use the "sample" function and a for-loop.

Thanks in advance
Armin
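Because the question states that clusters are always consecutive and all of size 3, the rows of any drawn cluster can be computed by index arithmetic, so the data frame only needs to be subset once. A minimal sketch, assuming made-up column names (cluster, x, y) and a hypothetical draw of 5 clusters; the loop version is only a guess at the kind of loop the question describes, kept for comparison:

```r
# Made-up data in the question's layout: clusters Cu1..Cu4, each made of
# 3 consecutive rows. Column names are illustrative assumptions.
df <- data.frame(cluster = rep(paste("Cu", 1:4, sep = ""), each = 3),
                 x = 1:12, y = (1:12) * 10)

size <- 3                       # rows per cluster, fixed as in the question
n.clusters <- nrow(df) / size   # 4 clusters
n <- 5                          # number of clusters to draw

set.seed(1)
picked <- sample(n.clusters, n, replace = TRUE)  # cluster numbers in 1..4

# Loop version: append the rows of one drawn cluster per iteration.
res.loop <- NULL
for (k in picked) {
  res.loop <- rbind(res.loop, df[((k - 1) * size + 1):(k * size), ])
}

# Vectorised version: compute every row index at once, then subset once.
# Column j of the outer() matrix holds the rows of the j-th drawn cluster.
rows <- as.vector(outer(1:size, (picked - 1) * size, "+"))
res.vec <- df[rows, ]
```

Growing res.loop with rbind() copies the accumulated rows on every iteration, which is typically what makes such a loop slow; the index version builds the result in a single subsetting step.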

jim holtman

2008-Dec-02 23:53 UTC

[R] sampling from data.frame

Not sure exactly what you mean by 'sample', since you did not provide
an example of the expected output or any input data that could be used.
Here is an example that draws one row at random from each cluster (run
three times to show that the draws differ):
> df <- data.frame(id=paste("C", rep(1:5, each=3), sep=''), data=1:15)
> # sample 1 from each cluster
> result <- lapply(split(seq(nrow(df)), df$id), function(.indx){
+     df[sample(.indx, 1),]
+ })
> do.call(rbind,result)
   id data
C1 C1    1
C2 C2    4
C3 C3    9
C4 C4   11
C5 C5   15
> result <- lapply(split(seq(nrow(df)), df$id), function(.indx){
+     df[sample(.indx, 1),]
+ })
> do.call(rbind,result)
   id data
C1 C1    2
C2 C2    6
C3 C3    9
C4 C4   11
C5 C5   15
> result <- lapply(split(seq(nrow(df)), df$id), function(.indx){
+     df[sample(.indx, 1),]
+ })
> do.call(rbind,result)
   id data
C1 C1    3
C2 C2    4
C3 C3    8
C4 C4   10
C5 C5   13
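The transcript relies on every cluster having more than one row. As documented in ?sample, sample(x, 1) treats a single positive number x as 1:x, so a one-row cluster sitting at row 7 would return a random row from 1:7 instead of row 7. A slightly more defensive sketch of the same idea (pick.one is a hypothetical helper name) indexes with sample.int() instead:

```r
# Same toy data as in the transcript above.
df <- data.frame(id = paste("C", rep(1:5, each = 3), sep = ""), data = 1:15)

# Draw one element uniformly from a vector of row indices. Indexing with
# sample.int() avoids sample()'s special case: sample(x, 1) with a single
# number x >= 1 draws from 1:x instead of returning x.
pick.one <- function(.indx) .indx[sample.int(length(.indx), 1)]

result <- lapply(split(seq(nrow(df)), df$id),
                 function(.indx) df[pick.one(.indx), ])
one.per.cluster <- do.call(rbind, result)
```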

> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

Charles C. Berry

2008-Dec-03 01:19 UTC

[R] sampling from data.frame

Something like this:

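# df has a column 'cluster' holding the cluster labels, n.samps is the
# number of clusters to draw; split() makes one data frame per cluster
# and sample() picks n.samps of them with replacement
cl.samps <- sample( split( df, df$cluster ), n.samps, replace=TRUE )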

do.call( rbind, cl.samps )

If you need to identify the samples from which the rows came (versus just 
the originating clusters):

# tag every row with the index of the draw it came from
cl.samps2 <- lapply( seq_along(cl.samps),
        function(x) cbind( cl.samps[[ x ]], new.cluster = x ) )

do.call( rbind, cl.samps2 )
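A self-contained run of the two snippets above on made-up data may help to see what they return. The column names, the cluster labels and n.samps = 6 are illustrative assumptions, not part of the original question:

```r
# Made-up data in the question's layout: a 'cluster' column plus two
# illustrative value columns, 4 clusters of 3 consecutive rows each.
df <- data.frame(cluster = rep(paste("Cu", 1:4, sep = ""), each = 3),
                 x = 1:12, y = 101:112)
n.samps <- 6   # number of clusters to draw (with replacement)

set.seed(42)
# One data frame per cluster; draw n.samps of them with replacement.
cl.samps <- sample(split(df, df$cluster), n.samps, replace = TRUE)

# Tag each drawn cluster with its position in the draw, so repeated
# draws of the same original cluster stay distinguishable.
cl.samps2 <- lapply(seq_along(cl.samps),
                    function(x) cbind(cl.samps[[x]], new.cluster = x))
boot.df <- do.call(rbind, cl.samps2)
```

Because the draw is with replacement, the same original cluster can appear several times in boot.df; the new.cluster column is what keeps those copies apart.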

HTH,

Chuck
Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

axionator

2008-Dec-03 10:45 UTC

[R] sampling from data.frame

Sorry for being a little imprecise; Chuck interpreted my question
as intended.

Armin
