Kristiina Hurme
2010-Jun-28 01:16 UTC
[R] sampling one random frame from each unique trial?
hello everyone. please bear with me if this is very easy...

I have a data set with many trials, and frames within each trial. I would like to pull out one random frame from each trial.
here is an example. i have 4 unique trials (file), and various frames within each (time_pred). I would like to randomly sample 4 rows, but 1 from each trial (file).

this sample data is called "h"

                  file time_pred distance_1 distance_2 distance_3
1  12.03.08_ins_odo_01       210     19.003     18.023     14.666
2  12.03.08_ins_odo_01       240     23.905     20.087     17.266
3  12.03.08_ins_odo_01       270     15.694      9.285      4.135
4  12.03.08_ins_odo_02         0     22.142     16.061     14.776
5  12.03.08_ins_odo_02        30      2.968     12.533     19.696
6  12.03.08_ins_odo_02        60      6.175     17.701     20.198
7  12.03.08_ins_odo_02        90     13.668     12.950     13.506
8  12.03.08_ins_odo_02       120      7.098     17.817     22.878
9  12.03.08_ins_odo_02       270     17.252     18.235     18.661
10 12.03.08_ins_odo_02       300      7.967     15.944      8.130
11 12.03.08_ins_odo_03        90     18.724     17.931     21.148
12 12.03.08_ins_odo_03       120     21.220     26.370     23.962
13 12.03.08_ins_odo_03       150     21.225     24.376     20.194
14 12.03.08_ins_odo_03       180     22.298     24.119     24.606
15 12.03.08_ins_odo_03       210      8.413     14.464     15.219
16 12.03.08_ins_odo_03       240     18.117     19.111     19.870
17 12.03.08_ins_odo_07        60     24.063     25.779     24.800
18 12.03.08_ins_odo_07        90     19.790     23.276     18.678
19 12.03.08_ins_odo_07       120     15.617     23.707     19.545
20 12.03.08_ins_odo_07       150     24.818     22.373     24.515
21 12.03.08_ins_odo_07       180     16.301     19.976     25.309
22 12.03.08_ins_odo_07       210     23.843     24.772     26.025
23 12.03.08_ins_odo_07       240      9.029     15.125     20.139
24 12.03.08_ins_odo_07       270      6.533     22.833     23.618

here is my code so far...

> random <- for(i in unique(file)){h[sample(1:24,1),]}
> random

but this only gives me one sample... and if I try to exclude naming it as random, then nothing comes up. i'm confused and very new to R. please help!
many many thanks!
kristiina

--
View this message in context: http://r.789695.n4.nabble.com/sampling-one-random-frame-from-each-unique-trial-tp2270396p2270396.html
Sent from the R help mailing list archive at Nabble.com.
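A note on why the loop in the question does not return four rows: a for loop in R does not collect the values computed in its body (in current versions of R the loop itself evaluates to NULL), and sample(1:24, 1) draws from all 24 rows rather than from the rows of the current trial. The following is a minimal sketch of a loop that stores one row per trial explicitly. The small data frame is a made-up stand-in for "h", and the names trials, picked, rows and pick are illustrative only.

```r
# Made-up stand-in for the poster's data frame "h" (values are invented).
h <- data.frame(
  file = rep(c("trial_01", "trial_02", "trial_03"), times = c(3, 2, 1)),
  time_pred = c(210, 240, 270, 0, 30, 90),
  stringsAsFactors = FALSE
)

trials <- unique(h$file)
picked <- vector("list", length(trials))   # one slot per trial

for (i in seq_along(trials)) {
  rows <- which(h$file == trials[i])          # row numbers of this trial only
  pick <- rows[sample.int(length(rows), 1)]   # one of them, also safe for a single row
  picked[[i]] <- h[pick, ]                    # store the row; the loop keeps nothing by itself
}

random <- do.call(rbind, picked)   # stack the stored rows into one data frame
random
```

Which rows come back changes from run to run, but there is always exactly one row per value of file.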
Hi,

take the following example and proceed accordingly.

Name=c("Miller","Miller","Miller","Miller","Smith","Smith","Smith","Smith")
X=rnorm(8)
Year=rep(2000:2003,2)
d=data.frame(Name,X,Year)

#Row indices
rows=1:dim(d)[1]

#Which Name occupies which rows?
#"Name" would be your "file"
w=function(x){which(Name%in%unique(x))}
samplefrom=tapply(Name,Name,w)

#Sample one row index for each Name and
#give the data frame d for these row indices
f=function(x){sample(x,1)}
d[unlist(lapply(samplefrom,f)),]

HTH,
Daniel
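The same idea (collect the row indices of each group, then draw one index per group) can also be sketched with split() on the row numbers. One caveat that matters here: when x is a single number n >= 1, sample(x, 1) draws from 1:n rather than returning x, so a group with only one row could yield a row from a different group. The sketch below sidesteps that by indexing with sample.int(). The group "Jones" and the names idx, pick_one and res are assumptions added for illustration and are not part of the example above.

```r
set.seed(1)  # only so the sketch is repeatable

d <- data.frame(
  Name = c("Miller", "Miller", "Miller", "Smith", "Smith", "Jones"),
  X    = rnorm(6),
  Year = c(2000:2002, 2000:2001, 2000)
)

# Row indices grouped by Name ("Name" would be "file" in the original data)
idx <- split(seq_len(nrow(d)), d$Name)

# Pick one index per group. Indexing with sample.int() avoids the
# sample(n, 1) behaviour for a group that has a single row.
pick_one <- function(i) i[sample.int(length(i), 1)]

res <- d[vapply(idx, pick_one, integer(1)), ]
res
```

The single "Jones" row sits at index 6; with a plain sample(6, 1) any of rows 1 to 6 could have been returned for that group.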
Hi:

Try this:

do.call(rbind, lapply(split(h, h$file), function(x) x[sample(1:nrow(x), 1), ]))

My test returns

                                   file time_pred distance_1 distance_2 distance_3
12.03.08_ins_odo_01 12.03.08_ins_odo_01       210     19.003     18.023     14.666
12.03.08_ins_odo_02 12.03.08_ins_odo_02        90     13.668     12.950     13.506
12.03.08_ins_odo_03 12.03.08_ins_odo_03       120     21.220     26.370     23.962
12.03.08_ins_odo_07 12.03.08_ins_odo_07       180     16.301     19.976     25.309

The function does the following:
(1) Splits the data frame into a list, where each component of the list is a sub-data frame (split).
(2) Applies the (anonymous) sampling function to each list component (lapply).
(3) Combines the individual outputs together using the rbind function (do.call).

Since this is the raison d'etre of the plyr package, one can also use

library(plyr)
> ddply(h, 'file', function(x) x[sample(1:nrow(x), 1), ])
                 file time_pred distance_1 distance_2 distance_3
1 12.03.08_ins_odo_01       270     15.694      9.285      4.135
2 12.03.08_ins_odo_02       270     17.252     18.235     18.661
3 12.03.08_ins_odo_03       240     18.117     19.111     19.870
4 12.03.08_ins_odo_07        90     19.790     23.276     18.678

(Your results may vary, but you do get one row per file as output.)

HTH,
Dennis
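For readers who want to inspect the intermediate objects, the three steps of the split / lapply / do.call(rbind, ...) one-liner can be written out separately. This is only a sketch: the data frame is a made-up stand-in for "h", and the names pieces, sampled and result are illustrative.

```r
# Made-up stand-in for "h": three trials with three frames each.
h <- data.frame(
  file       = rep(c("a", "b", "c"), each = 3),
  time_pred  = rep(c(0, 30, 60), times = 3),
  distance_1 = c(19.0, 23.9, 15.7, 22.1, 3.0, 6.2, 13.7, 7.1, 17.3)
)

# (1) Split the data frame into a list of sub-data frames, one per file.
pieces <- split(h, h$file)

# (2) Draw one random row from each sub-data frame.
sampled <- lapply(pieces, function(x) x[sample.int(nrow(x), 1), ])

# (3) Stack the one-row data frames back into a single data frame.
result <- do.call(rbind, sampled)

# The list names become row names, which is why the file name appears
# twice in the printed test output; resetting them is optional.
rownames(result) <- NULL
result
```

Printing pieces or sampled at the console shows what each step contributes before the rows are combined.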