I want to sample from the ID. For each ID, I want to have 2 rows of data. I tried the sample() function but it didn't work.

> x <- data.frame(id=c(1,1,1,2,2,2,2,3,3,3,4,4), v1=c(1:12), V2=c(12:23))
> x
   id v1 V2
1   1  1 12
2   1  2 13
3   1  3 14
4   2  4 15
5   2  5 16
6   2  6 17
7   2  7 18
8   3  8 19
9   3  9 20
10  3 10 21
11  4 11 22
12  4 12 23

--
View this message in context: http://r.789695.n4.nabble.com/sampling-tp3310184p3310184.html
Sent from the R help mailing list archive at Nabble.com.
An embedded text encoded in an unknown character set was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20110217/0d576df0/attachment.pl>
On Feb 16, 2011, at 11:35 PM, yf wrote:

> I want to sample from the ID. For each ID, i want to have 2 set of
> data. I try the sample() function but it didn't work.

You don't say _how_ you used the sample function. You should show what code you used when stating that _something_ "doesn't work".

sample() returns a vector of items from objects where length() represents some sensible notion. It does not "sample" a complex object such as a dataframe. For dataframes, length is the number of columns, which doesn't agree very well with most people's notion of cases from which to sample. To select rows of a dataframe, you need to first create a vector of numeric indices and then use that with "[":

    idx <- sample(nrow(x), nrow(x)/2)  # A random split
    x[ idx, ]
    x[ -idx, ]

> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT
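The point about length() versus the number of rows can be checked directly on the example data. The following is a minimal sketch using only base R and the `x` from the original post; the `set.seed()` call and the names `first` and `second` are additions for illustration, not part of the reply above.

```r
# Example data from the original post
x <- data.frame(id = c(1,1,1,2,2,2,2,3,3,3,4,4), v1 = 1:12, V2 = 12:23)

length(x)   # 3: a data frame's length is its number of columns
nrow(x)     # 12: the number of rows (cases)

set.seed(1)                        # assumption: only to make the split repeatable
idx <- sample(nrow(x), nrow(x)/2)  # 6 distinct row indices, drawn without replacement
first  <- x[idx, ]                 # the sampled half
second <- x[-idx, ]                # the remaining rows
```

Note that this splits the rows at random without regard to `id`, so it does not yet guarantee two rows per ID.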
Hi:

A couple more approaches to consider:

    # Utility function to extract two rows from a data frame
    # Meant to be applied to each data subset
    sampler <- function(d) if(nrow(d) > 2) d[sample(1:nrow(d), 2, replace = FALSE), ] else d

    library(plyr)
    > ddply(x, 'id', sampler)
      id v1 V2
    1  1  2 13
    2  1  1 12
    3  2  4 15
    4  2  6 17
    5  3  8 19
    6  3 10 21
    7  4 11 22
    8  4 12 23

    library(data.table)
    dtx <- data.table(x, key = 'id')
    > dtx[, sampler(.SD), by = 'id']
         id v1 V2
    [1,]  1  1 12
    [2,]  1  3 14
    [3,]  2  5 16
    [4,]  2  7 18
    [5,]  3  9 20
    [6,]  3 10 21
    [7,]  4 11 22
    [8,]  4 12 23

HTH,
Dennis

On Wed, Feb 16, 2011 at 8:35 PM, yf <chang648@umn.edu> wrote:

> I want to sample from the ID. For each ID, i want to have 2 set of data. I
> try the sample() function but it didn't work.
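The same per-ID sampling can also be sketched without add-on packages, by combining the row-index idea from earlier in the thread with split(). This is one possible base-R variant, not code from either reply; the names `rows_by_id` and `sampled` and the `set.seed()` call are assumptions made for illustration.

```r
# Example data from the original post
x <- data.frame(id = c(1,1,1,2,2,2,2,3,3,3,4,4), v1 = 1:12, V2 = 12:23)

set.seed(42)  # assumption: only to make this sketch repeatable

# Group the row numbers by id, then draw up to 2 row numbers from each group
rows_by_id <- split(seq_len(nrow(x)), x$id)
idx <- unlist(lapply(rows_by_id, function(i) {
  # Indexing i by sample.int() avoids sample()'s behaviour of treating a
  # single number n as the range 1:n when a group has only one row
  i[sample.int(length(i), min(2, length(i)))]
}), use.names = FALSE)

sampled <- x[idx, ]  # two rows per id for this data
sampled
```

Groups with fewer than two rows are returned whole, which matches the intent of the `else d` branch in `sampler` above.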