Christopher David Desjardins
2009-Mar-19 18:08 UTC
[R] Randomly splitting a data frame in half
I have a data frame in long format and I would like to randomly divide this data frame in half. The data frame consists of 39622 rows and I initially tried ...

randomsample1 <- data[sample(nrow(data), 19811), ]

This allows me to randomly select half of the rows and assign them to randomsample1, but then I couldn't figure out how to select the rows that were not selected and assign them to randomsample2.

Please cc me if you reply as I'm a digest subscriber.
Thanks,
Chris
Well, you need to keep track of the rows you sampled, e.g.,

dat <- data.frame(x = rnorm(20), y = rnorm(20), w = rnorm(20))
ii <- seq_len(nrow(dat))
ind1 <- sample(ii, 10)
ind2 <- ii[!ii %in% ind1]
dat[ind1, ]
dat[ind2, ]

I hope it helps.

Best,
Dimitris

--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center
Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
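The index-tracking idea above can be wrapped in a small helper. This is only a sketch: the function name split_half and its seed argument are illustrative assumptions, not something proposed in the thread, and setdiff() is swapped in for the %in% test as an equivalent way to get the unsampled rows.

```r
# Hypothetical helper (not from the thread): randomly split a data frame
# into two non-overlapping halves by remembering the sampled row indices.
split_half <- function(df, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)   # optional, for a reproducible split
  n <- nrow(df)
  ii <- seq_len(n)
  ind1 <- sample(ii, floor(n / 2))     # row indices for the first half
  ind2 <- setdiff(ii, ind1)            # every row index not in ind1
  list(first  = df[ind1, , drop = FALSE],
       second = df[ind2, , drop = FALSE])
}

dat <- data.frame(x = rnorm(20), y = rnorm(20), w = rnorm(20))
halves <- split_half(dat, seed = 1)
nrow(halves$first)
nrow(halves$second)
```

With an odd number of rows, floor(n / 2) puts the extra row in the second half; that choice is arbitrary.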
selected <- sample(nrow(data), 19811)
randomsample1 <- data[selected, ]
randomsample2 <- data[-selected, ]  # rows that were not selected

However, I think it is good to keep a variable in the same data frame that indicates which cases were selected. You can try this:

selected <- rep(0, 39622)
selected[sample(1:39622, 39622/2)] <- 1
data$selected <- selected
rm(selected)

or

data$selected <- rbinom(39622, 1, .5)

Selected cases have the value 1 and non-selected cases have the value 0. In the second case, the split will generally not be exactly half, because each row is assigned independently with probability .5.

Justin BEM
BP 1917 Yaoundé
Tel. (237) 99597295
(237) 22040246
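As a rough illustration of the indicator-column approach above, the following sketch uses a small toy data frame in place of the 39622-row one. The names dat, id, and parts are assumptions made for the example, and using split() to pull the two halves apart is one possible follow-up step rather than something suggested in the thread.

```r
# Toy data standing in for the poster's data frame (assumed for illustration).
dat <- data.frame(id = 1:20, x = rnorm(20))
n <- nrow(dat)
k <- floor(n / 2)

# Exact half: build k ones and n - k zeros, then shuffle them.
dat$selected <- sample(rep(c(1, 0), times = c(k, n - k)))

# split() returns a list with one data frame per indicator value,
# named "0" and "1".
parts <- split(dat, dat$selected)
randomsample1 <- parts[["1"]]
randomsample2 <- parts[["0"]]

table(dat$selected)
```

Keeping the indicator in the data frame means the same split can be recovered later without storing the sampled indices separately.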