Dear all;

I'm wondering if there is an efficient way to select a sample of every nth row from a data frame. For example, take the data frame GAGurine in the MASS library:

> length(GAGurine[,1])
[1] 314

# select 75% of the dataset, i.e. 236 rows: first take every 2nd row, starting from row 1
> test <- GAGurine[seq(1,314,2),]
> length(test[,1])
[1] 157

# so I still need another 79 rows; one way could be:
> test2 <- GAGurine[-seq(1,314,2),]
> length(test2[,1])
[1] 157
> test3 <- test2[seq(1,157,2),]

# and then
> final <- rbind(test,test3)
> length(final[,1])
[1] 236

Does anyone have a better idea for getting the same result without creating intermediate datasets like test2 and test3?

Thanks
PM
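The selection described in this post (every odd row, plus every second one of the remaining even rows) can be sketched as a single index vector. This is only an illustration of the arithmetic: the names `odd_rows`, `even_rows`, `extra_rows` and `idx` are made up here, and the row count 314 is taken from the output quoted above.

```r
# Number of rows in GAGurine, as reported in the post.
n <- 314

odd_rows  <- seq(1, n, by = 2)   # rows 1, 3, 5, ...  (157 rows)
even_rows <- seq(2, n, by = 2)   # rows 2, 4, 6, ...  (157 rows)

# Every second even row: rows 2, 6, 10, ... (79 rows)
extra_rows <- even_rows[seq(1, length(even_rows), by = 2)]

# One index vector, in original row order, with no duplicates.
idx <- sort(c(odd_rows, extra_rows))
length(idx)   # 236

# With MASS loaded, the subset would then be GAGurine[idx, ]
```

Seen this way, the selection keeps every row except rows 4, 8, 12, and so on, i.e. it drops every 4th row.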
On Dec 26, 2006, at 12:07 AM, Pedro Mardones wrote:

> I'm wondering if there is any 'efficient' approach for selecting a
> sample of 'every nth rows' from a dataframe.
> [...]
> Does anyone have a better idea to get the same results but without
> creating different datasets like test2 and test3?

A probabilistic approach:

len <- length(GAGurine[,1])
GAGu <- GAGurine[sample(1:len, round(.75 * len)), ]  # 236 rows

A deterministic one:

nr <- 1  # or 2
GAGu2 <- GAGurine[-seq(nr, len, 4),]  # drop every 4th row, giving 235 rows
# with nr <- 3 (or 4), this gives 236 rows

_____________________________
Professor Michael Kubovy
University of Virginia
Department of Psychology
WWW: http://www.people.virginia.edu/~mk9y/
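As a rough check of the row counts quoted in this reply, here is a minimal sketch of both approaches. It uses a stand-in data frame `df` with 314 rows instead of GAGurine so that it runs without MASS; `df`, its `id` column, the helper `drop_every_4th` and the seed are assumptions made for illustration. Sorting the sampled indices is an optional extra that keeps the rows in their original order.

```r
# Hypothetical stand-in with the same number of rows as GAGurine (314).
df <- data.frame(id = seq_len(314))
n  <- nrow(df)

# Probabilistic: draw 75% of the row indices without replacement.
set.seed(1)  # arbitrary seed, only so the draw is reproducible
keep_random <- sort(sample(seq_len(n), round(0.75 * n)))
random_rows <- df[keep_random, , drop = FALSE]
nrow(random_rows)   # 236

# Deterministic: drop every 4th row, starting at row 'start'.
drop_every_4th <- function(data, start) {
  data[-seq(start, nrow(data), by = 4), , drop = FALSE]
}

# Row counts for start = 1, 2, 3, 4.
rows_by_start <- sapply(1:4, function(s) nrow(drop_every_4th(df, s)))
rows_by_start       # 235 235 236 236
```

The difference between 235 and 236 comes from how many multiples of the step fit into 314 rows: starting at row 1 or 2 drops 79 rows, starting at row 3 or 4 drops 78.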
You could try something like:

> test <- GAGurine[ c(TRUE,TRUE,TRUE,FALSE), ]

The logical vector is recycled along the rows, so this keeps three rows out of every four. Or, if you want a random sequential sample (sample() shuffles the pattern once, so the position dropped within each group of four is chosen at random):

> test <- GAGurine[ sample(c(TRUE,TRUE,TRUE,FALSE)), ]

Hope this helps,

-----Original Message-----
From: r-help-bounces@stat.math.ethz.ch on behalf of Pedro Mardones
Sent: Mon 12/25/2006 10:07 PM
To: R-help@stat.math.ethz.ch
Subject: [R] sequential row selection in dataframe
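To make the recycling behind the logical-index suggestion explicit, here is a small sketch. It again uses a hypothetical 314-row stand-in `df` rather than GAGurine; `pattern`, `mask` and the other names are illustrative only.

```r
# Hypothetical stand-in with the same number of rows as GAGurine (314).
df <- data.frame(id = seq_len(314))

pattern <- c(TRUE, TRUE, TRUE, FALSE)

# What R does implicitly: repeat the pattern until it covers every row.
mask <- rep_len(pattern, nrow(df))
kept <- df[mask, , drop = FALSE]
nrow(kept)                      # 236

# The short pattern gives the same rows through recycling.
recycled <- df[pattern, , drop = FALSE]
identical(kept$id, recycled$id) # TRUE

# Rows that were dropped: 4, 8, 12, ...
dropped_ids <- df$id[!mask]

# Shuffled variant: the dropped position within each group of four
# is picked at random once, then the same pattern is recycled.
set.seed(42)  # arbitrary seed, only for reproducibility
shuffled_pattern <- sample(pattern)
drop_pos <- which(!shuffled_pattern)
shuffled_rows <- df[shuffled_pattern, , drop = FALSE]
```

Depending on which position ends up as FALSE, the shuffled variant returns 235 or 236 rows, matching the counts given for the seq()-based version earlier in the thread.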