Hello R users,

I'm trying to extract random samples from a large data frame.

I have a data frame of over 40k rows and would like to produce around 50
random samples of around 200 rows each from it.

This is the data:

     ID    xxx_1c   xxx__2c   xxx__3c   xxx__4c   xxx__5T   xxx__6T   xxx__7T   xxx__8T    yyy_1c  yyy_1c _2c
1 A_512  2.150295  2.681759  2.177138  2.142790  2.115344  2.013047  2.115634  2.189372  1.643328    1.563523
2 A_134 12.832488 12.596373 12.882581 12.987091 11.956149 11.994779 11.650336 11.995504 13.024494   12.776322
3 A_152  2.063276  2.160961  2.067549  2.059732  2.656416  2.075775  2.033982  2.111937  1.606340    1.548940
4 A_163  9.570761 10.448615  9.432859  9.732615 10.354234 10.993279  9.160038  9.104121 10.079177    9.828757
5 A_184  3.574271  4.680859  4.517047  4.047096  3.623668  3.021356  3.559434  3.156093  4.308437    4.045098
6 A_199  7.593952  7.454087  7.513013  7.449552  7.345718  7.367068  7.410085  7.022582  7.668616    7.953706
...

I tried to do it with a for loop:

    genelist <- read.delim("/user/R/raw_data.txt")
    rownames(genelist) <- genelist[,1]
    genes <- rownames(genelist)

    x <- 1:40000
    set <- matrix(nrow = 50, ncol = 11)

    for(i in c(1:50)){
        set[i] <- sample(x,50)
        print(c(i,"->", set), quote = FALSE)
    }

which basically does the trick, but I just can't save the results outside
the loop. Once I have the random sets of rows, it isn't a problem to
extract the rows from the data frame using subset:

    genSet1 <- sample(x,50)
    random1 <- genes %in% genSet1
    subsetGenelist <- subset(genelist, random1)

Is there a different way of creating these random vectors, or of saving the
loop results outside the loop so I can work with them?

Thanks a lot

Assa
Don't know exactly what you're trying to do, but you create a matrix with
50 rows and 11 columns and then index it as a vector. On top of that, each
pass through the loop tries to put 50 values into the single element
set[i]. Of course that doesn't work. Did you check the warning messages
when running the code?

Either do:

    for(i in c(1:11)){
        set[,i] <- sample(x,50)
        print(c(i,"->", set), quote = FALSE)
    }

or

    for(i in c(1:50)){
        set[i,] <- sample(x,11)
        print(c(i,"->", set), quote = FALSE)
    }

Or just forget about the loop altogether and do:

    set <- replicate(11,sample(x,50))

or

    set <- t(replicate(50,sample(x,11)))

cheers

On Thu, Jul 8, 2010 at 8:04 AM, Assa Yeroslaviz <frymor at gmail.com> wrote:
> x <- 1:40000
> set <- matrix(nrow = 50, ncol = 11)
>
> for(i in c(1:50)){
>     set[i] <- sample(x,50)
>     print(c(i,"->", set), quote = FALSE)
> }
>
> which basically do the trick, but I just can't save the results outside the
> loop.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
Joris.Meys at Ugent.be
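
As an illustration of the replicate() suggestion above, here is a minimal sketch that keeps every sample available after it has been drawn. It rests on assumptions: the toy `genelist` (1000 rows, made-up values) only stands in for the poster's file, and the names `n_samples`, `sample_size`, `idx` and `subsets` are hypothetical. It uses the 50 samples of 200 rows described in the question rather than the 50 x 11 matrix from the loop, and it indexes rows by number, which avoids matching integer indices against row names that hold the gene IDs.

```r
# Toy stand-in for the poster's data frame (hypothetical values).
set.seed(1)                        # only to make the sketch reproducible
genelist <- data.frame(ID     = paste0("A_", 1:1000),
                       xxx_1c = rnorm(1000),
                       yyy_1c = rnorm(1000))

n_samples   <- 50                  # number of random samples wanted
sample_size <- 200                 # rows per sample

# Each column of idx holds the row numbers of one sample
# (drawn without replacement within that sample).
idx <- replicate(n_samples, sample(nrow(genelist), sample_size))

# Store the subsets in a list so they can be used later, outside any loop.
subsets <- lapply(seq_len(n_samples), function(i) genelist[idx[, i], ])

dim(idx)             # one column of row numbers per sample
nrow(subsets[[1]])   # rows in the first sample
```

Any single sample is then `subsets[[k]]`, and something like `lapply(subsets, function(d) colMeans(d[, -1]))` would run the same summary on all of them.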
On Jul 8, 2010, at 2:04 AM, Assa Yeroslaviz wrote:

> I have a data frame of over 40k lines and would like to produce
> around 50
> random sample of around 200 lines each from this array.
>
> I tried to do it with a for loop:
>
> genelist <- read.delim("/user/R/raw_data.txt")
> rownames(genelist) <- genelist[,1]
> genes <- rownames(genelist)
>

One method:

    totsize <- 50 * 200
    # create matrix of indices
    smatrix <- matrix(sample(1:length(genelist$ID), totsize), nrow=200, ncol=50)
    # Then any one sample would be:
    genelist[ smatrix[,i], ]    # for i in 1:50

You do need to decide whether this approach, which creates 50 mutually
exclusive samples (if the IDs are unique), is really what you want, since
they are not truly independent draws. I think this could be an issue with
a universe:sample ratio of about 4:1. It's not a bootstrap sample. You
could add replace=TRUE to the sample() call to fix that.

--
David Winsemius, MD
West Hartford, CT
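
To make the distinction between mutually exclusive and independent samples concrete, the sketch below builds the index matrix both ways. It is only an illustration under stated assumptions: `n_rows` is set to 40000 as a round figure for the "over 40k" rows, and the names `disjoint` and `independent` are hypothetical. Variant (b) draws each column separately; this is a swapped-in alternative, not the replace=TRUE variant mentioned above, which would also allow a row to repeat within a single sample.

```r
set.seed(42)             # only to make the sketch reproducible
n_rows      <- 40000     # assumed number of rows in the data frame
n_samples   <- 50
sample_size <- 200

# (a) One draw without replacement, reshaped into a matrix:
#     no row number appears twice anywhere, so the 50 samples are disjoint.
disjoint <- matrix(sample(n_rows, n_samples * sample_size),
                   nrow = sample_size, ncol = n_samples)

# (b) One separate draw per column: rows are unique within a sample,
#     but different samples are free to share rows.
independent <- replicate(n_samples, sample(n_rows, sample_size))

anyDuplicated(as.vector(disjoint))      # 0 means no row number is repeated
length(unique(as.vector(independent)))  # distinct rows used across all samples
```

With (a), the 50 samples together use up 10000 distinct rows; with (b), the number of distinct rows is typically smaller because samples overlap.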