Hello R users,
I'm trying to extract random samples from a big array I have.
I have a data frame of over 40k lines and would like to produce around 50
random samples of around 200 lines each from it.
This is the data:
  ID       xxx_1c   xxx__2c   xxx__3c   xxx__4c   xxx__5T   xxx__6T   xxx__7T   xxx__8T    yyy_1c yyy_1c _2c
1 A_512  2.150295  2.681759  2.177138  2.142790  2.115344  2.013047  2.115634  2.189372  1.643328  1.563523
2 A_134 12.832488 12.596373 12.882581 12.987091 11.956149 11.994779 11.650336 11.995504 13.024494 12.776322
3 A_152  2.063276  2.160961  2.067549  2.059732  2.656416  2.075775  2.033982  2.111937  1.606340  1.548940
4 A_163  9.570761 10.448615  9.432859  9.732615 10.354234 10.993279  9.160038  9.104121 10.079177  9.828757
5 A_184  3.574271  4.680859  4.517047  4.047096  3.623668  3.021356  3.559434  3.156093  4.308437  4.045098
6 A_199  7.593952  7.454087  7.513013  7.449552  7.345718  7.367068  7.410085  7.022582  7.668616  7.953706
...
I tried to do it with a for loop:
genelist <- read.delim("/user/R/raw_data.txt")
rownames(genelist) <- genelist[,1]
genes <- rownames(genelist)
x <- 1:40000
set <- matrix(nrow = 50, ncol = 11)
for(i in c(1:50)){
set[i] <-sample(x,50)
print(c(i,"->", set), quote = FALSE)
}
which basically does the trick, but I just can't save the results outside the
loop.
Once I had the random sets of lines, it wasn't a problem to extract the
lines from the data frame using subset:
genSet1 <-sample(x,50)
random1 <- genes %in% genSet1
subsetGenelist <- subset(genelist, random1)
Is there a different way of creating these random vectors, or of saving the
loop results outside the loop so I can work with them?
Thanks a lot
Assa
Don't know what exactly you're trying to do, but you make a matrix
with 11 columns and 50 rows, then treat it as a vector. On top of
that, you try to fill 50 rows/columns with 50 values. Of course that
doesn't work. Did you check the warning messages when running the
code?
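To see the mismatch in isolation, here is a minimal sketch with shrunken sizes (3 x 5 instead of 50 x 11; the names filled_after_bad and filled_after_good are made up for this illustration). With a single index, set[1] addresses one cell of the matrix, so only the first sampled value is kept and R warns about the length mismatch. With set[1, ] a whole row is addressed and all values fit.

```r
x <- 1:100
set <- matrix(nrow = 3, ncol = 5)

# Single index = one cell. R keeps only the first of the five values and
# warns; the warning is suppressed here just to keep the output quiet.
suppressWarnings(set[1] <- sample(x, 5))
filled_after_bad <- sum(!is.na(set))
print(filled_after_bad)    # one cell filled

# Row index = five cells, matching the five sampled values.
set[1, ] <- sample(x, 5)
filled_after_good <- sum(!is.na(set))
print(filled_after_good)   # the whole first row is filled
```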
Either do :
for(i in c(1:11)){
set[,i] <-sample(x,50)
print(c(i,"->", set), quote = FALSE)
}
or
for(i in c(1:50)){
set[i,] <-sample(x,11)
print(c(i,"->", set), quote = FALSE)
}
Or just forget about the loop altogether and do :
set <- replicate(11,sample(x,50))
or
set <- t(replicate(50,sample(x,11)))
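If the goal is 50 random subsets of about 200 rows each, the same replicate() idea can hold row indices that are then used to index the data frame directly. The following is only a sketch under assumptions: the toy genelist (1000 rows, one value column) stands in for the real file, and n_sets, n_rows, idx and subsets are names invented here.

```r
set.seed(1)  # only so the sketch is reproducible

# Toy stand-in for the real data frame (assumption: 1000 rows, not 40k)
genelist <- data.frame(ID = paste0("A_", 1:1000), value = rnorm(1000))

n_sets <- 50
n_rows <- 200

# One column of row indices per random set (a 200 x 50 matrix)
idx <- replicate(n_sets, sample(nrow(genelist), n_rows))

# A list of 50 data frames, one per random set, usable after the call
subsets <- lapply(seq_len(n_sets), function(j) genelist[idx[, j], ])

dim(idx)
nrow(subsets[[1]])
```

Indexing by row number avoids matching row names against integers with %in%, and the list keeps all results available after the call returns.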
cheers
On Thu, Jul 8, 2010 at 8:04 AM, Assa Yeroslaviz <frymor at gmail.com> wrote:
> [...]
--
Joris Meys
Statistical consultant
Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control
tel : +32 9 264 59 87
Joris.Meys at Ugent.be
On Jul 8, 2010, at 2:04 AM, Assa Yeroslaviz wrote:
> [...]

One method:

totsize <- 50 * 200
# create matrix of indices
smatrix <- matrix(sample(1:length(genelist$ID), totsize), nrow=200, ncol=50)
# Then any one sample would be:
genelist[ smatrix[,i], ]   # for i in 1:50

You do need to decide whether this approach, which creates 50 mutually
exclusive samples (if the IDs are unique), is really what you want, since
they are not truly independent draws. I think this could be an issue with a
ratio of universe:sample ~ 4:1. It's not a bootstrap sample. Could add
replace=TRUE in the sample call to fix that.

--
David

David Winsemius, MD
West Hartford, CT
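The difference between the sampling schemes mentioned above can be sketched as follows. This is an illustration only: n = 40000 is taken from the x <- 1:40000 in the original post, and the names exclusive, independent and boot are invented here.

```r
set.seed(42)  # only so the sketch is reproducible
n <- 40000    # assumed number of rows, as in x <- 1:40000

# (a) One big draw cut into 50 columns: no row index appears twice
#     anywhere, so the 50 samples are mutually exclusive.
exclusive <- matrix(sample(n, 50 * 200), nrow = 200, ncol = 50)

# (b) 50 separate draws: no repeats within a sample, but two samples
#     may share rows.
independent <- replicate(50, sample(n, 200))

# (c) Drawing with replacement: repeats are allowed everywhere.
boot <- matrix(sample(n, 50 * 200, replace = TRUE), nrow = 200, ncol = 50)

anyDuplicated(as.vector(exclusive))  # 0: no index is reused in scheme (a)
```

Which scheme is appropriate depends on whether overlap between the 50 samples is acceptable for the analysis.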