myDF:

d1            d2           d3           d4           d5
-0.166910351  0.022304377  -0.00825924  0.008330689  -0.000925938
-0.166910351  0.022304377  -0.00825924  0.008330689  -0.000925938
-0.166910351  0.022304377  -0.00825924  0.008330689  -0.168225938
-0.166910351  0.022304377  -0.00825924  0.008330689  -0.168225938
-0.166910351  0.022304377  -0.00825924  0.008330689  -0.168225938
-0.166910351  0.022304377  -0.00825924  0.008330689  -0.168225938
-0.166910351  0.022304377  -0.00825924  0.008330689  -0.168225938
-0.166910351  0.022304377  -0.00825924  0.008330689  -0.168225938
-0.166910351  0.022304377  -0.00825924  0.008330689  -0.168225938
-0.166910351  0.022304377  -0.00825924  0.008330689  -0.168225938
-0.166910351  0.022304377  -0.00825924  0.008330689  -0.168225938
-0.166910351  0.022304377  -0.00825924  0.008330689  -0.168225938
-0.166910351  0.022304377  -0.00825924  0.008330689  -0.168225938
-0.166910351  0.022304377  -0.00825924  0.008330689  -0.168225938
-0.166910351  0.022304377  -0.00825924  0.008330689  -0.168225938
-0.166910351  0.022304377  -0.00825924  0.008330689  -0.168225938
-0.166910351  0.022304377  -0.00825924  0.008330689  -0.168225938
-0.166910351  0.022304377  -0.00825924  0.008330689  -0.168225938
-0.166910351  0.022304377  -0.00825924  0.008330689  -0.168225938
-0.166910351  0.022304377  -0.00825924  0.008330689  -0.168225938
-0.166910351  0.022304377  -0.00825924  0.008330689  -0.168225938
-0.166910351  0.022304377  -0.00825924  0.008330689  -0.168225938
-0.166910351  0.022304377  -0.00825924  0.008330689  -0.168225938

Using the data frame above:

Step 1: do the following

    doit = function(x) c(sum_positive = sum(x[-1][x[-1] > 0]),
                         sum_negative = sum(x[-1][x[-1] < 0]))

    pos_neg_pool <- t(apply(myDF, 1, doit))

    If this is not the first run, append the result to pos_neg_pool.

Step 2: reshuffle the data by columns, then do step 1 again. This step
needs to run 10000 times.

The output will be 23*10000 = 230,000 rows.

Can anyone point out how to automate these 10000 runs in R?
Thanks,

--
View this message in context: http://r.789695.n4.nabble.com/re-sampling-of-large-sacle-data-tp2304165p2304165.html
Sent from the R help mailing list archive at Nabble.com.
Write a function that incorporates "doit" and the column shuffle. Let's
call it "doitbetter":

    replicate(10000, doitbetter())

You'll probably want to read the help for "replicate" to make sure the
defaults are what you want.

--Gray

On Tue, Jul 27, 2010 at 4:43 PM, jd6688 <jdsignature at gmail.com> wrote:
> step2: reshuffle the data by columns then do step1, this step need to run
> 10000 times;
>
> output will be 23*10000=230,000 rows.
>
> Can anyone point out how to automate this 10000 runs in R?

--
Gray Calhoun
Assistant Professor of Economics, Iowa State University
http://www.econ.iastate.edu/~gcalhoun/
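For readers following along, here is one way the suggested "doitbetter" could look. This is only a sketch, not code posted in the thread: the body of doitbetter(), the toy data standing in for myDF, and the reduced run count are all assumptions made for illustration. It passes simplify = FALSE to replicate() and stacks the pieces with rbind afterwards, because the default would return a 3-d array rather than a long two-column table.

```r
# Toy stand-in for myDF (assumption: any numeric data frame or matrix works).
set.seed(1)
myDF <- as.data.frame(matrix(rnorm(23 * 5), nrow = 23,
                             dimnames = list(NULL, paste0("d", 1:5))))

# doit() as posted in the thread. Note that x[-1] drops the first
# column (d1) before the positive and negative entries are summed.
doit <- function(x) c(sum_positive = sum(x[-1][x[-1] > 0]),
                      sum_negative = sum(x[-1][x[-1] < 0]))

# Hypothetical doitbetter(): shuffle every column independently, then
# apply doit() to each row. Returns an nrow(dat) x 2 matrix.
doitbetter <- function(dat = myDF) {
  shuffled <- apply(dat, 2, sample)
  t(apply(shuffled, 1, doit))
}

n_runs <- 100  # the thread uses 10000; kept small here

# simplify = FALSE keeps one matrix per run in a list ...
runs <- replicate(n_runs, doitbetter(), simplify = FALSE)

# ... which are then stacked into a single (23 * n_runs) x 2 matrix.
pos_neg_pool <- do.call(rbind, runs)
dim(pos_neg_pool)  # 2300 rows, 2 columns
```

Collecting the runs in a list and binding once at the end also avoids growing pos_neg_pool inside a loop.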
I am trying the following to accomplish the task. Can anybody simplify
the solution?

Thanks,

    for (i in 1:10000) {
      d <- apply(s, 2, sample)              # shuffle each column
      pos_neg_tem <- t(apply(d, 1, doit))   # per-row sums for this run
      if (i > 1) {
        pos_neg_pool <- rbind(pos_neg_pool, pos_neg_tem)
      } else {
        pos_neg_pool <- pos_neg_tem
      }
    }
On Jul 27, 2010, at 6:44 PM, jd6688 wrote:

> for (i in 1:10000){
> d<-apply(s,2,sample)
> pos_neg_tem<-t(apply(d,1,doit))
> if (i>1){
> pos_neg_pool<-rbind(pos_neg_pool,pos_neg_tem)
> }else{
> pos_neg_pool<- pos_neg_tem
> }}

A bit of efficiency advice: incremental creation of objects is generally
a major source of slowness. Consider creating pos_neg_pool before the
loop and then "filling it in" within the loop. It would also let you
remove that "if{}else{}" statement.

--
David Winsemius, MD
West Hartford, CT
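The "create it first, then fill it in" advice might look something like the sketch below. The toy matrix standing in for 's', the reduced run count, and the row-index arithmetic are assumptions made for illustration; only doit() and the apply() calls come from the thread.

```r
set.seed(1)
s <- matrix(rnorm(23 * 5), nrow = 23)   # toy stand-in for the poster's 's'

# doit() as posted in the thread.
doit <- function(x) c(sum_positive = sum(x[-1][x[-1] > 0]),
                      sum_negative = sum(x[-1][x[-1] < 0]))

n_runs <- 100          # the thread uses 10000; kept small here
n <- nrow(s)

# Allocate the full result once, before the loop.
pos_neg_pool <- matrix(NA_real_, nrow = n * n_runs, ncol = 2,
                       dimnames = list(NULL, c("sum_positive", "sum_negative")))

for (i in seq_len(n_runs)) {
  d <- apply(s, 2, sample)              # shuffle each column
  rows <- ((i - 1) * n + 1):(i * n)     # block of rows belonging to run i
  pos_neg_pool[rows, ] <- t(apply(d, 1, doit))
}
```

Because every run writes into its own block of rows, no rbind() and no first-run special case are needed.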
On Jul 28, 2010, at 12:09 AM, jd6688 wrote:

> d <- apply(s, 2, sample, size = 10000*nrow(s), replace = TRUE)
>
> why the code above return the following error
> Error: cannot allocate vector of size 218.8 Mb

Possibilities:

Your workspace is full of other junk?

Your workspace used to be full of other junk and its memory is too
fragmented to find a contiguous chunk of memory?

Your computer is full of other junk?

You have not read the R-FAQ (or the RW-FAQ) items on the topic of
memory usage on whatever operating system you are working with.

--
David Winsemius, MD
West Hartford, CT
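A back-of-the-envelope check can help judge whether a request like this is reasonable before running it. The helper est_mb() below is hypothetical, and it assumes 8 bytes per double, ignores R's small per-object overhead, and takes "Mb" to mean 1024^2 bytes.

```r
# Rough size, in Mb (1024^2 bytes), of a block of double-precision values.
# Assumes 8 bytes per value; per-object overhead is ignored.
est_mb <- function(n_rows, n_cols = 1) 8 * n_rows * n_cols / 1024^2

n_runs <- 10000
nrow_s <- 23   # row count of the example data in the thread
ncol_s <- 5    # column count of the example data in the thread

est_mb(n_runs * nrow_s)           # one resampled column
est_mb(n_runs * nrow_s, ncol_s)   # the whole resampled matrix
```

For the 23-row example both figures come to well under 10 Mb, so a 218.8 Mb request suggests the real 's' is considerably larger than the sample shown (an inference, not something stated in the thread). Storing only the two sums per row, rather than every resampled value, keeps the footprint far smaller. Note also that replace = TRUE draws with replacement, which is not the same operation as the column shuffle described in the original post.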