Hello,

I have a nasty loop that I have to do 11877 times. The only thing that slows it down really is this merge:

xx1 = merge(dt,ua_rd,by.x=1,by.y= 'rt_date',all.x=T)

Any ideas on how to speed it up? The output can't change materially (it works), but I'd like it to go faster. I'm looking at getting around the loop (not shown), but I'm trying to speed up the merge first. I'll post regarding the loop if nothing comes of this post.

Here is some information on what type of stuff is going into the merge:

> class(ua_rd)
[1] "matrix"
> dim(ua_rd)
[1] 20  2
> head(ua_rd)
           AName             rt_date
2007-03-31 "14066.580078125" "2007-04-26"
2007-06-30 "14717"           "2007-07-19"
2007-09-30 "15528"           "2007-10-25"
2007-12-31 "17609"           "2008-01-24"
2008-03-31 "17168"           "2008-04-24"
2008-06-30 "17681"           "2008-07-17"
> class(dt)
[1] "character"
> length(dt)
[1] 1799
> dt[1:10]
 [1] "2007-03-31" "2007-04-01" "2007-04-02" "2007-04-03" "2007-04-04" "2007-04-05" "2007-04-06" "2007-04-07"
 [9] "2007-04-08" "2007-04-09"

thanks,

Ben
Hi Ben,

It seems you merge a matrix and a vector. As far as I understand, the first thing merge does is convert these to data frames. Is it possible to make the preceding steps produce data frames?

Regards,
Kees

On Fri, Mar 2, 2012 at 11:24 AM, Ben quant <ccquant at gmail.com> wrote:
> I have a nasty loop that I have to do 11877 times. The only thing that
> slows it down really is this merge:
>
> xx1 = merge(dt,ua_rd,by.x=1,by.y= 'rt_date',all.x=T)

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
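Kees's suggestion could be sketched roughly as follows, assuming the inputs can be turned into data frames once, before the loop starts. The sample values are copied from the original post; the names dt_df and ua_df are hypothetical, and whether this saves much time on the real data has not been measured here.

```r
# Sample inputs shaped like those in the original post (values copied from it)
ua_rd <- matrix(
  c("14066.580078125", "14717", "15528",
    "2007-04-26", "2007-07-19", "2007-10-25"),
  ncol = 2,
  dimnames = list(c("2007-03-31", "2007-06-30", "2007-09-30"),
                  c("AName", "rt_date"))
)
dt <- as.character(seq(as.Date("2007-03-31"), by = "day", length.out = 40))

# Convert once, up front, instead of handing merge() a vector and a matrix
# (dt_df and ua_df are hypothetical names)
dt_df <- data.frame(dt = dt, stringsAsFactors = FALSE)
ua_df <- as.data.frame(ua_rd, stringsAsFactors = FALSE)

# Same left join as in the original call, now on data frames
xx1 <- merge(dt_df, ua_df, by.x = "dt", by.y = "rt_date", all.x = TRUE)
```

Note that the key column is named dt here, which may differ from the column name in the original output.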
On Fri, Mar 02, 2012 at 03:24:20AM -0700, Ben quant wrote:
> Hello,
>
> I have a nasty loop that I have to do 11877 times.

Are you completely sure about that? I often find myself avoiding loops-by-row by constructing a vector of the rows that fulfill a condition, and then creating new vectors out of that vector.

If you elaborate on the problem, perhaps we could find a way to avoid the loops altogether?

Mostly as a note to self, I wrote http://code.cjb.net/vectors-instead-of-loop.html; it might be understood by others too, but I'm not sure.

-- 
Hans Ekbrand (http://sociologi.cjb.net) <hans at sociologi.cjb.net>
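To illustrate the idea Hans describes, here is a small sketch on made-up data. The data frame df, its columns, and the threshold 8 are all hypothetical and have nothing to do with Ben's actual loop, which was not posted.

```r
# Hypothetical data: one row per observation
df <- data.frame(id = 1:6, value = c(3, 12, 7, 15, 1, 9))

# Loop version: visit each row and test the condition
big_loop <- numeric(0)
for (i in seq_len(nrow(df))) {
  if (df$value[i] > 8) big_loop <- c(big_loop, df$id[i])
}

# Vectorised version: build the vector of qualifying row indices in one step,
# then create the new vector from it
rows <- which(df$value > 8)
big_vec <- df$id[rows]
```

Both versions select the same ids; the second does so without visiting the rows one at a time in R code.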
One way to speed up the merge is not to use merge. You can use 'match' to find matching indices and then build the result manually. Does this do what you want:

> ua <- read.table(text = '    AName     rt_date
+ 2007-03-31 "14066.580078125" "2007-04-01"
+ 2007-06-30 "14717"           "2007-04-03"
+ 2007-09-30 "15528"           "2007-10-25"
+ 2007-12-31 "17609"           "2008-04-06"
+ 2008-03-31 "17168"           "2008-04-24"
+ 2008-06-30 "17681"           "2008-04-09"', header = TRUE, as.is = TRUE)
>
> dt <- c( "2007-03-31" ,"2007-04-01" ,"2007-04-02", "2007-04-03","2007-04-04",
+          "2007-04-05" ,"2007-04-06" ,"2007-04-07",
+          "2007-04-08", "2007-04-09")
>
> # find matching values in ua
> indx <- match(dt, ua$rt_date)
>
> # create new result data frame
> xx1 <- cbind(dt, ua[indx,])
> rownames(xx1) <- NULL  # delete funny names
> xx1
           dt    AName    rt_date
1  2007-03-31       NA       <NA>
2  2007-04-01 14066.58 2007-04-01
3  2007-04-02       NA       <NA>
4  2007-04-03 14717.00 2007-04-03
5  2007-04-04       NA       <NA>
6  2007-04-05       NA       <NA>
7  2007-04-06       NA       <NA>
8  2007-04-07       NA       <NA>
9  2007-04-08       NA       <NA>
10 2007-04-09       NA       <NA>
>

On Fri, Mar 2, 2012 at 5:24 AM, Ben quant <ccquant@gmail.com> wrote:
> I have a nasty loop that I have to do 11877 times. The only thing that
> slows it down really is this merge:
>
> xx1 = merge(dt,ua_rd,by.x=1,by.y= 'rt_date',all.x=T)

-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
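The same match() approach can also be applied to the character matrix from the original post directly, without going through read.table. The following is a sketch under stated assumptions: the sample values are copied from the original post, the names xx1_match, xx1_merge and same are hypothetical, and the merge() call is included only to check that both routes give the same values.

```r
# Inputs shaped like the original post: a character matrix and a character vector
ua_rd <- matrix(
  c("14066.580078125", "14717", "15528",
    "2007-04-26", "2007-07-19", "2007-10-25"),
  ncol = 2,
  dimnames = list(c("2007-03-31", "2007-06-30", "2007-09-30"),
                  c("AName", "rt_date"))
)
dt <- as.character(seq(as.Date("2007-03-31"), by = "day", length.out = 40))

# Row of ua_rd whose rt_date equals each element of dt (NA where none does)
indx <- match(dt, ua_rd[, "rt_date"])

# Assemble the result directly; unmatched dates get NA in AName
xx1_match <- data.frame(dt = dt,
                        AName = unname(ua_rd[indx, "AName"]),
                        stringsAsFactors = FALSE)

# Reference result from merge(), for comparison only
xx1_merge <- merge(data.frame(dt = dt, stringsAsFactors = FALSE),
                   as.data.frame(ua_rd, stringsAsFactors = FALSE),
                   by.x = "dt", by.y = "rt_date", all.x = TRUE)

# TRUE when both routes give the same AName values in the same order
same <- isTRUE(all.equal(xx1_match$AName, xx1_merge$AName))
```

Two differences from merge() are worth keeping in mind. match() returns only the first matching row, so if rt_date contained duplicates, merge() would return one row per match while this version would not. Also, merge() sorts its result on the key by default, whereas the match() version keeps the order of dt; the two agree here because dt is already sorted. Wrapping each variant in system.time() on the real data would show how much the change actually saves.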