On 13-12-19 6:37 PM, Ross Boylan wrote:
> My code seems to be spending most of its time in assignment statements,
> in some cases simple assignment of a model frame or model matrix.
>
> Can anyone provide any insights into what's going on, or how to speed
> things up?
You are seeing a lot of time being spent on complex assignments. For
example, line 158 is
data(sims.c1[[k]]) <- sp
That makes a function call to `data<-` to do the assignment, and that
could be slow. Since it's an S4 method there's a bunch of machinery
involved in dispatching it; most of that would not have line number
information, so it'll be charged to that line.
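
A minimal sketch of what such a complex assignment expands to. The class "toySim" and the objects below are hypothetical stand-ins, not the code from the post; the point is only that the one-line assignment is a call to `data<-` followed by a second replacement call that stores the result back into the list.

```r
library(methods)

# Hypothetical toy class standing in for the simulator classes in the post.
setClass("toySim", representation(data = "data.frame"))
setGeneric("data<-", function(obj, value) standardGeneric("data<-"))
setMethod("data<-", c("toySim", "data.frame"),
          function(obj, value) {
              obj@data <- value
              obj
          })

sp <- data.frame(studyidx = 1:3)
k <- 1

# Complex assignment, same shape as box.R line 158.
sims <- list(new("toySim"), new("toySim"))
data(sims[[k]]) <- sp

# Roughly what R evaluates for that one line: extract the element,
# dispatch the S4 replacement method, then store the result back.
sims2 <- list(new("toySim"), new("toySim"))
tmp <- sims2[[k]]
tmp <- `data<-`(tmp, value = sp)
sims2[[k]] <- tmp

identical(sims, sims2)
```

All of the work in the expanded form happens on behalf of the single source line that contains the assignment, which is consistent with the profiler charging it there.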
I can't really suggest how to speed it up.
Duncan Murdoch
>
> For starters, is it possible that the reports are not accurate, or that
> I am misreading them? In R 3.0.1 (running under ESS):
> > Rprof(line.profiling=TRUE)
> > system.time(r <- totalEffect(dodata[[1]], dodata[[2]], 1:3, 4))
> user system elapsed
> 21.629 0.756 22.469
> > Rprof(NULL)
> > summaryRprof(lines="both")
> $by.self
>                            self.time self.pct total.time total.pct
> box.R#158                       6.74    29.56      13.06     57.28
> simulator.multinomial.R#64      2.92    12.81       2.96     12.98
> simulator.multinomial.R#63      2.76    12.11       2.76     12.11
> box.R#171                       2.54    11.14       5.08     22.28
> simulator.d1.R#70               0.98     4.30       0.98      4.30
> simulator.d1.R#71               0.98     4.30       0.98      4.30
> densMap.R#42                    0.72     3.16       0.86      3.77
> "standardGeneric"               0.52     2.28      11.30     49.56
> ......
>
> Here's some of the code, with comments at the line numbers
> box.R:
> sp <- merge(sexpartner, data, by="studyidx")
> sp$y <- numFactor(sp$pEthnic) # I think y is not used but must be present
> data(sims.c1[[k]]) <- sp ###<<<<< line 158
> sp0 <- sp
> sp <- sim(sims.c1[[k]], i)
> ctable[[k]] <- update.c1(ctable[[k]], sp)
> if (is.null(i.c1.in)) {
>     i.c1.in <- match("pEthnic", colnames(sp0))
>     i.c1.out <- match(c("studyidx", "n", "pEthnic"), colnames(sp))
> }
> sp0 <- merge(sp0[,-i.c1.in], sp[,i.c1.out], by=c("studyidx", "n"))
> # d1
> sp0 <- sp0[sp0$pIsMale == 1,]
> # avoid lots of conversion warnings
> sp0$pEthnic <- factor(sp0$pEthnic, levels=partRaceLevels)
> data(sims.d1[[k]]) <- sp0 ###<<<<< line 171
> sp <- sim(sims.d1[[k]], i)
> dtable[[k]] <- update.d1(dtable[[k]], sp)
> rngstate[[k]] <- .Random.seed
>
> The timing seems odd since it doesn't appear there's anything to do at
> the 2 lines except invoke data<-, but if that's slow I would expect the
> time to go to the data<- function (in a different file) and not to the
> call.
>
> In fact the other big time items are inside the data<- functions.
> simulator.multinomial.R:
>
> setMethod("data<-", c("simulator.multinomial", "data.frame"),
>     function(obj, value) {
>         mf <- model.frame(obj@dataFormula, data=value)
>         mf$iCluster <- fromOrig(obj@idmap, as.character(mf$studyidx))
>         if (any(is.na(mf$iCluster)))
>             stop("New studyidx--need to draw from meta distn")
>         mm <- model.matrix(obj@modelFormula, data=mf)
>         obj@data <- mf ##<<< line 63
>         obj@mm <- mm ##<<< line 64
>         return(obj)
>     })
>
> The mm and data slots have type restrictions, but no other validation
> tests.
> setClass("simulator.multinomial",
>     representation(fit="stanfit", idmap="sIDMap",
>         modelFormula="formula",
>         categories="ANY", # could be factor or character
>         # categories should be in the order of their numeric codes in y
>         # cached results
>         coef="list",
>         data="data.frame",
>         dataFormula="formula",
>         mm="matrix"))
> Does it matter that, e.g., a model frame is more than a vanilla data frame?
>
> I thought assignment, given R's lazy copying behavior, was essentially
> resetting a pointer, and so should be fast.
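
A small sketch suggesting that slot assignment is more than a pointer reset: `obj@data <- mf` is itself a replacement call, and by default it checks the new value against the slot's declared class. The class "toySlots" below is a hypothetical stand-in for simulator.multinomial, and the sketch says nothing about how much time that check costs in the real code.

```r
library(methods)

# Hypothetical class with the same two typed slots as in the post.
setClass("toySlots", representation(data = "data.frame", mm = "matrix"))
obj <- new("toySlots")

# A value of the declared class is accepted.
obj@mm <- matrix(1:4, nrow = 2)

# A value of the wrong class is rejected, so a check runs on assignment.
res <- tryCatch({
    obj@data <- "not a data frame"
    "accepted"
}, error = function(e) "rejected")
res

# slot<- is the same operation written as an ordinary replacement function.
slot(obj, "data") <- data.frame(a = 1:3)
```

Whether that check, or any copying of the object, accounts for the time reported at lines 63 and 64 would still need to be measured, for example by timing the method body on its own.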
>
> Or maybe the time is going to garbage collecting the previous contents
> of the slots?
>
> Ross Boylan