Dear R-help, A colleague of mine was running some code on two of our boxes, and noticed a rather large difference in running time. We've so far isolated the problem to the difference between R 1.7.1 and 1.8.0, but not more than that. The exact same code took 933.5 seconds in 1.7.1, and 3594.4 seconds in 1.8.1, on the same box. Basically, the code calls boot() to bootstrap fitting mixture models by calling flexmix() (in the flexmix package) with intercept-only models. The code needs to be strip down further, but I thought some of you might be able to tell what's wrong from the info we have so far: I ran R profiling on the code, under both R 1.7.1 and 1.8.1 (but the performance gap is there in 1.8.0 already). Sorted by "self" part of the output, the top 10 lines are: R 1.7.1 % self % total self seconds total seconds name 10.75 100.36 10.75 100.36 ".Fortran" 8.71 81.32 34.27 319.88 "lm.wfit" 5.43 50.66 81.19 757.88 "FLXfit" 4.30 40.16 4.30 40.16 "^" 4.26 39.80 4.26 39.80 "==" 4.15 38.74 4.99 46.62 "names" 3.90 36.38 20.57 191.98 "initialize" 3.51 32.80 5.37 50.14 "dnorm" 2.29 21.34 4.94 46.14 "hclass" 1.83 17.10 5.85 54.64 "inherits" R 1.8.1: % self % total self seconds total seconds name 6.24 224.26 11.69 420.32 "paste" 5.93 213.04 13.21 474.76 "read.dcf" 4.37 157.24 5.17 185.92 "names" 4.18 150.42 5.53 198.66 "exists" 3.71 133.52 14.66 527.00 "lapply" 3.15 113.16 4.43 159.32 "names<-" 2.98 107.26 2.98 107.26 ".Fortran" 2.57 92.46 5.75 206.82 "seq" 2.54 91.42 3.18 114.14 "seq.default" 2.37 85.24 8.93 320.96 "lm.wfit" The ".Fortran" call took about the same amount of time, as does "lm.wfit", so that's a bit comforting. (The code also fits the same model using mclust, and that's probably where the .Fortran call is from.) What puzzled me are: - Several functions took much longer in 1.8.1; e.g., 1.7.1 1.8.1 "paste" 28 224 "names" 38 157 "names<-" 15 113 "exists" 4 150 - The presence of "read.dcf" in 1.8.1. Where could this be from? Any clues as to why we're seeing this? Best, Andy Andy Liaw, PhD Biometrics Research PO Box 2000, RY33-300 Merck Research Labs Rahway, NJ 07065 mailto:andy_liaw at merck.com 732-594-0820
>>>>> "AndyL" == Liaw, Andy <andy_liaw at merck.com> >>>>> on Fri, 28 Nov 2003 22:19:27 -0500 writes:AndyL> Dear R-help, A colleague of mine was running some AndyL> code on two of our boxes, and noticed a rather large AndyL> difference in running time. We've so far isolated AndyL> the problem to the difference between R 1.7.1 and AndyL> 1.8.0, but not more than that. The exact same code AndyL> took 933.5 seconds in 1.7.1, and 3594.4 seconds in AndyL> 1.8.1, on the same box. I understand your concern. I do believe that it must be some peculiarity of your (or the non-standard packages you used code). Some wild guessing: Looking at your profiling table, and seeing read.dcf() makes me think about library() -- which does call read.dcf() only when called on a package that's not yet attached --- and the bioconductor extra package management -- which for me makes only the first time (!) attachment of bioconductor packages relatively slow. I still have no idea how this could influence bootstrapping, unless you detach() and attach() packages many times. In any case, it might depend on the exact list (and order!?) of packages present in search() when calling your code. A very first step of diagnosis might be to activate trace(read.dcf) trace(library) options(verbose = TRUE) A step further might be to patch read.dcf such that it prints info if(getOption("verbose")) { <<print what I am trying to read>> } Martin
> From: Martin Maechler [mailto:maechler at stat.math.ethz.ch][...]> A very first step of diagnosis might be to activate > trace(read.dcf) > trace(library) > options(verbose = TRUE) > > A step further might be to patch read.dcf such that it prints > info > if(getOption("verbose")) { <<print what I am trying to read>> } > > MartinTo refresh people's memory, the problem is that the identical code took more than twice as long in R-1.8.x than in R-1.7.x. I've managed to strip the code down to the following and still see the same problem. It basically bootstraps mixture models to groups of data. It uses boot() from the boot package and flexmix() with intercept only, in the flexmix package. ============================================fitmix <- function(x, verbosefit=FALSE) { bpunlist <- function(x) { unlisted <- numeric() for(i in 1:length(x at components)) { unlisted <- c(unlisted, unlist(parameters(x, component=i))) } unlisted } unifit <- flexmix(x ~ 1, k = 1, control=list(verbose=0)) unlist(unlist(list(G = 1, proportion = unifit at prior, param=matrix(bpunlist(unifit), nrow=2), bic=BIC(unifit), loglik=unifit at logLik))) } bootmix <- function(data, i) { d <- data d.grps <- split.data.frame(d[i,], d[i, "id"]) comps <- vector("list", length=length(d.grps)) for (j in seq(along=comps)) { thefit <- try(fitmix(d.grps[[j]][,1], verbosefit=FALSE)) comps[[j]] <- if (inherits(thefit, "try-error")) rep(NA, 48) else thefit } return(unlist(comps)) } ## The actual commands: require(flexmix) require(boot) set.seed(76421) x <- rnorm(3e3) id <- gl(10, 300) mydf <- data.frame(x, id) Rprof(filename=paste("Rprof.out", system("hostname -s", int=TRUE), sep=".")) system.time(res <- boot(mydf, bootmix, strata=mydf[,"id"], R=5)) Rprof(NULL) summaryRprof(filename=paste("Rprof.out", system("hostname -s", int=TRUE), sep="."))$by.self[1:20,] ============================================ I ran this on a Xeon 2.4GHz running Mandrake 9.0, R compiled from source. This took 2 seconds in R-1.7.1, with the output: self.time self.pct total.time total.pct "FUN" 0.18 9 0.24 12 "names" 0.16 8 0.16 8 "initialize" 0.10 5 0.80 40 "apply" 0.08 4 0.30 15 "[.data.frame" 0.06 3 0.22 11 "inherits" 0.06 3 0.18 9 "is.null" 0.06 3 0.06 3 "match" 0.06 3 0.22 11 "seq.default" 0.06 3 0.06 3 "structure" 0.06 3 0.08 4 "any" 0.04 2 0.14 7 "el" 0.04 2 0.04 2 "FLXfit" 0.04 2 1.18 59 "lapply" 0.04 2 0.22 11 "lm.wfit" 0.04 2 0.16 8 "names<-.default" 0.04 2 0.04 2 "slot<-" 0.04 2 0.18 9 "<" 0.02 1 0.02 1 "|" 0.02 1 0.02 1 ":" 0.02 1 0.02 1 ... and took over 7 seconds in R-1.8.1, with the output: self.time self.pct total.time total.pct "paste" 0.52 6.7 1.10 14.1 "read.dcf" 0.38 4.9 0.96 12.3 "exists" 0.36 4.6 0.50 6.4 "lapply" 0.36 4.6 1.36 17.4 "names" 0.34 4.4 0.40 5.1 "names<-" 0.32 4.1 0.44 5.6 "inherits" 0.22 2.8 0.34 4.4 "==" 0.20 2.6 0.20 2.6 ".Call" 0.20 2.6 0.20 2.6 "seq" 0.18 2.3 0.50 6.4 "seq.default" 0.18 2.3 0.28 3.6 "dynGet" 0.16 2.1 0.46 5.9 "topenv" 0.16 2.1 0.18 2.3 "unique" 0.16 2.1 1.00 12.8 "apply" 0.14 1.8 0.20 2.6 "as.list" 0.14 1.8 0.18 2.3 ".find.package" 0.14 1.8 1.88 24.1 "sapply" 0.14 1.8 1.54 19.7 "any" 0.12 1.5 1.68 21.5 "initialize" 0.12 1.5 4.98 63.8 I then did the following:> trace(read.dcf, recover)[1] "read.dcf"> res <- boot(mydf, bootmix, strata=mydf[,"id"], R=1)Tracing read.dcf(file.path(package.lib, package, "DESCRIPTION"), fields "Namespace") on entry Enter a frame number, or 0 to exit 1:boot(mydf, bootmix, strata = mydf[, "id"], R = 1) 2:statistic(data, original, ...) 3:try(fitmix(d.grps[[j]][, 1], verbosefit = FALSE)) 4:fitmix(d.grps[[j]][, 1], verbosefit = FALSE) 5:flexmix(x ~ 1, k = 1, control = list(verbose = 0)) 6:flexmix(x ~ 1, k = 1, control = list(verbose = 0)) 7:flexmix(formula = formula, data = data, k = k, cluster = cluster, model list(F 8:FLXglm() 9:new("FLXmodel", weighted = TRUE, formula = formula, name = paste("FLXglm", famil 10:initialize(value, ...) 11:initialize(value, ...) 12:getClass(Class) 13:.requirePackage(package) 14:trySilent(loadNamespace(package)) 15:eval.parent(call) 16:eval(expr, p) 17:eval(expr, envir, enclos) 18:try(loadNamespace(package)) 19:loadNamespace(package) 20:packageHasNamespace(package, package.lib) 21:read.dcf(file.path(package.lib, package, "DESCRIPTION"), fields "Namespace") I'm really out of ideas at this point. Can anyone see the problem given the above, or at least tell me what to try next? Best, Andy
> From: Peter Dalgaard [mailto:p.dalgaard at biostat.ku.dk] > "Liaw, Andy" <andy_liaw at merck.com> writes: > > > I'm really out of ideas at this point. Can anyone see the > problem given the > > above, or at least tell me what to try next? > > Hmm. Looking into the actual Rprof.out file is sometimes revealing. > Could you provide a link to it (I assume it might be a bit large for > email).The files aren't that big, but it certainly isn't pretty (not intended for human consumption, I suppose). I don't have a place to post it, so I'm attaching it as .tgz. Hopefully it at least makes it to some... Best, Andy> -- > O__ ---- Peter Dalgaard Blegdamsvej 3 > c/ /'_ --- Dept. of Biostatistics 2200 Cph. N > (*) \(*) -- University of Copenhagen Denmark Ph: > (+45) 35327918 > ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: > (+45) 35327907 >
Maybe Matching Threads
- plot.lm mislabels points with na.exclude (PR#3750)
- High breakdown/efficiency statistics -- was RE: Rosner's test [Broadcast]
- Problem when making refman.pdf on WinXP (R 1.7.1)
- Cluster Analysis: Density-Based Method
- Mounting Linux Samba Shares on Windows when Active Directory Server is down