Wiener, Matthew
2005-Apr-21 12:53 UTC
[R] apply vs sapply vs loop - lm() call appl(y)ied on array
Christoph --

There was just a thread on this earlier this week. You can search in the archives for the title: "refitting lm() with same x, different y".

(Actually, it doesn't turn up in the R site search yet, at least for me. But if you just go to the archive of recent messages, available through CRAN, you can search on "refitting" and find it. The original post was from William Valdar, on April 19.)

Hope this helps,

Matt Wiener
Christoph Lehmann
2005-Apr-21 13:23 UTC
[R] apply vs sapply vs loop - lm() call appl(y)ied on array
Dear useRs

(Code for the small example mentioned here is below.)

I have 7 * 8 * 9 = 504 series of data (each of length 5). For each of these series I want to fit an lm(), where the design matrix X is the same for all of these fits.

The 504 series are stored in an array of dimension d.dim <- c(5, 7, 8, 9); that is, the first dimension holds the data series.

The lm computation needs performance optimization, since the real dimensions are much larger. I compared the following approaches: a for-loop, apply, and sapply. All of these require roughly the same computation time. I was astonished, since I expected at least sapply to outperform the for-loop.

Do you have another solution for me that is faster?

Many thanks

Here is the code:

## ------------------------------------------------------
t.length <- 5
d.dim <- c(t.length, 7, 8, 9)  # dimensions: time, x, y, z
## one noisy 1:t.length ramp per series (prod(d.dim) values in total)
Y <- array(rep(1:t.length, prod(d.dim[2:4])) + rnorm(prod(d.dim), 0, 0.1),
           d.dim)
X <- c(1, 3, 2, 4, 5)

## -------- performance tests
## using for loop
date()
z <- rep(0, prod(d.dim[2:4]))
l <- 0
for (i in 1:dim(Y)[4])
  for (j in 1:dim(Y)[3])
    for (k in 1:dim(Y)[2]) {
      l <- l + 1
      ## summary.lm is a list, so r.squared can be taken directly
      z[l] <- summary(lm(Y[, k, j, i] ~ X))$r.squared
    }
date()

## using apply
date()
z <- apply(Y, 2:4, function(x) summary(lm(x ~ X))$r.squared)
date()

## using sapply
date()
## fac labels each block of t.length consecutive values as one series
fac <- rep(1:prod(d.dim[2:4]), rep(t.length, prod(d.dim[2:4])))
z <- sapply(split(as.vector(Y), fac),
            FUN = function(x) summary(lm(x ~ X))$r.squared)
dim(z) <- d.dim[2:4]
date()
## ------------------------------------------------------

--
Christoph Lehmann
Department of Psychiatric Neurophysiology
University Hospital of Clinical Psychiatry
Waldau
CH-3000 Bern 60
Phone: ++41 31 930 93 83
Mobile: ++41 76 570 28 00
Fax: ++41 31 930 99 61
lehmann at puk.unibe.ch
http://www.puk.unibe.ch/cl/pn_ni_cv_cl_04.html
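
The timings in the post rely on date(), which only resolves whole seconds. The following is a minimal sketch of the same loop-versus-apply comparison using system.time(). The data are regenerated as in the post; the fixed seed and the names r2_one, n.series, z_loop, z_apply, time_loop and time_apply are illustrative assumptions, not part of the original message. The two timings are likely to be close, since the cost is presumably dominated by the 504 separate lm() calls rather than by the iteration mechanism.

```r
set.seed(1)  # assumption: fixed seed so the run is reproducible

t.length <- 5
d.dim <- c(t.length, 7, 8, 9)      # dimensions: time, x, y, z
n.series <- prod(d.dim[2:4])       # 504 series
Y <- array(rep(1:t.length, n.series) + rnorm(prod(d.dim), 0, 0.1), d.dim)
X <- c(1, 3, 2, 4, 5)

# hypothetical helper: R^2 of a single series regressed on X
r2_one <- function(y) summary(lm(y ~ X))$r.squared

# explicit triple loop, dimension 2 varying fastest
time_loop <- system.time({
  z_loop <- numeric(n.series)
  l <- 0
  for (i in seq_len(d.dim[4]))
    for (j in seq_len(d.dim[3]))
      for (k in seq_len(d.dim[2])) {
        l <- l + 1
        z_loop[l] <- r2_one(Y[, k, j, i])
      }
})

# same computation through apply over dimensions 2 to 4
time_apply <- system.time(z_apply <- apply(Y, 2:4, r2_one))

# elapsed wall-clock seconds for each variant
print(c(loop = time_loop[["elapsed"]], apply = time_apply[["elapsed"]]))
```

Both variants call lm() once per series, so any speedup has to come from reducing the number of model fits, not from swapping the looping construct.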
Christoph Lehmann
2005-Apr-21 15:07 UTC
[R] apply vs sapply vs loop - lm() call appl(y)ied on array
OK, thanks to a hint from Matthew pointing to an earlier post with a similar request, I now have three faster solutions (see below). The last one is the fastest, but the first two are also faster than the for-loop, apply(lm(formula)) and sapply(lm(formula)) versions in my last mail.

One problem only: using lsfit I can't directly get measures such as r.squared ...

---------------
## reshape once: one column per series
Ymat <- array(c(Y), dim = c(t.length, prod(d.dim[2:4])))

## using lm with a matrix response (recommended by BDR)
date()
## summary() of a matrix-response fit is a list with one summary.lm per
## column; extract r.squared by name rather than by position
rsq <- sapply(summary(lm(Ymat ~ X)), function(s) s$r.squared)
rsq <- array(rsq, dim = d.dim[2:4])
date()

## using sapply and lsfit instead of lm (recommended by Kevin Wright)
## note: this returns the slope, not r.squared
date()
fac <- rep(1:prod(d.dim[2:4]), rep(t.length, prod(d.dim[2:4])))
z <- sapply(split(as.vector(Y), fac),
            FUN = function(x) lsfit(X, x)$coefficients[2])
dim(z) <- d.dim[2:4]
date()

## using lsfit with a matrix response (again the slope, not r.squared)
date()
slope <- lsfit(X, Ymat)$coefficients[2, ]
slope <- array(slope, dim = d.dim[2:4])
date()
------------------

thanks
Christoph
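
On the open point that lsfit() does not return r.squared directly: for a model with an intercept, R^2 can be recovered from the residuals as 1 - RSS/TSS. The sketch below assumes the same Y, X and dimensions as in the thread (regenerated here so the block runs on its own); the fixed seed and the names Ymat, fit, rss, tss and check are illustrative and do not come from the original messages.

```r
set.seed(1)  # assumption: fixed seed so the run is reproducible

t.length <- 5
d.dim <- c(t.length, 7, 8, 9)      # dimensions: time, x, y, z
n.series <- prod(d.dim[2:4])
Y <- array(rep(1:t.length, n.series) + rnorm(prod(d.dim), 0, 0.1), d.dim)
X <- c(1, 3, 2, 4, 5)

# one column per series, so a single lsfit() call fits all 504 models
Ymat <- matrix(Y, nrow = t.length)
fit <- lsfit(X, Ymat)              # intercept is included by default

# residual and total sums of squares, one value per column
rss <- colSums(fit$residuals^2)
tss <- colSums(sweep(Ymat, 2, colMeans(Ymat))^2)

# R^2 = 1 - RSS/TSS, reshaped back to the x, y, z grid
rsq <- array(1 - rss / tss, dim = d.dim[2:4])

# spot check one series against summary(lm())
check <- summary(lm(Y[, 3, 2, 1] ~ X))$r.squared
stopifnot(isTRUE(all.equal(rsq[3, 2, 1], check)))
```

This formula is only valid when the model contains an intercept, which is the lsfit() default; with intercept = FALSE the total sum of squares would need to be taken about zero instead.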