Dear R-helpers,

I have a huge data-set so need to avoid for loops as much as possible. Can someone think how I can compute the result in the following example (that uses a for-loop) using some version of apply instead (or any other similarly super-efficient function)?

example:
#Suppose a matrix:
m1=cbind(1:5,1:5,1:5)

#The aim is to create a new matrix with every column containing the cumulative sum of all previous columns.
m2=m1
for(i in 2:ncol(m1)){
  m2[,i]=apply(m1[,1:i],1,sum)
}
m2

Many thanks in advance

Eleni Rapsomaniki

Research Associate
Strangeways Research Laboratory
Department of Public Health and Primary Care
University of Cambridge
Hi:

Is this what you want?

> m1=cbind(1:5,1:5,1:5)
> apply(m1, 1, cumsum)
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2    3    4    5
[2,]    2    4    6    8   10
[3,]    3    6    9   12   15

HTH,
Dennis

On Wed, Apr 14, 2010 at 5:18 AM, Eleni Rapsomaniki <er339@medschl.cam.ac.uk> wrote:
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
?cumsum can help you

m1 <- cbind(1:5,1:5,1:5)
m2 <- m1
for(i in 2:ncol(m1)){
  m2[,i]=apply(m1[,1:i],1,sum)
}
m3 <- t(apply(m1, 1, cumsum))
all.equal(m2, m3)

HTH,

Thierry

----------------------------------------------------------------------------
ir. Thierry Onkelinx
Research Institute for Nature and Forest (Instituut voor natuur- en bosonderzoek)
team Biometrics & Quality Assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium
tel. + 32 54/436 185
Thierry.Onkelinx at inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey
You can even use a simple for-loop, e.g.,

m1 <- cbind(1:5,1:5,1:5)
out <- m1
for(i in 1:nrow(out)) out[i, ] <- cumsum(out[i, ])

which seems to be faster than apply(m1, 1, cumsum), i.e.,

m1 <- m1[rep(1:5, each = 1e04), ]
library(rbenchmark)
benchmark(
    "apply" = apply(m1, 1, cumsum),
    "for" = {out <- m1; for(i in 1:nrow(out)) out[i, ] <- cumsum(out[i, ])},
    replications = 50, order = "relative"
)

I hope it helps.

Best,
Dimitris

--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center
Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
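One thing to keep in mind when comparing these two: apply(m1, 1, cumsum) returns one column per input row (3 x n here), while the for-loop fills an n x 3 matrix, so the apply result needs a t() before the two can be compared. The sketch below checks that they agree and times them with base R's system.time() instead of rbenchmark. The names res_apply, res_for, time_apply and time_for are made up for illustration, and the timings will vary by machine and R version.

```r
# Expanded test matrix: 50,000 rows, 3 columns (same construction as above)
m1 <- cbind(1:5, 1:5, 1:5)
m1 <- m1[rep(1:5, each = 1e4), ]

# apply() returns one column per input row, so transpose back to n x 3
time_apply <- system.time(
  res_apply <- t(apply(m1, 1, cumsum))
)

# Explicit loop over rows, filling a copy of the input
time_for <- system.time({
  res_for <- m1
  for (i in seq_len(nrow(res_for))) res_for[i, ] <- cumsum(res_for[i, ])
})

# Both approaches should give the same n x 3 matrix
stopifnot(isTRUE(all.equal(res_apply, res_for)))

# Elapsed times in seconds (machine dependent)
print(c(apply = time_apply[["elapsed"]], "for" = time_for[["elapsed"]]))
```

A single system.time() call is noisy; repeating the measurement a few times gives a fairer picture.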
Eleni et al.:

Perhaps it's worth noting that there is generally NO reason to prefer apply-family code to explicit for-loops for execution speed. Apply-type statements **are** essentially disguised loops -- that is, they execute the loop code repeatedly at the R interpreter level. They do employ some efficiency tricks to try to do so as fast as possible; but as posts in this thread have already noted, whether they run faster or slower than explicit loops is generally code and problem specific. Sometimes yes; sometimes no; often about the same.

So, for example,

myfun <- function(x){...}
z <- somelist
ans <- lapply(z,myfun)

## and

ans <- vector("list",length(z))
for(i in seq_along(z)) ans[[i]] <- myfun(z[[i]])

should take about the same time. The main reason to prefer the former to the latter is that the former conforms to R's functional programming paradigm and tends to produce cleaner, more debuggable, more maintainable code (I realize that this is a subjective preference with which many may disagree).

When speedup is desired, the key is to move the loop from the interpreted to the compiled code level via "vectorization", either by making use of R's built-in compiled functions (like cumsum), which are generally .Internal or .Primitive, or by writing and calling your own compiled code, e.g. via .Call. This often can make things orders of magnitude faster.

I hope this provides some clarification about an issue that many seem confused about. If anything I have said is misstated or requires further clarification, I would appreciate corrections.

Bert Gunter
Genentech Nonclinical Statistics
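For the matrix in this thread, one way to apply the vectorization idea is to loop over the columns rather than the rows: with many rows and few columns, the interpreter runs only ncol - 1 iterations, and each iteration is a single vectorized addition over all rows. This is only a sketch. The helper name row_cumsum_by_col is hypothetical, and whether it beats the other suggestions on real data would have to be measured.

```r
# Hypothetical helper: running sums across the columns of a numeric matrix.
# Column j of the result holds x[, 1] + ... + x[, j] for each row.
row_cumsum_by_col <- function(x) {
  out <- x
  # seq_len(ncol(x))[-1] is 2, 3, ..., ncol(x); empty if x has one column
  for (j in seq_len(ncol(x))[-1]) {
    # One vectorized addition over all rows per iteration
    out[, j] <- out[, j - 1] + x[, j]
  }
  out
}

m1 <- cbind(1:5, 1:5, 1:5)
m2 <- row_cumsum_by_col(m1)
m2

# Same result as the row-wise cumsum approach
stopifnot(isTRUE(all.equal(m2, t(apply(m1, 1, cumsum)))))
```

The loop is still interpreted, but its length depends on the number of columns, not the number of rows, which is the favourable direction for a tall, narrow data set.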
Does this do what you want?

m1 <- cbind(1:5,1:5,1:5)
m2 <- m1
for(i in 2:ncol(m1)){
  m2[,i] <- apply(m1[,1:i],1,sum)
}
m2

ut <- diag( ncol(m1) )
ut[upper.tri(ut)] <- 1
m3 <- m1 %*% ut
m3

all.equal(m2,m3)

hope this helps,

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111
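To see why the matrix product works: ut has ones on and above the diagonal, so entry (i, j) of m1 %*% ut is the sum of m1[i, k] over k <= j, which is the running sum along row i. The sketch below wraps this in a helper (the name row_cumsum_matmul is made up here) and checks it on a matrix whose columns differ. Two caveats, as far as I understand them: the product comes back as a double matrix even when the input is integer, which is why all.equal() rather than identical() is the appropriate comparison, and the work grows with the square of the number of columns, so this suits matrices with few columns.

```r
# Hypothetical helper: row-wise cumulative sums via an upper-triangular matrix
row_cumsum_matmul <- function(x) {
  p <- ncol(x)
  ut <- diag(p)              # identity: ones on the diagonal
  ut[upper.tri(ut)] <- 1     # ones above the diagonal as well
  x %*% ut                   # entry (i, j) = sum of x[i, k] for k <= j
}

# Example with columns that differ, so the pattern is visible
x <- cbind(1:4, c(10, 20, 30, 40), c(100, 200, 300, 400))
res <- row_cumsum_matmul(x)
res

# Agrees with the cumsum-based version
stopifnot(isTRUE(all.equal(res, t(apply(x, 1, cumsum)))))
```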