Dear R-helpers,
I have a huge data set, so I need to avoid for-loops as much as possible. Can
anyone suggest how to compute the result in the following example (which uses a
for-loop) with some version of apply instead (or any other similarly
super-efficient function)?
example:
#Suppose a matrix:
m1=cbind(1:5,1:5,1:5)
#The aim is to create a new matrix in which each column contains the cumulative
#sum of that column and all previous columns.
m2=m1
for(i in 2:ncol(m1)){
m2[,i]=apply(m1[,1:i],1,sum)
}
m2
Many thanks in advance
Eleni Rapsomaniki
Research Associate
Strangeways Research Laboratory
Department of Public Health and Primary Care
University of Cambridge
Hi: Is this what you want?
> m1=cbind(1:5,1:5,1:5)
> apply(m1, 1, cumsum)
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2    3    4    5
[2,]    2    4    6    8   10
[3,]    3    6    9   12   15
HTH,
Dennis
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
?cumsum can help you
m1 <- cbind(1:5,1:5,1:5)
m2 <- m1
for(i in 2:ncol(m1)){
m2[,i]=apply(m1[,1:i],1,sum)
}
m3 <- t(apply(m1, 1, cumsum))
all.equal(m2, m3)
HTH,
Thierry
----------------------------------------------------------------------------
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek
team Biometrie & Kwaliteitszorg
Gaverstraat 4
9500 Geraardsbergen
Belgium
Research Institute for Nature and Forest
team Biometrics & Quality Assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium
tel. + 32 54/436 185
Thierry.Onkelinx at inbo.be
www.inbo.be
To call in the statistician after the experiment is done may be no more than
asking him to perform a post-mortem examination: he may be able to say what the
experiment died of.
~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data.
~ Roger Brinner
The combination of some data and an aching desire for an answer does not ensure
that a reasonable answer can be extracted from a given body of data.
~ John Tukey
> -----Original message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On behalf of Eleni Rapsomaniki
> Sent: Wednesday 14 April 2010 14:18
> To: r-help at r-project.org
> Subject: [R] Running cumulative sums in matrices
You can even use a simple for-loop, e.g.,
m1 <- cbind(1:5,1:5,1:5)
out <- m1
for(i in 1:nrow(out))
out[i, ] <- cumsum(out[i, ])
which seems to be faster than apply(m1, 1, cumsum), e.g., on a larger matrix:
m1 <- m1[rep(1:5, each = 1e04), ]
library(rbenchmark)
benchmark(
    "apply" = apply(m1, 1, cumsum),
    "for" = {
        out <- m1
        for(i in 1:nrow(out)) out[i, ] <- cumsum(out[i, ])
    },
    replications = 50, order = "relative"
)
I hope it helps.
Best,
Dimitris
--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center
Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
Eleni et al.:
Perhaps it's worth noting that there is generally NO reason to prefer
apply-family code to explicit for-loops for execution speed. Apply-type
statements **are** essentially disguised loops -- that is, they execute the
loop code repeatedly at the R interpreter level. They do employ some
efficiency tricks to try to do so as fast as possible; but as posts in this
thread have already noted, whether they run faster or slower than explicit
loops is generally code and problem specific. Sometimes yes; sometimes no;
often about the same.
So, for example,
myfun <- function(x){...}
z <- somelist
ans <- lapply(z, myfun)
## and
ans <- vector("list", length(z))
for(i in seq_along(z)) ans[[i]] <- myfun(z[[i]])
should take about the same time.
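As a concrete, runnable version of that comparison (a minimal sketch only: the myfun and z below are made-up stand-ins, not anything from Eleni's data), both forms produce identical results. Wrapping each in system.time() on your own machine is a simple way to check how close the run times are for your problem.

```r
# Hypothetical stand-ins, for illustration only
myfun <- function(x) sum(sqrt(x))
z <- lapply(1:1000, function(n) seq_len(n))

# Functional form
ans1 <- lapply(z, myfun)

# Explicit loop, with the result list preallocated to the right length
ans2 <- vector("list", length(z))
for (i in seq_along(z)) ans2[[i]] <- myfun(z[[i]])

# Both approaches return the same list
identical(ans1, ans2)
```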
The main reason to prefer the former instead of the latter is that the
former conforms to R's functional programming paradigm and tends to produce
cleaner, more debuggable, more maintainable code (I realize that this is a
subjective preference with which many may disagree).
When speedup is desired, the key is to move the loop from the interpreted to
the compiled code level via "vectorization", either by making use of R's
built-in compiled functions (like cumsum), which are generally .Internal or
.Primitive, or by writing and calling your own compiled code, e.g. via .Call.
This can often make things orders of magnitude faster.
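For the matrix in this thread, one way to apply that idea (a sketch, assuming many rows and comparatively few columns, as in Eleni's example) is to loop over the columns rather than the rows. Each iteration is then a single vectorized addition over all rows, so the interpreted loop runs only ncol - 1 times.

```r
m1 <- cbind(1:5, 1:5, 1:5)

# Loop over columns: each step adds two whole columns at once
out <- m1
for (j in seq_len(ncol(out))[-1])
    out[, j] <- out[, j - 1] + m1[, j]

# Same result as the row-wise cumsum approach
all.equal(out, t(apply(m1, 1, cumsum)))
```

Whether this beats the row-wise versions will depend on the shape of the matrix, so it is worth timing on the real data.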
I hope this provides some clarification about an issue that many seem
confused about. If anything I have said is misstated or requires further
clarification, I would appreciate corrections.
Bert Gunter
Genentech Nonclinical Statistics
Does this do what you want?
m1 <- cbind(1:5,1:5,1:5)
m2 <- m1
for(i in 2:ncol(m1)){
m2[,i] <- apply(m1[,1:i],1,sum)
}
m2
ut <- diag( ncol(m1) )
ut[upper.tri(ut)] <- 1
m3 <- m1 %*% ut
m3
all.equal(m2,m3)
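To see why the matrix product works (a small illustration added for clarity, using the same m1 and ut as above): ut is an upper-triangular matrix of ones, so column j of m1 %*% ut is the sum of columns 1 to j of m1.

```r
m1 <- cbind(1:5, 1:5, 1:5)

ut <- diag(ncol(m1))
ut[upper.tri(ut)] <- 1
ut
#      [,1] [,2] [,3]
# [1,]    1    1    1
# [2,]    0    1    1
# [3,]    0    0    1

# Column j of the product sums columns 1..j of m1
m3 <- m1 %*% ut
all.equal(m3[, 2], m1[, 1] + m1[, 2])
```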
hope this helps,
--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111