Etienne Stockhausen
2010-Jan-12 18:58 UTC
[R] Making routine faster by using apply instead of for-loop
Hey everybody,
I have a small problem with a routine, which prepares some data for
plotting.
I've made a small example:
c=10
mat=data.frame(matrix(1:(c*c),c,c))
row.names(mat)=seq(c,1,length=c)
names(mat)=c(seq(2,c,length=c/2),seq(c,2,length=c/2))
v=as.numeric(row.names(mat))
w=as.numeric(names(mat))
for(i in 1:c)
{ for(j in 1:c)
{
if(v[j]+w[i]<=c)(mat[i,j]=NA)
}}
This produces exactly the data I need to go on, but if I increase the
constant c to, for instance, 500, it takes a very long time to set the NAs.
I've heard there is a much faster way to set the NAs using the command
apply( ), but I don't know how.
I'm looking forward to any ideas or hints that might help me.
Best regards
Etienne
William Dunlap
2010-Jan-12 20:31 UTC
[R] Making routine faster by using apply instead of for-loop
The first problem is that random (element-by-element) access to a
data.frame is much slower than the equivalent access to a matrix.
Rewriting your code a bit to use a matrix speeds up the c=500 case
by a factor of 750.

f0 <- function (c = 10)
{
    # build a plain matrix instead of a data.frame
    mat = matrix(1:(c * c), c, c)
    rownames(mat) = seq(c, 1, length = c)
    colnames(mat) = c(seq(2, c, length = c/2), seq(c, 2, length = c/2))
    v = as.numeric(rownames(mat))
    w = as.numeric(colnames(mat))
    for (i in 1:c) {
        for (j in 1:c) {
            if (v[j] + w[i] <= c) {
                mat[i, j] = NA
            }
        }
    }
    mat
}

Rewriting that to insert the NA's in one operation speeds it up by
another factor of 10 (in the c=500 case).

f1 <- function (c = 10)
{
    v <- seq(c, 1, length = c)
    w <- c(seq(2, c, length = c/2), seq(c, 2, length = c/2))
    mat <- matrix(1:(c * c), nrow = c, ncol = c, dimnames = list(v, w))
    # outer(w, v, `+`)[i, j] is w[i] + v[j], the same test as in the loop
    mat[outer(w, v, `+`) <= c] <- NA
    mat
}

If you really want a data.frame, pass the output of these functions
into data.frame (with check.names=FALSE, since the column names are
not considered legal on a data.frame: they contain duplicates and
look numeric).

By the way, it is generally a bad idea to use apply() on a data.frame.
It is meant for matrices.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
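
To check that the loop and the outer() approach agree, something like the following sketch may help. The names na_loop and na_outer are hypothetical (they are not from the thread), and n is used in place of c so that base::c() is not masked. Timings will vary by machine, so none are shown.

```r
# Hypothetical helper names; `n` stands in for the thread's constant `c`.
na_loop <- function(n = 10) {
  v <- seq(n, 1, length.out = n)
  w <- c(seq(2, n, length.out = n / 2), seq(n, 2, length.out = n / 2))
  mat <- matrix(1:(n * n), nrow = n, ncol = n, dimnames = list(v, w))
  # element-by-element version on a matrix
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (v[j] + w[i] <= n) mat[i, j] <- NA
    }
  }
  mat
}

na_outer <- function(n = 10) {
  v <- seq(n, 1, length.out = n)
  w <- c(seq(2, n, length.out = n / 2), seq(n, 2, length.out = n / 2))
  mat <- matrix(1:(n * n), nrow = n, ncol = n, dimnames = list(v, w))
  # one logical mask: entry [i, j] is TRUE when w[i] + v[j] <= n
  mat[outer(w, v, `+`) <= n] <- NA
  mat
}

# Both versions should give the same matrix, NA positions included.
same <- identical(na_loop(10), na_outer(10))
print(same)

# Rough timing comparison for the larger case.
print(system.time(na_loop(500)))
print(system.time(na_outer(500)))
```

The check uses identical() rather than all.equal() because both functions build the same integer matrix with the same dimnames, so an exact match is expected.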
Peter Ehlers
2010-Jan-12 20:32 UTC
[R] Making routine faster by using apply instead of for-loop
Your code is doing too many needless things.
The following takes about one second on my slow Vista laptop.
n <- 500
mat <- matrix(1:(n*n), n)
v <- n:1
z <- 2*1:(n/2)
w <- c(z, rev(z))
for(i in seq_len(n)){
for(j in seq_len(n)){
if(v[j] + w[i] <= n)(mat[i,j] <- NA)
}
}
rownames(mat) <- v
colnames(mat) <- w
str(mat)
You end up with a matrix, but if you really want a data.frame
with duplicate names, that's easy to get. Do you actually
want those row/col names or are they just used to identify
the cells that get NA?
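
One way to get such a data.frame is sketched below with a small n for readability. It assumes you want to keep the duplicated column names as they are, which is what check.names = FALSE is for. The mask is built with outer() here just to keep the example short.

```r
n <- 6
v <- n:1
z <- 2 * 1:(n / 2)
w <- c(z, rev(z))

# Matrix with the row/column labels attached up front.
mat <- matrix(1:(n * n), n, dimnames = list(v, w))
mat[outer(w, v, `+`) <= n] <- NA

# check.names = FALSE keeps the duplicated, numeric-looking column names.
df <- data.frame(mat, check.names = FALSE)
print(names(df))
print(row.names(df))
```

Without check.names = FALSE, data.frame() would rewrite the column names into syntactically valid, unique ones.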
Depending on what you really need, the following may be
good enough; takes about 0.1 seconds.
n <- 500
mat <- matrix(1:(n*n), n)
for(i in 1:(n/2)){mat[i, -(1:(2*i))] <- mat[n+1-i, -(1:(2*i))] <- NA}
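
As a sanity check (a sketch, assuming n is even), the row-wise loop can be compared against the NA pattern implied by the original condition v[j] + w[i] <= n:

```r
n <- 10
v <- n:1
z <- 2 * 1:(n / 2)
w <- c(z, rev(z))

# Reference: NA wherever the double-loop condition v[j] + w[i] <= n holds.
ref <- matrix(1:(n * n), n)
ref[outer(w, v, `+`) <= n] <- NA

# Row-wise version: rows i and n+1-i share the same cutoff column 2*i.
mat <- matrix(1:(n * n), n)
for (i in 1:(n / 2)) {
  mat[i, -(1:(2 * i))] <- mat[n + 1 - i, -(1:(2 * i))] <- NA
}

print(identical(mat, ref))
```

The shortcut works because row i and its mirror row n+1-i carry the same column label 2*i, so every column past 2*i becomes NA in both.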
-Peter Ehlers
--
Peter Ehlers
University of Calgary
403.202.3921