thr3ads.net - R help - [R] Making routine faster by using apply instead of for-loop [Jan 2010]


Etienne Stockhausen

2010-Jan-12 18:58 UTC

[R] Making routine faster by using apply instead of for-loop

Hey everybody,

I have a small problem with a routine, which prepares some data for
plotting.
I've made a small example:

    c=10
    mat=data.frame(matrix(1:(c*c),c,c))
    row.names(mat)=seq(c,1,length=c)
    names(mat)=c(seq(2,c,length=c/2),seq(c,2,length=c/2))
    v=as.numeric(row.names(mat))
    w=as.numeric(names(mat))
    for(i in 1:c)
    { for(j in 1:c)
    {
    if(v[j]+w[i]<=c)(mat[i,j]=NA)
    }}

This produces exactly the data I need to go on, but if I increase the
constant c to, for instance, 500, it takes a very long time to set the NAs.
I've heard there is a much faster way to set the NAs using the command
apply(), but I don't know how.
I'm looking forward to any ideas or hints that might help me.

Best regards

Etienne

William Dunlap

2010-Jan-12 20:31 UTC


> This produces exactly the data I need to go on, but if I increase the
> constant c ,to for instance 500 , it takes a very long time 
> to set the NA's.

The first problem is that random (element-by-element)
access to a data.frame is much slower than the equivalent
access to a matrix.  Rewriting your code a bit to
use a matrix speeds up the c=500 case by a factor of 750.

    f0 <- function (c = 10)  {
        mat = matrix(1:(c * c), c, c)
        rownames(mat) = seq(c, 1, length = c)
        colnames(mat) = c(seq(2, c, length = c/2), seq(c, 2, length = c/2))
        v = as.numeric(rownames(mat))
        w = as.numeric(colnames(mat))
        for (i in 1:c) {
            for (j in 1:c) {
                if (v[j] + w[i] <= c) {
                    mat[i, j] = NA
                }
            }
        }
        mat
    }

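To see the data.frame versus matrix difference on your own machine, something like the following sketch could be used. The names `fill_na`, `limit`, `t_df` and `t_mat` are made up for this illustration, and `n` is kept small so the data.frame case finishes quickly; the timings will vary by machine and R version.

```r
# Sketch: run the same element-wise loop on a data.frame and on a matrix
# and compare elapsed time. fill_na() and its arguments are illustrative
# names, not part of the original post.
fill_na <- function(x, v, w, limit) {
    for (i in seq_len(nrow(x))) {
        for (j in seq_len(ncol(x))) {
            if (v[j] + w[i] <= limit) x[i, j] <- NA
        }
    }
    x
}

n <- 60
v <- seq(n, 1, length.out = n)
w <- c(seq(2, n, length.out = n / 2), seq(n, 2, length.out = n / 2))
m <- matrix(1:(n * n), n, n)
d <- data.frame(m)

t_df  <- system.time(res_df  <- fill_na(d, v, w, n))
t_mat <- system.time(res_mat <- fill_na(m, v, w, n))

# Elapsed seconds for each container type
print(c(data.frame = t_df[["elapsed"]], matrix = t_mat[["elapsed"]]))
```

Both calls set the same cells to NA; only the container differs, so any gap in the elapsed times comes from the cost of indexing.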
Rewriting that to insert the NA's in one operation speeds it up by
another factor of 10 (in the c=500 case).

    f1 <- function (c = 10) {
        v <- seq(c, 1, length = c)
        w <- c(seq(2, c, length = c/2), seq(c, 2, length = c/2))
        mat <- matrix(1:(c * c), nrow = c, ncol = c, dimnames = list(v, w))
        # outer(w, v, `+`)[i, j] is w[i] + v[j], so this blanks every
        # qualifying cell in a single assignment
        mat[outer(w, v, `+`) <= c] <- NA
        mat
    }
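For anyone unsure what the outer() call produces, here is a small sketch for c = 4 that builds the logical mask and checks it against the element-by-element version. The names `n`, `mask` and `loop_mask` are chosen just for this illustration.

```r
# Sketch: the logical mask that outer() builds, for a 4 x 4 case.
n <- 4
v <- seq(n, 1, length.out = n)
w <- c(seq(2, n, length.out = n / 2), seq(n, 2, length.out = n / 2))

# mask[i, j] is TRUE exactly when w[i] + v[j] <= n
mask <- outer(w, v, `+`) <= n
print(mask)

# The same mask built one cell at a time, for comparison
loop_mask <- matrix(FALSE, n, n)
for (i in seq_len(n)) {
    for (j in seq_len(n)) {
        loop_mask[i, j] <- v[j] + w[i] <= n
    }
}
stopifnot(identical(mask, loop_mask))
```

Indexing a matrix with a logical matrix of the same shape selects every TRUE cell, which is why a single assignment can replace the double loop.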

If you really want a data.frame, pass the output of these functions
into data.frame (with check.names=FALSE since the column
names are not considered legal on data.frame: they contain
duplicates and look numeric).
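A small sketch of what check.names does in that conversion, using a made-up 2 x 4 matrix (`m`, `d_checked` and `d_raw` are illustrative names):

```r
# Sketch: converting a matrix whose column names are duplicated and look
# numeric, with and without check.names.
m <- matrix(1:8, nrow = 2,
            dimnames = list(c("2", "1"), c("2", "4", "4", "2")))

d_checked <- data.frame(m)                       # names altered to be legal and unique
d_raw     <- data.frame(m, check.names = FALSE)  # names kept exactly as in m

print(names(d_checked))
print(names(d_raw))
```

With duplicate names kept, selecting a column by name only reaches the first match, so positional indexing is the safer choice afterwards.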

By the way, it is generally a bad idea to use apply() on
a data.frame.  It is meant for matrices.
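One reason for that, as far as I understand the documented behaviour: apply() first coerces a data.frame with as.matrix(), so columns of different types are all converted to one common type. A minimal sketch with a made-up two-column data.frame:

```r
# Sketch: apply() on a data.frame works on as.matrix(d), so mixed column
# types collapse to a single type (character here).
d <- data.frame(id = 1:3, label = c("a", "b", "c"), stringsAsFactors = FALSE)

m <- as.matrix(d)
print(typeof(m))

# Every row that apply() hands to the function is a character vector,
# including the originally integer id column
row_classes <- apply(d, 1, class)
print(row_classes)
```

For an all-numeric data.frame the coercion is harmless but still costs a copy, so converting once with as.matrix() and working on the matrix is usually cleaner.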

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

Peter Ehlers

2010-Jan-12 20:32 UTC


Your code is doing too many needless things.
The following takes about one second on my slow Vista laptop.

n <- 500
mat <- matrix(1:(n*n), n)
v <- n:1
z <- 2*1:(n/2)
w <- c(z, rev(z))
for(i in seq_len(n)){
   for(j in seq_len(n)){
     if(v[j] + w[i] <= n)(mat[i,j] <- NA)
   }
}
rownames(mat) <- v
colnames(mat) <- w

str(mat)

You end up with a matrix, but if you really want a data.frame
with duplicate names, that's easy to get. Do you actually
want those row/col names or are they just used to identify
the cells that get NA?

Depending on what you really need, the following may be
good enough; takes about 0.1 seconds.

n <- 500
mat <- matrix(1:(n*n), n)
for(i in 1:(n/2)){mat[i, -(1:(2*i))] <- mat[n+1-i, -(1:(2*i))] <- NA}
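
As a sanity check, the row-wise version can be compared with the cell-by-cell loop on a small case. This is only a sketch for n = 10 (n is assumed to be even, as in the thread); `by_cell` and `by_row` are names invented here.

```r
# Sketch: confirm that the row-wise assignment sets the same cells to NA
# as the double loop, for a small even n.
n <- 10
v <- n:1
z <- 2 * 1:(n / 2)
w <- c(z, rev(z))

# Cell-by-cell version
by_cell <- matrix(1:(n * n), n)
for (i in seq_len(n)) {
    for (j in seq_len(n)) {
        if (v[j] + w[i] <= n) by_cell[i, j] <- NA
    }
}

# Row-wise version: rows i and n+1-i both lose every column after 2*i
by_row <- matrix(1:(n * n), n)
for (i in 1:(n / 2)) {
    by_row[i, -(1:(2 * i))] <- by_row[n + 1 - i, -(1:(2 * i))] <- NA
}

stopifnot(identical(by_cell, by_row))
```

The row-wise loop runs n/2 times instead of n*n, which is where the speed-up comes from.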

  -Peter Ehlers

-- 
Peter Ehlers
University of Calgary
403.202.3921
