thr3ads.net - R help - [R] how to rewrite this without a loop ? [Nov 2004]


Stijn Lievens

2004-Nov-18 14:44 UTC

[R] how to rewrite this without a loop ?

Dear Rexperts,

First of all let me say that R is a wonderful and useful piece of 
software.

The only thing is that sometimes it takes me a long time to find out how 
something can be done, especially when aiming to write compact (and 
efficient) code.

For instance, I have the following function (very rudimentary) which 
takes a (very specific) data frame as input and for certain subsets
calculates the rank correlation between two corresponding columns.
The aim is to add all the rank correlations.

<code>
add.fun <- function(perf.data) {
    ss <- 0
    for (i in 0:29) {
        ss <- ss + cor(subset(perf.data, dataset == i)[3],
                       subset(perf.data, dataset == i)[7], method = "kendall")
    }
    ss	
}
</code>

As one can see this function uses a for-loop.  Now chapter 9 of 'An 
introduction to R' tells us that we should avoid for-loops as much as 
possible.

Is there an obvious way to avoid the for-loop in this case?

I would like to see something along the lines of

(maple style)

<code>
add( seq(FUN(i), i = 0..29) )
</code>
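
One rough R analogue of the Maple idiom above is `sum(sapply(0:29, FUN))`. The sketch below is illustrative only: the data frame is synthetic, the column names `v2` to `v7` are made up, and `tau_for` is a hypothetical helper standing in for `FUN(i)`. It extracts columns with `[[` so that `cor()` receives plain vectors and returns a scalar.

```r
# Synthetic stand-in for the poster's data (an assumption, not the real data):
# 30 subsets labelled 0..29, with the values to correlate in columns 3 and 7.
set.seed(42)
perf.data <- data.frame(
  dataset = rep(0:29, each = 20),
  v2 = rnorm(600), v3 = rnorm(600), v4 = rnorm(600),
  v5 = rnorm(600), v6 = rnorm(600), v7 = rnorm(600)
)

# Hypothetical FUN(i): Kendall rank correlation of columns 3 and 7 in subset i.
tau_for <- function(i) {
  d <- perf.data[perf.data$dataset == i, ]
  cor(d[[3]], d[[7]], method = "kendall")
}

# Apply FUN to each index, then add up the results.
total <- sum(sapply(0:29, tau_for))
total
```

Note that `sapply` still iterates over the indices internally; the gain is mainly in compactness.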

Greetings

Stijn.


-- 
Dept. of Applied Mathematics and Computer Science, University of Ghent
Krijgslaan 281 - S9, B - 9000 Ghent, Belgium
Phone: +32-9-264.48.91, Fax: +32-9-264.49.95
E-mail: Stijn.Lievens at ugent.be, URL: http://allserv.ugent.be/~slievens/

Stijn Lievens

2004-Nov-18 15:47 UTC


[R] how to rewrite this without a loop ?

Stijn Lievens wrote:
> Dear Rexperts,
> 
> First of all let me say that R is a wonderful and useful piece of software.
> 
> The only thing is that sometimes it takes me a long time to find out how 
> something can be done, especially when aiming to write compact (and 
> efficient) code.
> 
> For instance, I have the following function (very rudimentary) which 
> takes a (very specific) data frame as input and for certain subsets
> calculates the rank correlation between two corresponding columns.
> The aim is to add all the rank correlations.
> 
> <code>
> add.fun <- function(perf.data) {
>    ss <- 0
>    for (i in 0:29) {
>        ss <- ss + cor(subset(perf.data, dataset == i)[3],
>                       subset(perf.data, dataset == i)[7], method = "kendall")
>    }
>    ss   
> }
> </code>
> 
> As one can see this function uses a for-loop.  Now chapter 9 of 'An 
> introduction to R' tells us that we should avoid for-loops as much as 
> possible.
> 
> Is there an obvious way to avoid the for-loop in this case?
> 
Using the lapply function in the e-mail of James, I came up with the 
following.

<code>
sum(as.numeric(lapply(split(perf.data, perf.data$dataset),
                      function(x) cor(x[3], x[7], method = "kendall"))))
</code>

So, first I split the dataframe into a list of dataframes using split,
and using lapply I get a list of correlations, which I convert to
numeric and finally sum up.

I definitely avoided the for-loop in this way, although I am not sure 
whether this is more efficient or not.
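
Whether the split/lapply version is actually faster can be checked empirically. The sketch below times both approaches with `system.time` on synthetic data; the data frame, the column names `v2` to `v7`, and the function names `loop_version` and `split_version` are all assumptions for illustration, and timings will vary by machine and data size.

```r
# Synthetic data (an assumption): 30 subsets, values in columns 3 and 7.
set.seed(1)
perf.data <- data.frame(
  dataset = rep(0:29, each = 20),
  v2 = rnorm(600), v3 = rnorm(600), v4 = rnorm(600),
  v5 = rnorm(600), v6 = rnorm(600), v7 = rnorm(600)
)

# Loop in the spirit of the original: subset() is called twice per iteration.
loop_version <- function(df) {
  ss <- 0
  for (i in 0:29) {
    ss <- ss + cor(subset(df, dataset == i)[[3]],
                   subset(df, dataset == i)[[7]], method = "kendall")
  }
  ss
}

# split/lapply variant: the data frame is partitioned once.
split_version <- function(df) {
  parts <- split(df, df$dataset)
  sum(unlist(lapply(parts, function(x) cor(x[[3]], x[[7]], method = "kendall"))))
}

loop_total  <- loop_version(perf.data)
split_total <- split_version(perf.data)
stopifnot(isTRUE(all.equal(loop_total, split_total)))  # same answer either way

# Repeat each version a few times so the timings are measurable.
print(system.time(for (k in 1:20) loop_version(perf.data)))
print(system.time(for (k in 1:20) split_version(perf.data)))
```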

Cheers,

Stijn.


> I would like to see something along the lines of
> 
> (maple style)
> 
> <code>
> add( seq(FUN(i), i = 0..29) )
> </code>
> 
> Greetings
> 
> Stijn.
> 
> 

-- 
Dept. of Applied Mathematics and Computer Science, University of Ghent
Krijgslaan 281 - S9, B - 9000 Ghent, Belgium
Phone: +32-9-264.48.91, Fax: +32-9-264.49.95
E-mail: Stijn.Lievens at ugent.be, URL: http://allserv.ugent.be/~slievens/

Thomas Lumley

2004-Nov-18 16:13 UTC


[R] how to rewrite this without a loop ?

On Thu, 18 Nov 2004, Stijn Lievens wrote:
>
> <code>
> add.fun <- function(perf.data) {
>   ss <- 0
>   for (i in 0:29) {
>     ss <- ss + cor(subset(perf.data, dataset == i)[3],
>                    subset(perf.data, dataset == i)[7], method = "kendall")
>   }
>   ss
> }
> </code>
>
> As one can see this function uses a for-loop.  Now chapter 9 of 'An 
> introduction to R' tells us that we should avoid for-loops as much as 
> possible.

You don't say whether `dataset' is the name of a column in `perf.data'.
Assuming it is, and assuming that 0:29 are all the values of `dataset',

sum(by(perf.data, list(perf.data$dataset),
           function(d)  cor(d[,3],d[,7], method="kendall")))

would work.  If this is faster it will be because you don't call subset()
twice per iteration, rather than because you are avoiding a loop.  However,
it has other benefits: it doesn't have the variable `i', it doesn't have
to change the value of `ss', and it doesn't have the range of `dataset'
hard-coded into it.  These are all clarity optimisations.
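
The point about the hard-coded range can be illustrated with a small sketch in which the group labels are not 0:29 at all. The data frame, its column names `v2` to `v7`, and the labels "alpha", "beta" and "gamma" are invented for illustration; the `by()` call follows the form given above.

```r
# Synthetic data (an assumption): three groups with non-numeric labels.
set.seed(7)
perf.data <- data.frame(
  dataset = rep(c("alpha", "beta", "gamma"), each = 15),
  v2 = rnorm(45), v3 = rnorm(45), v4 = rnorm(45),
  v5 = rnorm(45), v6 = rnorm(45), v7 = rnorm(45)
)

# by() discovers the groups from the data; no range is written into the code.
per_group <- by(perf.data, list(perf.data$dataset),
                function(d) cor(d[, 3], d[, 7], method = "kendall"))
total <- sum(per_group)

# Cross-check against an explicit split of the same data.
manual <- sapply(split(perf.data, perf.data$dataset),
                 function(d) cor(d[[3]], d[[7]], method = "kendall"))
stopifnot(isTRUE(all.equal(total, sum(manual))))
total
```

A loop over 0:29 would need editing before it could handle these labels, whereas the `by()` call needs no change.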

 	-thomas
