Jose Quesada
2007-Jan-23 20:46 UTC
[R] [fixed] vectorized nested loop: apply a function that takes two rows
(Extremely sorry, disregard previous email as I hit send before pasting the
latest version of the example; this one is smaller too)
Dear R users,
I want to apply a function that takes two vectors as input to all pairs
(combinations (nrow(X), 2))of matrix rows in a matrix.
I know that ideally, one should avoid loops in R, but after reading the docs for
do.call, apply, etc, I still don't know how to write the nested loop in a
vectorized way.
Example data:
x = matrix(rnorm(100), 10, 10)
# this is actually a very large sparse matrix, but it doesn't matter for the
# example
library(Matrix)
x = as(x,"CsparseMatrix")
# cosine function
cosine = function (x, y){
if (is.vector(x) && is.vector(y)) {
return(crossprod(x, y)/sqrt(crossprod(x) * crossprod(y)))
} else {stop("cosine: argument mismatch. Two vectors needed as
input.")}
}
# The loop-based solution I have is:
if (is(x, "Matrix") ) {
cos = array(NA, c(ncol(x), ncol(x)))
for (i in 2:ncol(x)) {
for (j in 1:(i - 1)) {
cos[i, j] = cosine(x[, i], x[, j])
}
}
}
This solution seems inneficient. Is there an easy way of achieving this with a
clever do.call + apply combination?
Also, I have noticed that getting a row from a Matrix object produces a normal
array (i.e., it does not inherit Matrix class). However, selecting >1 rows,
does
produce a same-class matrix. If I convert with as() the output of selecting one
row, am I losing performance? Is there any way to make the resulting vector be a
1-D Matrix object?
This solution seems inneficient. Is there an easy way of achieving this with a
clever do.call + apply combination?
--
Thanks in advance,
-Jose
--
Jose Quesada, PhD
Research fellow, Psychology Dept.
Sussex University, Brighton, UK
http://www.andrew.cmu.edu/~jquesada
Charles C. Berry
2007-Jan-23 23:20 UTC
[R] [fixed] vectorized nested loop: apply a function that takes two rows
I am rusty on 'Matrix', but I see there are crossprod methods for those classes. res <- crossprod( x , x ) gives your result up to scale factors of sqrt(res[i,i]*res[j,j]), so something like diagnl <- Diagonal( ncol(x), sqrt( diag( res ) ) final.res <- diagnl %*% res %*% diagnl should do it. On Tue, 23 Jan 2007, Jose Quesada wrote:> (Extremely sorry, disregard previous email as I hit send before pasting the latest version of the example; this one is smaller too) > Dear R users, > > I want to apply a function that takes two vectors as input to all pairs > (combinations (nrow(X), 2))of matrix rows in a matrix. > I know that ideally, one should avoid loops in R, but after reading the docs for > do.call, apply, etc, I still don't know how to write the nested loop in a > vectorized way. > > Example data: > x = matrix(rnorm(100), 10, 10) > # this is actually a very large sparse matrix, but it doesn't matter for the > # example > library(Matrix) > x = as(x,"CsparseMatrix") > > # cosine function > cosine = function (x, y){ > if (is.vector(x) && is.vector(y)) { > return(crossprod(x, y)/sqrt(crossprod(x) * crossprod(y))) > } else {stop("cosine: argument mismatch. Two vectors needed as input.")} > } > > # The loop-based solution I have is: > if (is(x, "Matrix") ) { > cos = array(NA, c(ncol(x), ncol(x))) > for (i in 2:ncol(x)) { > for (j in 1:(i - 1)) { > cos[i, j] = cosine(x[, i], x[, j]) > } > } > } > > This solution seems inneficient. Is there an easy way of achieving this with a > clever do.call + apply combination? > > Also, I have noticed that getting a row from a Matrix object produces a normal > array (i.e., it does not inherit Matrix class). However, selecting >1 rows, does > produce a same-class matrix. If I convert with as() the output of selecting one > row, am I losing performance? Is there any way to make the resulting vector be a > 1-D Matrix object? > This solution seems inneficient. Is there an easy way of achieving this with a > clever do.call + apply combination? > -- > Thanks in advance, > -Jose > > -- > Jose Quesada, PhD > Research fellow, Psychology Dept. > Sussex University, Brighton, UK > http://www.andrew.cmu.edu/~jquesada > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >Charles C. Berry (858) 534-2098 Dept of Family/Preventive Medicine E mailto:cberry at tajo.ucsd.edu UC San Diego http://biostat.ucsd.edu/~cberry/ La Jolla, San Diego 92093-0901
Martin Maechler
2007-Jan-24 10:57 UTC
[Rd] Matrix subsetting {was "... vectorized nested loop..."}
Hi Jose, I'm answering your second batch of questions, since Chuck Berry has already well done so with the first one>>>>> "Jose" == Jose Quesada <quesada at gmail.com> >>>>> on Tue, 23 Jan 2007 21:46:27 +0100 writes:[........] Jose> # example Jose> library(Matrix) Jose> x = as(x,"CsparseMatrix") [..........] Jose> Also, I have noticed that getting a row from a Matrix Jose> object produces a normal array (i.e., it does not Jose> inherit Matrix class). This is very much on purpose, following the principle of "least surprise" so I'm surprised you're suprised.. : The 'Matrix' behavior has been modelled to follow the more than 20 years old 'matrix' behavior : > matrix(1:9, 3) [,2] [1] 4 5 6 > matrix(1:9, 3) [,2 , drop=FALSE] [,1] [1,] 4 [2,] 5 [3,] 6 > library(Matrix) Loading required package: lattice > Matrix(1:9, 3) [,2] [1] 4 5 6 > Matrix(1:9, 3) [,2, drop = FALSE] 3 x 1 Matrix of class "dgeMatrix" [,1] [1,] 4 [2,] 5 [3,] 6 > But then I should not be surprised, because there has been the R FAQ>> 7.5 Why do my matrices lose dimensions?for quite a while. *And* I think that there is only one "thing" in the S language about which every "knowledgable one" agrees that it's a "design bug", and that's the fact that 'drop = TRUE' is the default, and not 'drop = FALSE' {but it's not possible to change now, please don't start that discussion!} Given what I say above, I wonder if our ("new-style") 'Matrix' objects should not behave differently than ("old-style") 'matrix' and indeed do use a default 'drop = FALSE'. This might break some Matrix-based code though, but then 'Matrix' is young enough, and working Matrix indexing is much younger, and there are only about 4 CRAN/Bioconductor packages depending on 'Matrix'. --> This discussion (about changing this behavior in the "Matrix" package) should definitely be lead on the R-devel mailing list --> CC'ing to R-devel {hence one (but please *only* one !) cross-post} Jose> However, selecting >1 rows, Jose> does produce a same-class matrix. If I convert with Jose> as() the output of selecting one row, am I losing Jose> performance? Is there any way to make the resulting Jose> vector be a 1-D Matrix object? yes, ", drop = FALSE", see above Martin
Jose Quesada
2007-Jan-24 13:53 UTC
[R] [fixed] vectorized nested loop: apply a function that takes two rows
Thanks Charles, Martin,
Substantial improvement with the vectorized solution. Here is a quick benchmark:
# The loop-based solution:
nestedCos = function (x) {
if (is(x, "Matrix") ) {
cos = array(NA, c(ncol(x), ncol(x)))
for (i in 2:ncol(x)) {
for (j in 1:(i - 1)) {
cos[i, j] = cosine(x[, i], x[, j])
}
}
}
return(cos)
}
# Charles C. Berry's vectorized approach
flatCos = function (x) {
res = crossprod( x , x )
diagnl = Diagonal( ncol(x), 1 / sqrt( diag( res )))
cos = diagnl %*% res %*% diagnl
return(cos)
}
Benchmarking:
> system.time(for(i in 1:10)nestedCos(x))
(I stopped because it was taking too long)
Timing stopped at: 139.37 3.82 188.76 NA NA> system.time(for(i in 1:10)flatCos(x))
[1] 0.43 0.00 0.48 NA NA
#------------------------------------------------------
As much as I like to have faster code, I'm still wondering WHY flatCos gets
the same results; i.e., why multiplying the inverse sqrt root of the diagonal of
x BY x, then BY the diagonal again produces the expected result. I checked the
wikipedia page for crossprod and other sources, but it still eludes me. I can
see that scaling by the sqrt of the diagonal once makes sense with 'res
<- crossprod( x , x ) gives your result up to scale factors of
sqrt(res[i,i]*res[j,j])', but I still don't see why you need to
postmultiply by the diagonal again.
Maybe trying to attack a simpler problem might help my understanding: e.g.,
calculating the cos of a column to all other colums of x (that is, the inner
part of the nested loop). How would that work in a vectorized way? I'm
trying to get some general technique that I can reuse later from this excellent
answer.
Thanks,
-Jose
>
>
> I am rusty on 'Matrix', but I see there are crossprod methods for
those
> classes.
>
> res <- crossprod( x , x )
>
> gives your result up to scale factors of sqrt(res[i,i]*res[j,j]), so
> something like
>
> diagnl <- Diagonal( ncol(x), sqrt( diag( res ) )
>
OOPS! Better make that
diagnl <- Diagonal( ncol(x), 1 / sqrt( diag( res ) )
>
> final.res <- diagnl %*% res %*% diagnl
>
> should do it.
>
--
Cheers,
-Jose
--
Jose Quesada, PhD
Research fellow, Psychology Dept.
Sussex University, Brighton, UK
http://www.andrew.cmu.edu/~jquesada