Jose Quesada
2007-Jan-23 20:46 UTC
[R] [fixed] vectorized nested loop: apply a function that takes two rows
(Extremely sorry, disregard previous email as I hit send before pasting the latest version of the example; this one is smaller too) Dear R users, I want to apply a function that takes two vectors as input to all pairs (combinations (nrow(X), 2))of matrix rows in a matrix. I know that ideally, one should avoid loops in R, but after reading the docs for do.call, apply, etc, I still don't know how to write the nested loop in a vectorized way. Example data: x = matrix(rnorm(100), 10, 10) # this is actually a very large sparse matrix, but it doesn't matter for the # example library(Matrix) x = as(x,"CsparseMatrix") # cosine function cosine = function (x, y){ if (is.vector(x) && is.vector(y)) { return(crossprod(x, y)/sqrt(crossprod(x) * crossprod(y))) } else {stop("cosine: argument mismatch. Two vectors needed as input.")} } # The loop-based solution I have is: if (is(x, "Matrix") ) { cos = array(NA, c(ncol(x), ncol(x))) for (i in 2:ncol(x)) { for (j in 1:(i - 1)) { cos[i, j] = cosine(x[, i], x[, j]) } } } This solution seems inneficient. Is there an easy way of achieving this with a clever do.call + apply combination? Also, I have noticed that getting a row from a Matrix object produces a normal array (i.e., it does not inherit Matrix class). However, selecting >1 rows, does produce a same-class matrix. If I convert with as() the output of selecting one row, am I losing performance? Is there any way to make the resulting vector be a 1-D Matrix object? This solution seems inneficient. Is there an easy way of achieving this with a clever do.call + apply combination? -- Thanks in advance, -Jose -- Jose Quesada, PhD Research fellow, Psychology Dept. Sussex University, Brighton, UK http://www.andrew.cmu.edu/~jquesada
Charles C. Berry
2007-Jan-23 23:20 UTC
[R] [fixed] vectorized nested loop: apply a function that takes two rows
I am rusty on 'Matrix', but I see there are crossprod methods for those classes. res <- crossprod( x , x ) gives your result up to scale factors of sqrt(res[i,i]*res[j,j]), so something like diagnl <- Diagonal( ncol(x), sqrt( diag( res ) ) final.res <- diagnl %*% res %*% diagnl should do it. On Tue, 23 Jan 2007, Jose Quesada wrote:> (Extremely sorry, disregard previous email as I hit send before pasting the latest version of the example; this one is smaller too) > Dear R users, > > I want to apply a function that takes two vectors as input to all pairs > (combinations (nrow(X), 2))of matrix rows in a matrix. > I know that ideally, one should avoid loops in R, but after reading the docs for > do.call, apply, etc, I still don't know how to write the nested loop in a > vectorized way. > > Example data: > x = matrix(rnorm(100), 10, 10) > # this is actually a very large sparse matrix, but it doesn't matter for the > # example > library(Matrix) > x = as(x,"CsparseMatrix") > > # cosine function > cosine = function (x, y){ > if (is.vector(x) && is.vector(y)) { > return(crossprod(x, y)/sqrt(crossprod(x) * crossprod(y))) > } else {stop("cosine: argument mismatch. Two vectors needed as input.")} > } > > # The loop-based solution I have is: > if (is(x, "Matrix") ) { > cos = array(NA, c(ncol(x), ncol(x))) > for (i in 2:ncol(x)) { > for (j in 1:(i - 1)) { > cos[i, j] = cosine(x[, i], x[, j]) > } > } > } > > This solution seems inneficient. Is there an easy way of achieving this with a > clever do.call + apply combination? > > Also, I have noticed that getting a row from a Matrix object produces a normal > array (i.e., it does not inherit Matrix class). However, selecting >1 rows, does > produce a same-class matrix. If I convert with as() the output of selecting one > row, am I losing performance? Is there any way to make the resulting vector be a > 1-D Matrix object? > This solution seems inneficient. Is there an easy way of achieving this with a > clever do.call + apply combination? > -- > Thanks in advance, > -Jose > > -- > Jose Quesada, PhD > Research fellow, Psychology Dept. > Sussex University, Brighton, UK > http://www.andrew.cmu.edu/~jquesada > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >Charles C. Berry (858) 534-2098 Dept of Family/Preventive Medicine E mailto:cberry at tajo.ucsd.edu UC San Diego http://biostat.ucsd.edu/~cberry/ La Jolla, San Diego 92093-0901
Martin Maechler
2007-Jan-24 10:57 UTC
[Rd] Matrix subsetting {was "... vectorized nested loop..."}
Hi Jose, I'm answering your second batch of questions, since Chuck Berry has already well done so with the first one>>>>> "Jose" == Jose Quesada <quesada at gmail.com> >>>>> on Tue, 23 Jan 2007 21:46:27 +0100 writes:[........] Jose> # example Jose> library(Matrix) Jose> x = as(x,"CsparseMatrix") [..........] Jose> Also, I have noticed that getting a row from a Matrix Jose> object produces a normal array (i.e., it does not Jose> inherit Matrix class). This is very much on purpose, following the principle of "least surprise" so I'm surprised you're suprised.. : The 'Matrix' behavior has been modelled to follow the more than 20 years old 'matrix' behavior : > matrix(1:9, 3) [,2] [1] 4 5 6 > matrix(1:9, 3) [,2 , drop=FALSE] [,1] [1,] 4 [2,] 5 [3,] 6 > library(Matrix) Loading required package: lattice > Matrix(1:9, 3) [,2] [1] 4 5 6 > Matrix(1:9, 3) [,2, drop = FALSE] 3 x 1 Matrix of class "dgeMatrix" [,1] [1,] 4 [2,] 5 [3,] 6 > But then I should not be surprised, because there has been the R FAQ>> 7.5 Why do my matrices lose dimensions?for quite a while. *And* I think that there is only one "thing" in the S language about which every "knowledgable one" agrees that it's a "design bug", and that's the fact that 'drop = TRUE' is the default, and not 'drop = FALSE' {but it's not possible to change now, please don't start that discussion!} Given what I say above, I wonder if our ("new-style") 'Matrix' objects should not behave differently than ("old-style") 'matrix' and indeed do use a default 'drop = FALSE'. This might break some Matrix-based code though, but then 'Matrix' is young enough, and working Matrix indexing is much younger, and there are only about 4 CRAN/Bioconductor packages depending on 'Matrix'. --> This discussion (about changing this behavior in the "Matrix" package) should definitely be lead on the R-devel mailing list --> CC'ing to R-devel {hence one (but please *only* one !) cross-post} Jose> However, selecting >1 rows, Jose> does produce a same-class matrix. If I convert with Jose> as() the output of selecting one row, am I losing Jose> performance? Is there any way to make the resulting Jose> vector be a 1-D Matrix object? yes, ", drop = FALSE", see above Martin
Jose Quesada
2007-Jan-24 13:53 UTC
[R] [fixed] vectorized nested loop: apply a function that takes two rows
Thanks Charles, Martin, Substantial improvement with the vectorized solution. Here is a quick benchmark: # The loop-based solution: nestedCos = function (x) { if (is(x, "Matrix") ) { cos = array(NA, c(ncol(x), ncol(x))) for (i in 2:ncol(x)) { for (j in 1:(i - 1)) { cos[i, j] = cosine(x[, i], x[, j]) } } } return(cos) } # Charles C. Berry's vectorized approach flatCos = function (x) { res = crossprod( x , x ) diagnl = Diagonal( ncol(x), 1 / sqrt( diag( res ))) cos = diagnl %*% res %*% diagnl return(cos) } Benchmarking:> system.time(for(i in 1:10)nestedCos(x))(I stopped because it was taking too long) Timing stopped at: 139.37 3.82 188.76 NA NA> system.time(for(i in 1:10)flatCos(x))[1] 0.43 0.00 0.48 NA NA #------------------------------------------------------ As much as I like to have faster code, I'm still wondering WHY flatCos gets the same results; i.e., why multiplying the inverse sqrt root of the diagonal of x BY x, then BY the diagonal again produces the expected result. I checked the wikipedia page for crossprod and other sources, but it still eludes me. I can see that scaling by the sqrt of the diagonal once makes sense with 'res <- crossprod( x , x ) gives your result up to scale factors of sqrt(res[i,i]*res[j,j])', but I still don't see why you need to postmultiply by the diagonal again. Maybe trying to attack a simpler problem might help my understanding: e.g., calculating the cos of a column to all other colums of x (that is, the inner part of the nested loop). How would that work in a vectorized way? I'm trying to get some general technique that I can reuse later from this excellent answer. Thanks, -Jose> > > I am rusty on 'Matrix', but I see there are crossprod methods for those > classes. > > res <- crossprod( x , x ) > > gives your result up to scale factors of sqrt(res[i,i]*res[j,j]), so > something like > > diagnl <- Diagonal( ncol(x), sqrt( diag( res ) ) >OOPS! Better make that diagnl <- Diagonal( ncol(x), 1 / sqrt( diag( res ) )> > final.res <- diagnl %*% res %*% diagnl > > should do it. >-- Cheers, -Jose -- Jose Quesada, PhD Research fellow, Psychology Dept. Sussex University, Brighton, UK http://www.andrew.cmu.edu/~jquesada