Consider a matrix like

> ma = matrix(10:15, nr = 3)
> ma
     [,1] [,2]
[1,]   10   13
[2,]   11   14
[3,]   12   15

I want to rearrange each column according to row indexes (1 to 3) given in another matrix, as in

> idx = matrix(c(1,3,2, 2,3,1), nr = 3)
> idx
     [,1] [,2]
[1,]    1    2
[2,]    3    3
[3,]    2    1

The new matrix mb will have for each column the corresponding column of ma indexed by the corresponding column of idx, as in

> mb = ma
> for (j in 1:2) mb[,j] = ma[idx[,j], j]
> mb
     [,1] [,2]
[1,]   10   14
[2,]   12   15
[3,]   11   13

Can I avoid the for() loop? I'm especially interested to find out if a fast implementation using lapply() would be feasible for large input matrices (analogues of ma and idx) transformed into data frames.

Roberto Osorio
Bill.Venables at csiro.au
2007-Jan-19 08:00 UTC
[R] Vectorize rearrangement within each column
As with most things like this, you can trade memory for speed. Here is an obfuscated solution that appears to eschew loops entirely.

> ma <- matrix(10:15, nr = 3)
> idx <- matrix(c(1,3,2, 2,3,1), nr = 3)
> mb <- ma
> mb[] <- as.vector(ma)[as.vector(idx + outer(rep(nrow(ma), nrow(ma)), 1:ncol(ma)-1, '*'))]
> mb
     [,1] [,2]
[1,]   10   14
[2,]   12   15
[3,]   11   13

Ordinarily, though, my preferred solution would be the for() loop.

Bill Venables
CMIS, CSIRO Laboratories, PO Box 120, Cleveland, Qld. 4163 AUSTRALIA
mailto:Bill.Venables at csiro.au
http://www.cmis.csiro.au/bill.venables/

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
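The one-liner above works because R stores a matrix column by column, so element (i, j) of a matrix with n rows sits at position i + (j - 1) * n of the underlying vector. The following is a minimal sketch that spells out the same idea in separate steps; the names `offset` and `lin` are illustrative and do not appear in the thread.

```r
ma  <- matrix(10:15, nrow = 3)
idx <- matrix(c(1, 3, 2,  2, 3, 1), nrow = 3)

n <- nrow(ma)
# Column offsets: 0 for column 1, n for column 2, and so on.
offset <- (col(idx) - 1) * n
# Linear (column-major) position of every requested element.
lin <- idx + offset

mb <- ma
# as.vector() matters here: a numeric matrix with exactly two columns would
# otherwise be read as (row, column) pairs rather than linear positions.
mb[] <- ma[as.vector(lin)]
mb
```

Using `col(idx)` instead of `outer()` is only a stylistic choice in this sketch; both build the same matrix of offsets.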
Matrix subscripting can be used for this:

> mb <- ma[cbind(as.vector(idx), as.vector(col(idx)))]
> dim(mb) <- dim(ma)
> mb
     [,1] [,2]
[1,]   10   14
[2,]   12   15
[3,]   11   13

Patrick Burns
patrick at burns-stat.com
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and "A Guide for the Unwilling S User")
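Matrix subscripting means indexing with a two-column matrix in which each row is one (row, column) address. As a rough sketch, the approach can be wrapped in a small function and checked against the for() loop from the question; the name `permute_cols` is hypothetical and not from the thread.

```r
# Hypothetical helper: reorder each column of `m` by the row indices
# found in the matching column of `idx`.
permute_cols <- function(m, idx) {
  stopifnot(identical(dim(m), dim(idx)))
  # Each row of `pairs` is one (row, column) address into `m`.
  pairs <- cbind(as.vector(idx), as.vector(col(idx)))
  out <- m[pairs]        # returns a plain vector, in column-major order
  dim(out) <- dim(m)     # restore the matrix shape
  out
}

ma  <- matrix(10:15, nrow = 3)
idx <- matrix(c(1, 3, 2,  2, 3, 1), nrow = 3)

# Reference result from the loop in the original question.
loop <- ma
for (j in seq_len(ncol(ma))) loop[, j] <- ma[idx[, j], j]

same <- identical(permute_cols(ma, idx), loop)
```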
Turn each matrix into a data.frame and then use mapply with the "[" function, converting back to matrix when done:

> as.matrix(mapply("[", as.data.frame(ma), as.data.frame(idx)))
     V1 V2
[1,] 10 14
[2,] 12 15
[3,] 11 13
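This works because a data frame is a list of columns, so mapply() walks the columns of both data frames in parallel. A small sketch of the same call with the steps made explicit, assuming the `ma` and `idx` from the question; the anonymous function stands in for "[" and `cols` is an illustrative name.

```r
ma  <- matrix(10:15, nrow = 3)
idx <- matrix(c(1, 3, 2,  2, 3, 1), nrow = 3)

# mapply() pairs column j of `ma` with column j of `idx` and
# applies the function to each pair.
cols <- mapply(function(x, i) x[i],
               as.data.frame(ma), as.data.frame(idx),
               SIMPLIFY = FALSE)   # keep a list of columns

# Bind the reordered columns back into a matrix.
mb <- do.call(cbind, cols)
mb
```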
Thanks for the solutions. Here are some time tests for ma and idx being 100 X 100,000. The machine is a 2.16 GHz Intel MacBook Pro with 2 GB memory.

ma <- matrix(rnorm(1e7), nr = 100) # 100 X 100,000
idx <- matrix(round( runif(1e7, 1, 100) ), nr = 100)

# Original:
system.time( { mb <- ma; for (j in 1:1e5) mb[,j] <- ma[idx[j],j] } )
[1] 1.354 0.087 1.435 0.000 0.000

# Prof. Venables' version:
system.time( mb[] <- as.vector(ma)[as.vector(idx + outer(rep(nrow(ma), nrow(ma)), 1:ncol(ma)-1, '*'))] )
[1] 0.885 0.857 2.262 0.000 0.000

# Patrick Burns' version:
system.time( { mb <- ma[cbind(as.vector(idx), as.vector(col(idx)))]; dim(mb) <- dim(ma) } )
[1] 1.672 0.615 2.277 0.000 0.000

# Gabor Grothendieck's version led to some memory handling issue. I stepped one order of magnitude down in the number of columns but it's still very slow.

> ma <- matrix(rnorm(1e6), nr = 100) # 100 X 10,000
> idx = matrix(round( runif(1e6, 1, 100) ), nr = 100)
> system.time( as.matrix(mapply("[", as.data.frame(ma), as.data.frame(idx))) )
[1] 2.060 0.133 2.768 0.000 0.000

So, Prof. Venables' solution is the fastest. In view of only moderate time savings, I will take his advice and keep the original loop for code clarity.

Roberto Osorio
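Note that the loop as posted in the timing test indexes with `idx[j]`, which picks a single element of idx, whereas the loop in the original question uses `idx[,j]`. The sketch below repeats the comparison with `idx[, j]` and checks that all three approaches agree. The matrix size, the seed, and the use of sample.int() to draw indices are assumptions chosen so the example runs quickly; no timings are claimed here.

```r
set.seed(1)                       # arbitrary seed, for reproducibility only
nr <- 100; nc <- 1000             # assumed size, smaller than 100 X 100,000
ma  <- matrix(rnorm(nr * nc), nrow = nr)
idx <- matrix(sample.int(nr, nr * nc, replace = TRUE), nrow = nr)

# for() loop, indexing with the whole column idx[, j]
t_loop <- system.time({
  mb_loop <- ma
  for (j in seq_len(nc)) mb_loop[, j] <- ma[idx[, j], j]
})

# Linear (column-major) indexing, as in Prof. Venables' version
t_lin <- system.time({
  mb_lin <- ma
  mb_lin[] <- ma[as.vector(idx + (col(idx) - 1L) * nr)]
})

# Matrix subscripting, as in Patrick Burns' version
t_sub <- system.time({
  mb_sub <- ma[cbind(as.vector(idx), as.vector(col(idx)))]
  dim(mb_sub) <- dim(ma)
})

# All three approaches should produce the same matrix.
stopifnot(identical(mb_loop, mb_lin), identical(mb_loop, mb_sub))

# Elapsed seconds for each approach; values depend on the machine.
c(loop      = t_loop[["elapsed"]],
  linear    = t_lin[["elapsed"]],
  subscript = t_sub[["elapsed"]])
```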