Consider a matrix like
> ma = matrix(10:15, nr = 3)
> ma
     [,1] [,2]
[1,]   10   13
[2,]   11   14
[3,]   12   15
I want to rearrange each column according to row indexes (1 to 3)
given in another matrix, as in
> idx = matrix(c(1,3,2, 2,3,1), nr = 3)
> idx
     [,1] [,2]
[1,]    1    2
[2,]    3    3
[3,]    2    1
The new matrix mb will have for each column the corresponding column
of ma indexed by the corresponding column of idx, as in
> mb = ma
> for (j in 1:2) mb[,j] = ma[idx[,j], j]
> mb
     [,1] [,2]
[1,]   10   14
[2,]   12   15
[3,]   11   13
Can I avoid the for() loop? I'm especially interested to find out if a
fast implementation using lapply() would be feasible for large input
matrices (analogues of ma and idx) transformed into data frames.
Roberto Osorio
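
For reference, the loop in the question can be wrapped in a small function that works for any number of columns. This is only a sketch: the name rearrange_loop and the dimension check are illustrative assumptions, not part of the original post.

```r
# Hypothetical helper (name not from the thread): reorder each column of `ma`
# using the row indices stored in the matching column of `idx`.
rearrange_loop <- function(ma, idx) {
  # Assumed precondition: both inputs are matrices of the same shape.
  stopifnot(is.matrix(ma), is.matrix(idx), identical(dim(ma), dim(idx)))
  mb <- ma
  for (j in seq_len(ncol(ma))) {
    mb[, j] <- ma[idx[, j], j]
  }
  mb
}

ma  <- matrix(10:15, nrow = 3)
idx <- matrix(c(1, 3, 2, 2, 3, 1), nrow = 3)
mb  <- rearrange_loop(ma, idx)
mb
```

Using seq_len(ncol(ma)) instead of the hard-coded 1:2 keeps the loop correct when the matrix has a different number of columns, including zero.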
Bill.Venables at csiro.au
2007-Jan-19 08:00 UTC
[R] Vectorize rearrangement within each column
As with most things like this, you can trade memory for speed. Here is an
obfuscated solution that appears to eschew loops entirely.
> ma <- matrix(10:15, nr = 3)
> idx <- matrix(c(1,3,2, 2,3,1), nr = 3)
> mb <- ma
> mb[] <- as.vector(ma)[as.vector(idx +
+     outer(rep(nrow(ma), nrow(ma)), 1:ncol(ma)-1, '*'))]
> mb
     [,1] [,2]
[1,]   10   14
[2,]   12   15
[3,]   11   13
Ordinarily, though, my preferred solution would be the for() loop.
Bill Venables
CMIS, CSIRO Laboratories,
PO Box 120, Cleveland, Qld. 4163
AUSTRALIA
Office Phone (email preferred): +61 7 3826 7251
Fax (if absolutely necessary): +61 7 3826 7304
Mobile (rarely used): +61 4 1963 4642
Home Phone: +61 7 3286 7700
mailto:Bill.Venables at csiro.au
http://www.cmis.csiro.au/bill.venables/
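
The one-liner above relies on R storing matrices in column-major order, so element (i, j) sits at linear position i + (j - 1) * nrow(ma). The sketch below spells out the same idea in separate steps; the names offset and lin are illustrative only, and col() is used here in place of outer() as an assumed equivalent way to build the per-column offsets.

```r
ma  <- matrix(10:15, nrow = 3)
idx <- matrix(c(1, 3, 2, 2, 3, 1), nrow = 3)

# Offset added to every row index in column j: (j - 1) * nrow(ma).
offset <- (col(idx) - 1) * nrow(ma)

# Linear (column-major) positions into ma, one per element of idx.
lin <- idx + offset

mb <- ma
# as.vector() matters here: a numeric index matrix with two columns would
# otherwise be interpreted as (row, column) pairs rather than linear positions.
mb[] <- ma[as.vector(lin)]   # mb[] <- ... keeps the dimensions of mb
mb
```

For this 3 x 2 example the offsets are 0 for the first column and 3 for the second, so the linear positions are 1, 3, 2, 5, 6, 4.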
Matrix subscripting can be used for this:
> mb <- ma[cbind(as.vector(idx), as.vector(col(idx)))]
> dim(mb) <- dim(ma)
> mb
     [,1] [,2]
[1,]   10   14
[2,]   12   15
[3,]   11   13
Patrick Burns
patrick at burns-stat.com
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and "A Guide for the Unwilling S User")
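
To make the matrix-subscripting reply easier to follow, here is a small sketch that shows the intermediate index matrix. When a matrix is indexed by a numeric matrix with two columns, R treats each row as a (row, column) pair. The names pairs and picked are illustrative and do not appear in the thread.

```r
ma  <- matrix(10:15, nrow = 3)
idx <- matrix(c(1, 3, 2, 2, 3, 1), nrow = 3)

# One (row, column) pair per element, listed in column-major order:
# the row comes from idx, the column is the column the entry sits in.
pairs <- cbind(as.vector(idx), as.vector(col(idx)))
pairs

# Indexing with a two-column matrix returns a plain vector ...
picked <- ma[pairs]

# ... so the shape has to be restored afterwards.
mb <- matrix(picked, nrow = nrow(ma))
mb
```

The result is a vector because matrix subscripting drops dimensions, which is why the reply sets dim(mb) explicitly.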
Turn each matrix into a data.frame and then use mapply with the "["
function, converting back to matrix when done:
as.matrix(mapply("[", as.data.frame(ma), as.data.frame(idx)))
     V1 V2
[1,] 10 14
[2,] 12 15
[3,] 11 13
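
As a sketch of what the mapply() call is doing: a data frame is a list of columns, so mapply() walks the columns of both data frames in parallel and indexes each column of ma by the matching column of idx. The anonymous function below is assumed to be equivalent to passing "[" directly, and unname() is an optional extra step, not in the reply, that drops the V1, V2 column names.

```r
ma  <- matrix(10:15, nrow = 3)
idx <- matrix(c(1, 3, 2, 2, 3, 1), nrow = 3)

# Called once per column pair: x is a column of ma, i the matching column of idx.
cols <- mapply(function(x, i) x[i], as.data.frame(ma), as.data.frame(idx))

# With columns of equal length (more than one row), mapply() simplifies the
# per-column results into a matrix, carrying over the names V1, V2.
colnames(cols)

# Drop the names to match the output of the for() loop.
mb <- unname(cols)
mb
```

Converting a wide matrix to a data frame creates one list element per column, which may explain why this approach scales poorly when the number of columns is large.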
Thanks for the solutions. Here are some time tests for ma and idx
being 100 X 100,000. The machine is a 2.16 GHz Intel MacBook Pro with
2 GB memory.
ma <- matrix(rnorm(1e7), nr = 100) # 100 X 100,000
idx <- matrix(round( runif(1e7, 1, 100) ), nr = 100)
# Original:
system.time( {
mb <- ma;
for (j in 1:1e5) mb[,j] <- ma[idx[,j],j]
} )
[1] 1.354 0.087 1.435 0.000 0.000
# Prof. Venables' version:
system.time( mb[] <- as.vector(ma)[as.vector(idx +
outer(rep(nrow(ma), nrow(ma)), 1:ncol(ma)-1, '*'))] )
[1] 0.885 0.857 2.262 0.000 0.000
# Patrick Burns' version:
system.time( {
mb <- ma[cbind(as.vector(idx), as.vector(col(idx)))];
dim(mb) <- dim(ma)
} )
[1] 1.672 0.615 2.277 0.000 0.000
# Gabor Grothendieck's version ran into memory problems at this size, so I
# stepped down one order of magnitude in the number of columns, but it is
# still very slow.
> ma <- matrix(rnorm(1e6), nr = 100) # 100 X 10,000
> idx = matrix(round( runif(1e6, 1, 100) ), nr = 100)
> system.time( as.matrix(mapply("[", as.data.frame(ma),
as.data.frame(idx))) )
[1] 2.060 0.133 2.768 0.000 0.000
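So, Prof. Venables' solution uses the least user CPU time (0.885 s versus
1.354 s for the loop), although the loop has the shortest elapsed time
(1.435 s versus 2.262 s). In view of only moderate time savings, I will
take his advice and keep the original loop for code clarity.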
Roberto Osorio
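
When comparing timings like these, it can help to confirm first that the variants return the same matrix. The following is a minimal sketch under stated assumptions: the sizes are deliberately smaller than in the tests above, the seed is arbitrary, sample.int() replaces round(runif()) to draw the indices, and the loop is the one from the original question (indexing with idx[, j]). Timings will differ by machine and R version, so none are shown.

```r
set.seed(1)          # arbitrary seed, only for reproducibility
nr <- 100
nc <- 1000           # assumed smaller size so the sketch runs quickly
ma  <- matrix(rnorm(nr * nc), nrow = nr)
idx <- matrix(sample.int(nr, nr * nc, replace = TRUE), nrow = nr)

# for() loop from the original question
t_loop <- system.time({
  mb_loop <- ma
  for (j in seq_len(nc)) mb_loop[, j] <- ma[idx[, j], j]
})

# linear-index approach (same idea as Prof. Venables' version)
t_lin <- system.time({
  mb_lin <- ma
  mb_lin[] <- ma[as.vector(idx + (col(idx) - 1L) * nr)]
})

# matrix-subscripting approach (Patrick Burns' version)
t_pairs <- system.time({
  mb_pairs <- ma[cbind(as.vector(idx), as.vector(col(idx)))]
  dim(mb_pairs) <- dim(ma)
})

# All three approaches should produce exactly the same matrix.
stopifnot(identical(mb_loop, mb_lin), identical(mb_loop, mb_pairs))

# Elapsed seconds for each variant
c(loop   = t_loop[["elapsed"]],
  linear = t_lin[["elapsed"]],
  pairs  = t_pairs[["elapsed"]])
```

system.time() reports user, system, and elapsed times separately, so it is worth deciding which of the three matters before ranking the variants.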