joke R
2011-Jan-10 14:21 UTC
[R] replace values in array with means of certain values in array
Een ingesloten tekst met niet-gespecificeerde tekenset is gescrubt ... Naam: niet beschikbaar URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20110110/596768d9/attachment.pl>
Petr Savicky
2011-Jan-10 17:07 UTC
[R] replace values in array with means of certain values in array
On Mon, Jan 10, 2011 at 03:21:18PM +0100, joke R wrote:> Dear all, > > I have a problem with arrays. > Simplified, I have two arrays: > > A > [,,1] > 1 2 3 > 4 5 6 > 7 8 9 > > [,,2] > 10 11 12 > 13 14 15 > 16 17 18 > > B> 1 1 2 > 1 1 2 > 3 3 2 > > Basically, B declares to which cluster the values of A belong to. Now I > want an array C where for each cluster of B, the mean of A is shown (the > mean on the first 2 dimensions): > > C> [,,1] > 3 3 6 > 3 3 6 > 7.5 7.5 6 > [,,2] > 12 12 15 > 12 12 15 > 16.5 16.5 15 > > The following short solution works: > > A <- > aperm(array(c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18),dim=c(3,3,2)),c(2,1,3)) > B <- matrix(c(1,1,2,1,1,2,3,3,2),nrow=3,byrow=TRUE) > C <- array(NA,dim=c(3,3,2)) > > for(a in 1:3){ > TS <- matrix(A[B==a],nrow=2,byrow=TRUE) > C[B==a] <- rep(rowMeans(TS),each=sum(B==a)) > } > > But unfortunately, my dataset is a 64x64x64x120 array, and I have > approximately 29000 clusters, and I'd have to repeat the process on 500 > simulated datasets. > If I try the code above, it takes about 2 hours... > > Strange enough, it is the first line in the loop that takes so long: > reconstructing a dataset with only the datapoints belonging to a certain > cluster in it...Let me suggest to consider the following approach, which avoids the first line of the above code, computes the means in a different way and puts them back using the second line of the above code. First, transform A into a matrix, where the first two dimensions represent the rows and the remaining dimensions represent the columns. This does not reorder the elements of the array, only changes the dimension information. A1 <- matrix(A, nrow=3*3, ncol=2) Compute the means B1 <- c(B) C1 <- rowsum(A1, group=B1)/c(table(B1)) Put the means to the required positions for(a in 1:3){ C[B==a] <- rep(C1[a, ], each=sum(B==a)) } I hope this can help. Petr Savicky.