Hi,

I have a data frame (say myData) and want to get a list (say myList) that contains a matrix for each row of myData. These matrices are calculated from the corresponding row of myData. Using a for()-loop to do this is very slow, so I tried to use apply(). However, as far as I know, apply() only returns a list if the matrices have different dimensions, while my matrices all have the same dimensions. To get a list, I could change the dimensions of one matrix artificially and restore them after apply().

This is a (very much) simplified example of what I did:

> myData <- data.frame( a = c( 1,2,3 ), b = c( 4,5,6 ) )
> myFunction <- function( values ) {
+    myMatrix <- matrix( values, 2, 2 )
+    if( all( values == myData[ 1, ] ) ) {
+       myMatrix <- cbind( myMatrix, rep( 0, 2 ) )
+    }
+    return( myMatrix )
+ }
> myList <- apply( myData, 1, myFunction )
> myList[[ 1 ]] <- myList[[ 1 ]][ 1:2, 1:2 ]
> myList
$"1"
     [,1] [,2]
[1,]    1    1
[2,]    4    4

$"2"
     [,1] [,2]
[1,]    2    2
[2,]    5    5

$"3"
     [,1] [,2]
[1,]    3    3
[2,]    6    6

This does exactly what I want and really speeds up the calculation, but I wonder if there is an easier way to make apply() return a list.

Thanks for your help,
Arne

--
Arne Henningsen
Department of Agricultural Economics
University of Kiel
Olshausenstr. 40
D-24098 Kiel (Germany)
Tel: +49-431-880 4445
Fax: +49-431-880 1397
ahenningsen at agric-econ.uni-kiel.de
http://www.uni-kiel.de/agrarpol/ahenningsen/
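For readers who want to see the behaviour described in the post, here is a minimal sketch (not part of the original message) of what apply() returns when every call yields a matrix of the same size. It reuses myData from the post; the name sameDim is only illustrative.

```r
# Data from the post above
myData <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

# Every call returns a 2 x 2 matrix, i.e. a result of length 4
sameDim <- apply(myData, 1, function(values) matrix(values, 2, 2))

is.list(sameDim)  # FALSE: the equal-length results were simplified
dim(sameDim)      # 4 3: each 2 x 2 matrix is flattened into one column

# The first matrix can still be rebuilt from its column
matrix(sameDim[, 1], 2, 2)
```

Because all results have the same length, apply() binds them into a single matrix instead of returning a list, which is why the post resorts to changing one dimension. As far as I know, R 4.1.0 and later also accept apply(..., simplify = FALSE), which would avoid the workaround, but that is worth checking against ?apply for the installed version.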
for()-loops aren't so bad. Look inside the code of apply() and see what it uses! The important thing is that you use vectorized functions to manipulate vectors. It's often fine to use for-loops to manipulate the rows or columns of a matrix, but once you've extracted a row or a column, then use a vectorized function to manipulate that data.

In any case, one way to get apply() to return a list is to wrap the result from the subfunction inside a list, e.g.:

> x <- apply(matrix(1:6,2), 1, function(x) list((c(mean=mean(x), sd=sd(x)))))
> x
[[1]]
[[1]][[1]]
mean   sd
   3    2


[[2]]
[[2]][[1]]
mean   sd
   4    2


> # to remove the extra level of listing here, do:
> lapply(x, "[[", 1)
[[1]]
mean   sd
   3    2

[[2]]
mean   sd
   4    2

>

At Monday 11:37 AM 11/1/2004, Arne Henningsen wrote:
>This exactly does what I want and really speeds up the calculation, but I
>wonder if there is an easier way to make apply() return a list.
>
>______________________________________________
>R-help at stat.math.ethz.ch mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
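As an illustration of the point about for-loops, here is a minimal sketch of a loop that preallocates its result and extracts rows from a plain matrix. It is not taken from the thread: the name rowValues and the decision to convert the data frame once with as.matrix() are assumptions made for this example, and it says nothing about why the original loop was slow.

```r
myData <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

# Assumption: converting once to a numeric matrix is acceptable here
rowValues <- as.matrix(myData)

# Preallocate the list so it is not grown inside the loop
myList <- vector("list", nrow(rowValues))
for (i in seq_len(nrow(rowValues))) {
  # Each extracted row is a plain numeric vector
  myList[[i]] <- matrix(rowValues[i, ], 2, 2)
}

myList[[2]]
```

The loop body only calls matrix() on an already extracted row, which is the pattern recommended above: loop over rows if needed, but handle each row with vectorized functions.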
How about this...

x = matrix(1:27, ncol=9, byrow=TRUE)
nr = nrow(x)
lapply(1:nr, function(i) matrix(x[i,], nrow=3, byrow=TRUE))

Mahbub.

On Mon, 1 Nov 2004 19:37:08 +0100, Arne Henningsen <ahenningsen at email.uni-kiel.de> wrote:
> This exactly does what I want and really speeds up the calculation, but I
> wonder if there is an easier way to make apply() return a list.

--
A.H.M. Mahbub-ul Latif
PhD Student
Department of Medical Statistics
University of Goettingen
Germany
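The suggestion above indexes the rows of a matrix. A minimal sketch of the same idea applied to the data frame from the original post might look like this; the unlist() step is an addition of this example (not part of the reply), used so that each row reaches matrix() as a plain numeric vector rather than a one-row data frame.

```r
myData <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

myList <- lapply(seq_len(nrow(myData)), function(i) {
  # myData[i, ] is a one-row data frame; flatten it to a numeric vector
  values <- unlist(myData[i, ])
  matrix(values, 2, 2)
})

myList[[3]]
```

Since lapply() always returns a list, no simplification workaround is needed with this approach.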
Arne Henningsen <ahenningsen <at> email.uni-kiel.de> writes:
:
: I have a dataframe (say myData) and want to get a list (say myList) that
: contains a matrix for each row of the dataframe myData.

Try lapplying over the columns of the transpose:

lapply(as.data.frame(t(myData)), matrix, 2, 2)
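To make the one-liner above easier to follow, here is a step-by-step sketch of what it does, using myData from the original post. The intermediate name tData is only for illustration.

```r
myData <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

# t() turns the 3 x 2 data frame into a 2 x 3 matrix;
# as.data.frame() then stores each of its columns as one list element
tData <- as.data.frame(t(myData))
dim(tData)      # 2 3: one column per original row
is.list(tData)  # TRUE: a data frame is a list of columns

# lapply() passes each column to matrix(<column>, 2, 2)
myList <- lapply(tData, matrix, 2, 2)
myList[[2]]     # built from the second row of myData (2 and 5)
```

The trick relies on lapply() iterating over the columns of a data frame, so transposing first makes each original row available as one column.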
Hi,

thank you very much Sundar, Patrick, Tony, Mahbub and Gabor for your helpful answers! All your examples work great. They are all more straightforward than my example and much faster than the for-loop.

These are the average elapsed times (in seconds) returned by system.time()[3] (applied to my real function and my real data):

my original for-loop:                                          5.55
the example I presented in my previous email (using apply):    2.35
example suggested by Tony (using apply):                       2.34
example suggested by Gabor (using lapply):                     2.50
examples suggested by Sundar and Mahbub (using lapply):        2.68

Best regards,
Arne

On Monday 01 November 2004 19:52, Sundar Dorai-Raj wrote:
> Arne Henningsen wrote:
> > This exactly does what I want and really speeds up the calculation, but I
> > wonder if there is an easier way to make apply() return a list.
>
> Hi Arne,
>
> I'm not sure how much faster this will be over using `for' but you can try:
>
> lapply(seq(nrow(myData)), function(i) myFunction(myData[i, ]))
>
> --sundar
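For anyone who wants to repeat this kind of comparison, here is a minimal sketch of how average elapsed times could be collected with system.time(). It is not the code used for the numbers above: averageElapsed, viaApply and viaLapply are hypothetical names, the toy data frame stands in for the real data, and the number of repetitions is arbitrary.

```r
myData <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

# Hypothetical helper: mean elapsed seconds of f() over `reps` runs
averageElapsed <- function(f, reps = 5) {
  mean(replicate(reps, system.time(f())[["elapsed"]]))
}

# Two of the suggested variants, wrapped as functions for timing
viaApply <- function() {
  lapply(apply(myData, 1, function(x) list(matrix(x, 2, 2))), "[[", 1)
}
viaLapply <- function() {
  lapply(as.data.frame(t(myData)), matrix, 2, 2)
}

timings <- c(apply = averageElapsed(viaApply),
             lapply = averageElapsed(viaLapply))
timings
```

On a data frame this small the timings will be close to zero, so a meaningful comparison needs the real function and data, as in the message above.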
Arne Henningsen <ahenningsen <at> email.uni-kiel.de> writes:
:
: These are the average elapsed times (in seconds) returned by system.time()[3]
: (applied to my real function and my real data):

Perhaps any comparison should also include simplicity. This is somewhat subjective, but to make it more objective I have reworked each solution to make it as compact as I could and then calculated the number of characters in each solution using wc:

AH - 293 characters
TP -  70 characters
ML -  62 characters
GG -  48 characters

The versions I used are below.

---

# data
myData <- data.frame( a = c( 1,2,3 ), b = c( 4,5,6 ) )

# AH
myFunction <- function( values ) {
   myMatrix <- matrix( values, 2, 2 )
   if( all( values == myData[ 1, ] ) ) {
      myMatrix <- cbind( myMatrix, rep( 0, 2 ) )
   }
   return( myMatrix )
}
myList <- apply( myData, 1, myFunction )
myList[[ 1 ]] <- myList[[ 1 ]][ 1:2, 1:2 ]
myList

# TP
lapply(apply(myData, 1, function(x) list(matrix(x, 2, 2))), "[[", 1)

# ML
lapply(1:nrow(myData), function(i) matrix(myData[i,], 2, 2))

# GG
lapply(as.data.frame(t(myData)), matrix, 2, 2)
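When comparing compact versions like these, it can help to confirm that they still produce the same matrices. The following is a minimal sketch of such a check for the TP and GG versions only; the names resTP, resGG and sameResult are illustrative, and list names are dropped before comparing because they may differ between the two approaches.

```r
myData <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

# TP version: wrap each result in a list, then strip the extra level
resTP <- lapply(apply(myData, 1, function(x) list(matrix(x, 2, 2))), "[[", 1)

# GG version: lapply over the columns of the transpose
resGG <- lapply(as.data.frame(t(myData)), matrix, 2, 2)

# Compare the matrices themselves, ignoring any list names
sameResult <- isTRUE(all.equal(unname(resTP), unname(resGG)))
sameResult
```

The same pattern could be extended to the other versions, as long as each one is first brought to a list of numeric matrices.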