Dear R experts,

Suppose I have a data frame of three variables:

> foo <- data.frame(row=1:5, col=1:3, val=rnorm(15))
> foo
   row col         val
1    1   1 -1.00631642
2    2   2  0.77715344
3    3   3  0.17358793
4    4   1 -1.67226988
5    5   2  1.08218836
6    1   3  1.32961329
7    2   1 -0.51186267
8    3   2 -1.20990127
9    4   3 -0.57786899
10   5   1  0.67102887
11   1   2  0.05646411
12   2   3  0.01146612
13   3   1 -3.12094409
14   4   2 -1.01932191
15   5   3  0.76736702

I want to turn this into a matrix of val according to row and col. Let's also
assume that some combinations of row and col are missing - i.e. there will be
NAs in the resulting matrix. My current approach is simple and works but is
slow for large datasets:

mat <- matrix(nrow=max(foo$row), ncol=max(foo$col))
for (line in 1:dim(foo)[1]) {
    mat[foo[line, 'row'], foo[line, 'col']] <- foo[line, 'val']
}

> mat
           [,1]        [,2]        [,3]
[1,] -1.0063164  0.05646411  1.32961329
[2,] -0.5118627  0.77715344  0.01146612
[3,] -3.1209441 -1.20990127  0.17358793
[4,] -1.6722699 -1.01932191 -0.57786899
[5,]  0.6710289  1.08218836  0.76736702

Can anyone think of a more efficient way?

cu
    Philipp

--
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
85350 Freising, Germany
http://mips.gsf.de/staff/pagel

#reshape package should do it
library(reshape)
foo <- data.frame(row=1:5, col=1:3, val=rnorm(15))
cast(foo, row~col)

On Wed, Nov 5, 2008 at 5:47 PM, Philipp Pagel <p.pagel at wzw.tum.de> wrote:
> Can anyone think of a more efficient way?

--
Stephen Sefick
Research Scientist
Southeastern Natural Sciences Academy

Let's not spend our time and resources thinking about things that are
so little or so large that all they really do for us is puff us up and
make us feel like gods. We are mammals, and have not exhausted the
annoying little problems of being mammals.

-K. Mullis

Philipp Pagel wrote:
> Can anyone think of a more efficient way?

Here's one.

> d <- read.table("clipboard")
> with(d,tapply(val,list(row,col),"[[",1))
           1           2           3
1 -1.0063164  0.05646411  1.32961329
2 -0.5118627  0.77715344  0.01146612
3 -3.1209441 -1.20990127  0.17358793
4 -1.6722699 -1.01932191 -0.57786899
5  0.6710289  1.08218836  0.76736702

or use mean, min, max etc instead of "[[", 1.

Also, there's matrix indexing

> M <- matrix(,5,3)
> attach(d)
> M[cbind(row,col)]<-val
> M
           [,1]        [,2]        [,3]
[1,] -1.0063164  0.05646411  1.32961329
[2,] -0.5118627  0.77715344  0.01146612
[3,] -3.1209441 -1.20990127  0.17358793
[4,] -1.6722699 -1.01932191 -0.57786899
[5,]  0.6710289  1.08218836  0.76736702

--
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907

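The matrix-indexing idea in the message above relies on the fact that a two-column numeric matrix used as an index addresses one cell per index row. The following is only a minimal sketch of that idea, not code from the thread: the small data frame is made up for illustration (with the combination row 2, col 3 deliberately left out), it avoids attach() by referring to the columns explicitly, and it assumes row and col are positive integers.

```r
# Hypothetical long-format data; the combination (row = 2, col = 3) is missing.
foo <- data.frame(
  row = c(1, 2, 3, 1, 2, 3, 1, 3),
  col = c(1, 1, 1, 2, 2, 2, 3, 3),
  val = c(0.5, -1.2, 0.3, 2.0, 1.1, -0.7, 0.9, 1.5)
)

# Pre-fill with NA so that combinations absent from foo stay NA.
mat <- matrix(NA_real_, nrow = max(foo$row), ncol = max(foo$col))

# Each row of the two-column index matrix is one (row, col) pair,
# so all values are assigned in a single vectorised step.
mat[cbind(foo$row, foo$col)] <- foo$val
mat

# Cross-check with tapply(), taking the first value in each cell;
# cells with no data are also left as NA.
tab <- tapply(foo$val, list(foo$row, foo$col), function(v) v[1])
tab
```

If a (row, col) pair occurs more than once, the indexing version keeps the last value assigned, while the tapply() call as written keeps the first, so the two only agree when the pairs are unique.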
Thank you Phil and Bert!

I was sure there must be an efficient way using some kind of indexing
trick but totally did not see the as.matrix solution.

Thanks again
    Philipp

On Wed, Nov 05, 2008 at 02:59:20PM -0800, Phil Spector wrote:
> Philipp -
>
> res = matrix(NA,5,3)
> res[as.matrix(foo[,c(1,2)])] = foo$val
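For anyone wanting to convince themselves that the as.matrix approach quoted above gives the same result as the original loop, a comparison along these lines may help. It is a sketch under assumptions: the data set (a 200 x 50 grid with roughly 10% of the combinations dropped), the seed, and all variable names are invented for illustration, and no timings are claimed here since they depend on the machine and data size.

```r
set.seed(1)  # arbitrary seed, only so the example is reproducible

# Hypothetical test data: a 200 x 50 grid with 10% of the cells removed.
n_row <- 200
n_col <- 50
grid <- expand.grid(row = seq_len(n_row), col = seq_len(n_col))
big <- grid[sample(nrow(grid), 0.9 * nrow(grid)), ]
big$val <- rnorm(nrow(big))

# Loop version, as in the original question.
loop_mat <- matrix(NA_real_, n_row, n_col)
for (i in seq_len(nrow(big))) {
  loop_mat[big[i, "row"], big[i, "col"]] <- big[i, "val"]
}

# Vectorised version: the (row, col) columns become a two-column index matrix.
idx_mat <- matrix(NA_real_, n_row, n_col)
idx_mat[as.matrix(big[, c("row", "col")])] <- big$val

# Both approaches should fill exactly the same cells with the same values.
identical(loop_mat, idx_mat)
```

Wrapping each version in system.time() is one way to see the difference on your own data. Note that this relies on row and col being numeric; if they were factors or character strings, as.matrix() would return a character matrix and the indexing would then go by dimnames instead.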