Hi,

I have a matrix of 4.5K x 4.5K elements with column and row names.

I need to convert this matrix into a table, where the first column is the row name of the element, the second column is the column name of the same element, and the third column is the element itself.

At the moment I do this with a double for-loop, but it takes ages for the loop to finish. I was wondering whether there is a faster way of doing the same conversion.

This is how I am doing it now:

    my.df <- data.frame()
    for (i in 1:(nrow(out5.df)-1)) {
        for (j in i:ncol(out5.df)) {
            # print(paste(" I am at position: row-", i, " and col-", j, sep=""))
            a <- cbind(start=rownames(out5.df)[i], start.1=colnames(out5.df)[j],
                       Value=out5.df[i,j])
            my.df <- rbind(my.df, a)
        }
    }

This is an example of the data I have:

                1           2           3           4           5           6           7
    1 FBgn0037249 FBpp0312226 FBtr0346646 FBgn0266186 FBpp0312219 FBtr0346639 FBgn0010100
    2 FBgn0036389 FBpp0312225 FBtr0346645 FBgn0037894 FBpp0312218 FBtr0346638 FBgn0026577
    3 FBgn0014002 FBpp0312224 FBtr0346644 FBgn0025712 FBpp0312183 FBtr0346593 FBpp0312178
    4 FBgn0034201 FBpp0312223 FBtr0346643 FBgn0025712 FBpp0312182 FBtr0346592 FBpp0312177
    5 FBgn0029860 FBpp0312222 FBtr0346642 FBgn0261597 FBpp0312181 FBtr0346591 FBtr0346587
    6 FBgn0028526 FBpp0312221 FBtr0346641 FBgn0263050 FBpp0312180 FBtr0346589 FBtr0346586
    7 FBgn0003486 FBpp0312220 FBtr0346640 FBgn0263051 FBpp0312179 FBtr0346588 FBpp0312219

What I would like to get at the end is something like this:

    > my.df
       start start.1       Value
    1      1      X1 FBgn0037249
    2      1      X2 FBpp0312226
    3      1      X3 FBtr0346646
    4      1      X4 FBgn0266186
    5      1      X5 FBpp0312219
    6      1      X6 FBtr0346639
    7      1      X7 FBgn0010100
    8      2      X2 FBpp0312225
    9      2      X3 FBtr0346645
    10     2      X4 FBgn0037894
    11     2      X5 FBpp0312218
    12     2      X6 FBtr0346638
    13     2      X7 FBgn0026577
    14     3      X3 FBtr0346644
    15     3      X4 FBgn0025712
    16     3      X5 FBpp0312183
    17     3      X6 FBtr0346593
    18     3      X7 FBpp0312178
    19     4      X4 FBgn0025712
    20     4      X5 FBpp0312182
    21     4      X6 FBtr0346592
    22     4      X7 FBpp0312177
    23     5      X5 FBpp0312181
    24     5      X6 FBtr0346591
    25     5      X7 FBtr0346587
    26     6      X6 FBtr0346589
    27     6      X7 FBtr0346586

So I would like to know if there is a better way of doing it than a double for-loop.

Thanks,
Assa

    library(reshape2) # you probably need to install reshape2 before this works
    ?melt

Jeff Newmiller

On May 30, 2014 3:07:25 PM PDT, Assa Yeroslaviz <frymor at gmail.com> wrote:
>I need to convert this matrix into a table, where one column is the name of
>the row for the element, the second column is the name of the column for
>the same element and the third column is the element itself.
>
>I was wondering whether there is a faster way of doing the same conversion.
>
>______________________________________________
>R-help at r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.
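
For readers who have not used melt() before, here is a minimal sketch of the suggestion above on a small made-up matrix. The matrix `m` and the column names assigned to the result are illustrative assumptions, not taken from the original post, and the reshape2 package must be installed.

```r
library(reshape2)  # install.packages("reshape2") if it is not yet installed

# Small made-up character matrix with row and column names
m <- matrix(c("a", "b", "c", "d", "e", "f"), nrow = 2,
            dimnames = list(c("r1", "r2"), c("c1", "c2", "c3")))

# melt() on a matrix returns one row per cell:
# row name, column name, and the cell value
long <- melt(m)

# Rename the three columns to match the layout asked for in the question
names(long) <- c("start", "start.1", "Value")
long
```

The rows come out in column-major order (all rows of the first column, then the second column, and so on), so an extra order() step is needed if the result should be sorted by row first.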

On May 30, 2014, at 3:07 PM, Assa Yeroslaviz wrote:

> Hi,
>
> I have a matrix of 4.5Kx4.5K elements with column- and row names
>
> I need to convert this matrix into a table, where one column is the name of
> the row for the element, the second column is the name of the column for
> the same element and the third column is the element itself.

In R a "table" object is just a matrix with a class of "table", and there is a really kewl function to do exactly what you ask for on objects with class "table". So try this:

    class(out5.df) <- "table"
    my.df <- as.data.frame(out5.df)

> The way I do it at the moment is with a double for-loop.
> With this way though it takes ages for the loop to finish.
>
> this is an example for the data I have:

I would have tested this if it had been offered using the output of dput().

    ?dput

    > out5.df <- matrix(1:30,5,6)
    > colnames(out5.df) <- letters[1:6]
    > rownames(out5.df) <- LETTERS[1:5]
    > class(out5.df) <- "table"
    >
    > my.df <- as.data.frame(out5.df)
    > my.df
      Var1 Var2 Freq
    1    A    a    1
    2    B    a    2
    3    C    a    3
    4    D    a    4
    5    E    a    5
    6    A    b    6
    .......snipped the rest

-- 
David.

David Winsemius
Alameda, CA, USA
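
As an illustration of the dput() request above, the following sketch shows how a small object can be posted in reproducible form and then converted without changing its class. The matrix `m` is made-up data, and calling as.data.frame.table() directly with the `responseName` argument is a variation on the reply above, not what was posted.

```r
# Made-up 2 x 3 character matrix standing in for the real data
m <- matrix(c("a", "b", "c", "d", "e", "f"), nrow = 2,
            dimnames = list(c("r1", "r2"), c("c1", "c2", "c3")))

# dput() prints R code that recreates the object;
# pasting that output into an email makes the example reproducible
dput(m)

# as.data.frame.table() can be called directly on the matrix;
# responseName sets the name of the value column (the default is "Freq")
long <- as.data.frame.table(m, responseName = "Value",
                            stringsAsFactors = FALSE)
names(long)[1:2] <- c("start", "start.1")
long
```

Calling the method directly leaves the class of the original object untouched, which may be preferable when the matrix is used again later.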

Hi,

You may try:

    ## Assuming the dataset is a matrix
    mat <- structure(c("FBgn0037249", "FBgn0036389", "FBgn0014002", "FBgn0034201",
    "FBgn0029860", "FBgn0028526", "FBgn0003486", "FBpp0312226", "FBpp0312225",
    "FBpp0312224", "FBpp0312223", "FBpp0312222", "FBpp0312221", "FBpp0312220",
    "FBtr0346646", "FBtr0346645", "FBtr0346644", "FBtr0346643", "FBtr0346642",
    "FBtr0346641", "FBtr0346640", "FBgn0266186", "FBgn0037894", "FBgn0025712",
    "FBgn0025712", "FBgn0261597", "FBgn0263050", "FBgn0263051", "FBpp0312219",
    "FBpp0312218", "FBpp0312183", "FBpp0312182", "FBpp0312181", "FBpp0312180",
    "FBpp0312179", "FBtr0346639", "FBtr0346638", "FBtr0346593", "FBtr0346592",
    "FBtr0346591", "FBtr0346589", "FBtr0346588", "FBgn0010100", "FBgn0026577",
    "FBpp0312178", "FBpp0312177", "FBtr0346587", "FBtr0346586", "FBpp0312219"
    ), .Dim = c(7L, 7L), .Dimnames = list(c("1", "2", "3", "4", "5",
    "6", "7"), c("1", "2", "3", "4", "5", "6", "7")))

    ## c(t(mat)) lists the values row by row; pairing it with col(mat)/row(mat)
    ## in this way relies on mat being square
    res <- data.frame(start=rownames(mat)[col(mat)],
                      start.1=colnames(mat)[row(mat)],
                      Value=c(t(mat)))

    ## Comparing the speed with other methods:
    ### For easy comparison across methods, converted the columns to factors
    fun1 <- function(mat) {
        start <- rownames(mat)[col(mat)]
        start.1 <- paste0("X", colnames(mat)[row(mat)])
        Value <- c(t(mat))
        data.frame(start = factor(start, levels = unique(start)),
                   start.1 = factor(start.1, levels = unique(start.1)),
                   Value)
    }

    fun2 <- function(mat) {
        colnames(mat) <- paste0("X", colnames(mat))
        my.df <- setNames(as.data.frame.table(mat), c("start", "start.1", "Value"))
        my.df <- my.df[with(my.df, order(start, start.1)), ]
        row.names(my.df) <- 1:nrow(my.df)
        my.df
    }

    library(reshape2)
    fun3 <- function(mat) {
        colnames(mat) <- paste0("X", colnames(mat))
        my.df <- transform(setNames(melt(mat), c("start", "start.1", "Value")),
                           start = as.factor(start))
        my.df <- my.df[with(my.df, order(start, start.1)), ]
        row.names(my.df) <- 1:nrow(my.df)
        my.df
    }

    set.seed(481)
    mat1 <- matrix(sample(mat, 4.5e3*4.5e3, replace=TRUE), ncol=4.5e3,
                   dimnames=list(1:4.5e3, 1:4.5e3))

    system.time(res1 <- fun1(mat1))
    #   user  system elapsed
    #  7.914   0.836   8.750

    system.time(res2 <- fun2(mat1))
    #   user  system elapsed
    # 28.257   1.336  29.578

    system.time(res3 <- fun3(mat1))
    #   user  system elapsed
    # 27.213   1.027  28.224

    identical(res1,res2)
    #[1] TRUE
    identical(res1,res3)
    #[1] TRUE

A.K.

On Friday, May 30, 2014 6:10 PM, Assa Yeroslaviz <frymor at gmail.com> wrote:
> I need to convert this matrix into a table, where one column is the name of
> the row for the element, the second column is the name of the column for
> the same element and the third column is the element itself.
>
> I was wondering whether there is a faster way of doing the same conversion.
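
The replies above all return every cell of the matrix, whereas the loop in the question only visits cells whose column index is at least the row index. One possible base-R sketch of that restriction follows. The helper name `upper_to_long` and the small matrix `m` are hypothetical and only for illustration. Unlike the original loop, which stops at row nrow - 1, this version also keeps the last diagonal element.

```r
# Hypothetical helper: long format for cells with column index >= row index
upper_to_long <- function(mat) {
  # Row/column indices of the upper triangle, including the diagonal
  idx <- which(upper.tri(mat, diag = TRUE), arr.ind = TRUE)
  # Sort by row, then by column, to match the order the double loop produces
  idx <- idx[order(idx[, "row"], idx[, "col"]), , drop = FALSE]
  data.frame(start   = rownames(mat)[idx[, "row"]],
             start.1 = colnames(mat)[idx[, "col"]],
             Value   = mat[idx],   # a two-column index matrix picks single cells
             stringsAsFactors = FALSE)
}

# Made-up 3 x 3 example with the "X" prefix used in the desired output
m <- matrix(sprintf("v%d", 1:9), nrow = 3,
            dimnames = list(1:3, paste0("X", 1:3)))
res <- upper_to_long(m)
res
```

Because everything is done with vectorised indexing, no data frame is grown row by row, which is the step that makes the double loop slow on a 4.5K x 4.5K matrix.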