Hi,
I have a matrix of 4.5Kx4.5K elements with column- and row names
I need to convert this matrix into a table, where one column is the name of
the row for the element, the second column is the name of the column for
the same element and the third column is the element itself.
The way I do it at the moment is with a double for-loop.
With this way though it takes ages for the loop to finish.
I was wondering whether there is a faster way of doing the same conversion.
This is how I am doing it now:
my.df <-data.frame()
for (i in 1:(nrow(out5.df)-1)){
for (j in i:ncol(out5.df)) {
# print(paste(" I am at position: row-", i, " and
col-", j, sep=""))
a<- cbind(start=rownames(out5.df)[i], start.1=colnames(out5.df)[j],
Value=out5.df[i,j])
my.df <- rbind(my.df, a)
}
}
this is an example for the data I have:
1 2 3 4 5 6 7
1 FBgn0037249 FBpp0312226 FBtr0346646 FBgn0266186
FBpp0312219 FBtr0346639 FBgn0010100
2 FBgn0036389 FBpp0312225 FBtr0346645 FBgn0037894
FBpp0312218 FBtr0346638 FBgn0026577
3 FBgn0014002 FBpp0312224 FBtr0346644 FBgn0025712
FBpp0312183 FBtr0346593 FBpp0312178
4 FBgn0034201 FBpp0312223 FBtr0346643 FBgn0025712
FBpp0312182 FBtr0346592 FBpp0312177
5 FBgn0029860 FBpp0312222 FBtr0346642 FBgn0261597
FBpp0312181 FBtr0346591 FBtr0346587
6 FBgn0028526 FBpp0312221 FBtr0346641 FBgn0263050
FBpp0312180 FBtr0346589 FBtr0346586
7 FBgn0003486 FBpp0312220 FBtr0346640 FBgn0263051
FBpp0312179 FBtr0346588 FBpp0312219
What I would like to get at the end is something like
that:> my.df
start start.1 Value
1 1 X1 FBgn0037249
2 1 X2 FBpp0312226
3 1 X3 FBtr0346646
4 1 X4 FBgn0266186
5 1 X5 FBpp0312219
6 1 X6 FBtr0346639
7 1 X7 FBgn0010100
8 2 X2 FBpp0312225
9 2 X3 FBtr0346645
10 2 X4 FBgn0037894
11 2 X5 FBpp0312218
12 2 X6 FBtr0346638
13 2 X7 FBgn0026577
14 3 X3 FBtr0346644
15 3 X4 FBgn0025712
16 3 X5 FBpp0312183
17 3 X6 FBtr0346593
18 3 X7 FBpp0312178
19 4 X4 FBgn0025712
20 4 X5 FBpp0312182
21 4 X6 FBtr0346592
22 4 X7 FBpp0312177
23 5 X5 FBpp0312181
24 5 X6 FBtr0346591
25 5 X7 FBtr0346587
26 6 X6 FBtr0346589
27 6 X7 FBtr0346586
Sp I would like to know if there is a better way of ding it than a double
for loop.
thanks
Assa
[[alternative HTML version deleted]]
library(reshape2) # you probably need to install reshape2 before this works
?melt
---------------------------------------------------------------------------
Jeff Newmiller The ..... ..... Go Live...
DCN:<jdnewmil at dcn.davis.ca.us> Basics: ##.#. ##.#. Live
Go...
Live: OO#.. Dead: OO#.. Playing
Research Engineer (Solar/Batteries O.O#. #.O#. with
/Software/Embedded Controllers) .OO#. .OO#. rocks...1k
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.
On May 30, 2014 3:07:25 PM PDT, Assa Yeroslaviz <frymor at gmail.com>
wrote:>Hi,
>
>I have a matrix of 4.5Kx4.5K elements with column- and row names
>
>I need to convert this matrix into a table, where one column is the
>name of
>the row for the element, the second column is the name of the column
>for
>the same element and the third column is the element itself.
>
>The way I do it at the moment is with a double for-loop.
>With this way though it takes ages for the loop to finish.
>
>I was wondering whether there is a faster way of doing the same
>conversion.
>
>This is how I am doing it now:
>my.df <-data.frame()
>for (i in 1:(nrow(out5.df)-1)){
> for (j in i:ncol(out5.df)) {
># print(paste(" I am at position: row-", i, " and
col-", j,
>sep=""))
> a<- cbind(start=rownames(out5.df)[i], start.1=colnames(out5.df)[j],
>Value=out5.df[i,j])
> my.df <- rbind(my.df, a)
> }
> }
>
>this is an example for the data I have:
> 1 2 3 4 5 6 7
>1 FBgn0037249 FBpp0312226 FBtr0346646 FBgn0266186
>FBpp0312219 FBtr0346639 FBgn0010100
>2 FBgn0036389 FBpp0312225 FBtr0346645 FBgn0037894
>FBpp0312218 FBtr0346638 FBgn0026577
>3 FBgn0014002 FBpp0312224 FBtr0346644 FBgn0025712
>FBpp0312183 FBtr0346593 FBpp0312178
>4 FBgn0034201 FBpp0312223 FBtr0346643 FBgn0025712
>FBpp0312182 FBtr0346592 FBpp0312177
>5 FBgn0029860 FBpp0312222 FBtr0346642 FBgn0261597
>FBpp0312181 FBtr0346591 FBtr0346587
>6 FBgn0028526 FBpp0312221 FBtr0346641 FBgn0263050
>FBpp0312180 FBtr0346589 FBtr0346586
>7 FBgn0003486 FBpp0312220 FBtr0346640 FBgn0263051
>FBpp0312179 FBtr0346588 FBpp0312219
>
>What I would like to get at the end is something like that:
>> my.df
> start start.1 Value
>1 1 X1 FBgn0037249
>2 1 X2 FBpp0312226
>3 1 X3 FBtr0346646
>4 1 X4 FBgn0266186
>5 1 X5 FBpp0312219
>6 1 X6 FBtr0346639
>7 1 X7 FBgn0010100
>8 2 X2 FBpp0312225
>9 2 X3 FBtr0346645
>10 2 X4 FBgn0037894
>11 2 X5 FBpp0312218
>12 2 X6 FBtr0346638
>13 2 X7 FBgn0026577
>14 3 X3 FBtr0346644
>15 3 X4 FBgn0025712
>16 3 X5 FBpp0312183
>17 3 X6 FBtr0346593
>18 3 X7 FBpp0312178
>19 4 X4 FBgn0025712
>20 4 X5 FBpp0312182
>21 4 X6 FBtr0346592
>22 4 X7 FBpp0312177
>23 5 X5 FBpp0312181
>24 5 X6 FBtr0346591
>25 5 X7 FBtr0346587
>26 6 X6 FBtr0346589
>27 6 X7 FBtr0346586
>
>
>Sp I would like to know if there is a better way of ding it than a
>double
>for loop.
>
>thanks
>Assa
>
> [[alternative HTML version deleted]]
>
>______________________________________________
>R-help at r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.
On May 30, 2014, at 3:07 PM, Assa Yeroslaviz wrote:> Hi, > > I have a matrix of 4.5Kx4.5K elements with column- and row names > > I need to convert this matrix into a table, where one column is the name of > the row for the element, the second column is the name of the column for > the same element and the third column is the element itself.In R a "table" object is just a matrix with a class of table and there is a really kewl function to do exactly what you ask for on objects with class table so try this: class(out5.df) <- "table" my.df <- as.data.frame(out5.df)> > The way I do it at the moment is with a double for-loop. > With this way though it takes ages for the loop to finish. > > I was wondering whether there is a faster way of doing the same conversion. > > This is how I am doing it now: > my.df <-data.frame() > for (i in 1:(nrow(out5.df)-1)){ > for (j in i:ncol(out5.df)) { > # print(paste(" I am at position: row-", i, " and col-", j, sep="")) > a<- cbind(start=rownames(out5.df)[i], start.1=colnames(out5.df)[j], > Value=out5.df[i,j]) > my.df <- rbind(my.df, a) > } > } > > this is an example for the data I have:I would have tested this if it had been offered using the output of dput() ?dput> out5.df <- matrix(1:30,5,6) > colnames(out5.df)<-letters[1:6] > rownames(out5.df)<-LETTERS[1:5] > class(out5.df) <- "table" > > my.df <- as.data.frame(out5.df) > > my.dfVar1 Var2 Freq 1 A a 1 2 B a 2 3 C a 3 4 D a 4 5 E a 5 6 A b 6 .......snippped the rest -- David.>> 1 2 3 4 5 6 7 > 1 FBgn0037249 FBpp0312226 FBtr0346646 FBgn0266186 > FBpp0312219 FBtr0346639 FBgn0010100 > 2 FBgn0036389 FBpp0312225 FBtr0346645 FBgn0037894 > FBpp0312218 FBtr0346638 FBgn0026577 > 3 FBgn0014002 FBpp0312224 FBtr0346644 FBgn0025712 > FBpp0312183 FBtr0346593 FBpp0312178 > 4 FBgn0034201 FBpp0312223 FBtr0346643 FBgn0025712 > FBpp0312182 FBtr0346592 FBpp0312177 > 5 FBgn0029860 FBpp0312222 FBtr0346642 FBgn0261597 > FBpp0312181 FBtr0346591 FBtr0346587 > 6 FBgn0028526 FBpp0312221 FBtr0346641 FBgn0263050 > FBpp0312180 FBtr0346589 FBtr0346586 > 7 FBgn0003486 FBpp0312220 FBtr0346640 FBgn0263051 > FBpp0312179 FBtr0346588 FBpp0312219 > > What I would like to get at the end is something like that: >> my.df > start start.1 Value > 1 1 X1 FBgn0037249 > 2 1 X2 FBpp0312226 > 3 1 X3 FBtr0346646 > 4 1 X4 FBgn0266186 > 5 1 X5 FBpp0312219 > 6 1 X6 FBtr0346639 > 7 1 X7 FBgn0010100 > 8 2 X2 FBpp0312225 > 9 2 X3 FBtr0346645 > 10 2 X4 FBgn0037894 > 11 2 X5 FBpp0312218 > 12 2 X6 FBtr0346638 > 13 2 X7 FBgn0026577 > 14 3 X3 FBtr0346644 > 15 3 X4 FBgn0025712 > 16 3 X5 FBpp0312183 > 17 3 X6 FBtr0346593 > 18 3 X7 FBpp0312178 > 19 4 X4 FBgn0025712 > 20 4 X5 FBpp0312182 > 21 4 X6 FBtr0346592 > 22 4 X7 FBpp0312177 > 23 5 X5 FBpp0312181 > 24 5 X6 FBtr0346591 > 25 5 X7 FBtr0346587 > 26 6 X6 FBtr0346589 > 27 6 X7 FBtr0346586 > > > Sp I would like to know if there is a better way of ding it than a double > for loop. > > thanks > Assa > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.David Winsemius Alameda, CA, USA
Hi,
You may try:
##Assuming the dataset is a matrix
mat <- structure(c("FBgn0037249", "FBgn0036389",
"FBgn0014002", "FBgn0034201",
"FBgn0029860", "FBgn0028526", "FBgn0003486",
"FBpp0312226", "FBpp0312225",
"FBpp0312224", "FBpp0312223", "FBpp0312222",
"FBpp0312221", "FBpp0312220",
"FBtr0346646", "FBtr0346645", "FBtr0346644",
"FBtr0346643", "FBtr0346642",
"FBtr0346641", "FBtr0346640", "FBgn0266186",
"FBgn0037894", "FBgn0025712",
"FBgn0025712", "FBgn0261597", "FBgn0263050",
"FBgn0263051", "FBpp0312219",
"FBpp0312218", "FBpp0312183", "FBpp0312182",
"FBpp0312181", "FBpp0312180",
"FBpp0312179", "FBtr0346639", "FBtr0346638",
"FBtr0346593", "FBtr0346592",
"FBtr0346591", "FBtr0346589", "FBtr0346588",
"FBgn0010100", "FBgn0026577",
"FBpp0312178", "FBpp0312177", "FBtr0346587",
"FBtr0346586", "FBpp0312219"
), .Dim = c(7L, 7L), .Dimnames = list(c("1", "2",
"3", "4", "5",
"6", "7"), c("1", "2", "3",
"4", "5", "6", "7")))
res <-? data.frame(start=rownames(mat)[col(mat)],
start.1=colnames(mat)[row(mat)], Value= c(t(mat)))
##Comparing the speed with other methods:
###For easy comparison across methods, converted the columns to factors
fun1 <- function(mat) {
??? start <- rownames(mat)[col(mat)]
??? start.1 <- paste0("X", colnames(mat)[row(mat)])
??? Value <- c(t(mat))
??? data.frame(start = factor(start, levels = unique(start)), start.1 =
factor(start.1,
??????? levels = unique(start.1)), Value)
}
fun2 <- function(mat) {
??? colnames(mat) <- paste0("X", colnames(mat))
??? my.df <- setNames(as.data.frame.table(mat), c("start",
"start.1", "Value"))
??? my.df <- my.df[with(my.df, order(start, start.1)), ]
??? row.names(my.df) <- 1:nrow(my.df)
??? my.df
}
library(reshape2)
fun3 <- function(mat) {
??? colnames(mat) <- paste0("X", colnames(mat))
??? my.df <- transform(setNames(melt(mat), c("start",
"start.1", "Value")), start = as.factor(start))
??? my.df <- my.df[with(my.df, order(start, start.1)), ]
??? row.names(my.df) <- 1:nrow(my.df)
??? my.df
}
set.seed(481)
mat1 <- matrix(sample(mat, 4.5e3*4.5e3, replace=TRUE), ncol=4.5e3,
dimnames=list(1:4.5e3, 1:4.5e3))
#system.time(res1 <- fun1(mat1))
#?? user? system elapsed
#? 7.914?? 0.836?? 8.750
?system.time(res2 <- fun2(mat1))
#?? user? system elapsed
# 28.257?? 1.336? 29.578
system.time(res3 <- fun3(mat1))
#?? user? system elapsed
# 27.213?? 1.027? 28.224
?
?identical(res1,res2)
#[1] TRUE
?identical(res1,res3)
#[1] TRUE
A.K.
On Friday, May 30, 2014 6:10 PM, Assa Yeroslaviz <frymor at gmail.com>
wrote:
Hi,
I have a matrix of 4.5Kx4.5K elements with column- and row names
I need to convert this matrix into a table, where one column is the name of
the row for the element, the second column is the name of the column for
the same element and the third column is the element itself.
The way I do it at the moment is with a double for-loop.
With this way though it takes ages for the loop to finish.
I was wondering whether there is a faster way of doing the same conversion.
This is how I am doing it now:
my.df <-data.frame()
for (i in 1:(nrow(out5.df)-1)){
? ? for (j in i:ncol(out5.df)) {
#? ? ? ? print(paste(" I am at position: row-", i, " and
col-", j, sep=""))
? ? ? ? a<- cbind(start=rownames(out5.df)[i], start.1=colnames(out5.df)[j],
Value=out5.df[i,j])
? ? ? ? my.df <- rbind(my.df, a)
? ? ? ? }
? ? }
this is an example for the data I have:
? ? 1? ? 2? ? 3? ? 4? ? 5? ? 6? ? 7
1? ? FBgn0037249? ? FBpp0312226? ? FBtr0346646? ? FBgn0266186
FBpp0312219? ? FBtr0346639? ? FBgn0010100
2? ? FBgn0036389? ? FBpp0312225? ? FBtr0346645? ? FBgn0037894
FBpp0312218? ? FBtr0346638? ? FBgn0026577
3? ? FBgn0014002? ? FBpp0312224? ? FBtr0346644? ? FBgn0025712
FBpp0312183? ? FBtr0346593? ? FBpp0312178
4? ? FBgn0034201? ? FBpp0312223? ? FBtr0346643? ? FBgn0025712
FBpp0312182? ? FBtr0346592? ? FBpp0312177
5? ? FBgn0029860? ? FBpp0312222? ? FBtr0346642? ? FBgn0261597
FBpp0312181? ? FBtr0346591? ? FBtr0346587
6? ? FBgn0028526? ? FBpp0312221? ? FBtr0346641? ? FBgn0263050
FBpp0312180? ? FBtr0346589? ? FBtr0346586
7? ? FBgn0003486? ? FBpp0312220? ? FBtr0346640? ? FBgn0263051
FBpp0312179? ? FBtr0346588? ? FBpp0312219
What I would like to get at the end is something like
that:> my.df
? start start.1? ? ? Value
1? ? ? 1? ? ? X1 FBgn0037249
2? ? ? 1? ? ? X2 FBpp0312226
3? ? ? 1? ? ? X3 FBtr0346646
4? ? ? 1? ? ? X4 FBgn0266186
5? ? ? 1? ? ? X5 FBpp0312219
6? ? ? 1? ? ? X6 FBtr0346639
7? ? ? 1? ? ? X7 FBgn0010100
8? ? ? 2? ? ? X2 FBpp0312225
9? ? ? 2? ? ? X3 FBtr0346645
10? ? 2? ? ? X4 FBgn0037894
11? ? 2? ? ? X5 FBpp0312218
12? ? 2? ? ? X6 FBtr0346638
13? ? 2? ? ? X7 FBgn0026577
14? ? 3? ? ? X3 FBtr0346644
15? ? 3? ? ? X4 FBgn0025712
16? ? 3? ? ? X5 FBpp0312183
17? ? 3? ? ? X6 FBtr0346593
18? ? 3? ? ? X7 FBpp0312178
19? ? 4? ? ? X4 FBgn0025712
20? ? 4? ? ? X5 FBpp0312182
21? ? 4? ? ? X6 FBtr0346592
22? ? 4? ? ? X7 FBpp0312177
23? ? 5? ? ? X5 FBpp0312181
24? ? 5? ? ? X6 FBtr0346591
25? ? 5? ? ? X7 FBtr0346587
26? ? 6? ? ? X6 FBtr0346589
27? ? 6? ? ? X7 FBtr0346586
Sp I would like to know if there is a better way of ding it than a double
for loop.
thanks
Assa
??? [[alternative HTML version deleted]]
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.