Hello,
I have two data frames:
head(a)
GENE rs BETA
1 ENSG00000154803 rs2605134 0.0360182
2 ENSG00000154803 rs7405677 0.0525463
3 ENSG00000154803 rs7211573 0.0525531
4 ENSG00000154803 rs2746026 0.0466392
5 ENSG00000141030 rs2605134 0.0806140
6 ENSG00000141030 rs7405677 0.0251654
7 ENSG00000141030 rs7211573 0.0252775
8 ENSG00000141030 rs2746026 0.0976396
9 ENSG00000205309 rs2605134 0.0838975
10 ENSG00000205309 rs7405677 -0.2148500
11 ENSG00000205309 rs7211573 -0.2148170
12 ENSG00000205309 rs2746026 0.1013920
13 ENSG00000215030 rs2605134 0.1261050
14 ENSG00000215030 rs7405677 0.0165236
15 ENSG00000215030 rs7211573 0.0163509
16 ENSG00000215030 rs2746026 0.1201180
17 ENSG00000141026 rs2605134 0.0485897
18 ENSG00000141026 rs7405677 -0.0929964
19 ENSG00000141026 rs7211573 -0.0930321
20 ENSG00000141026 rs2746026 0.0623033
head(b)
rs GWAS
1 rs2605134 0.0315177
2 rs7405677 -0.0816389
3 rs7211573 -0.0797796
4 rs2746026 0.0199350
5 rs11658521 0.0728377
6 rs9914107 0.0720096
7 rs56964223 0.0723903
Data frame a has:
> length(unique(a$GENE))
[1] 51
> dim(a)
[1] 287 3

and the whole of data frame b is shown above.

I would like to create a txt file that gives, for each rs in data frame b, the matching BETA of every ENSG in data frame a. If a particular ENSG has no row for that rs, the value under it should be zero. So the txt file would have 7 rows (one for each unique rs in data frame b) and 53 columns (51 for the ENSGs, one for rs and one for GWAS).
So one row of that txt file, under its header, would look like this:

GENES      ENSG00000154803  ENSG00000141030  ENSG00000205309  ENSG00000215030  ENSG00000141026  GWAS
rs2605134  0.0360182        0.0806140        0.0838975        0.1261050        0.0485897        0.0315177
Please advise,
Ana
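A package-free sketch of the requested layout, using base R matrix indexing by dimnames: start from a zero-filled rs x GENE matrix and write each BETA into the cell named by its (rs, GENE) pair, so unmatched cells stay 0 and all 7 rs from b are kept in b's order. It assumes each (rs, GENE) pair occurs at most once in a (a duplicate would silently overwrite the earlier BETA). It uses only five of the rows of a shown above, so it yields 3 ENSG columns rather than 51; genes, hit, m and out are illustrative names, not from the thread.

```r
# Five rows of a and all of b, copied from the output shown above
a <- read.table(text = "GENE rs BETA
ENSG00000154803 rs2605134 0.0360182
ENSG00000154803 rs7405677 0.0525463
ENSG00000141030 rs2605134 0.0806140
ENSG00000141030 rs7405677 0.0251654
ENSG00000205309 rs2605134 0.0838975",
  header = TRUE, stringsAsFactors = FALSE)
b <- read.table(text = "rs GWAS
rs2605134 0.0315177
rs7405677 -0.0816389
rs7211573 -0.0797796
rs2746026 0.0199350
rs11658521 0.0728377
rs9914107 0.0720096
rs56964223 0.0723903",
  header = TRUE, stringsAsFactors = FALSE)

# Zero-filled matrix: one row per rs in b, one column per unique GENE in a
genes <- unique(a$GENE)
m <- matrix(0, nrow = nrow(b), ncol = length(genes),
            dimnames = list(b$rs, genes))

# Keep only rows of a whose rs is listed in b, then write each BETA into the
# cell named by its (rs, GENE) pair; cells with no match stay 0
hit <- a$rs %in% b$rs
m[cbind(a$rs[hit], a$GENE[hit])] <- a$BETA[hit]

# rs first (labelled GENES as in the example row), GWAS last
out <- data.frame(GENES = b$rs, m, GWAS = b$GWAS,
                  check.names = FALSE, stringsAsFactors = FALSE)
```

With the full a (51 genes) the same code would give the 7 x 53 table; it could then be saved with write.table(out, "result.txt", sep = "\t", quote = FALSE, row.names = FALSE), where "result.txt" is a placeholder path.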
Hi Ana,
Is this what you want?
a<-read.table(text="GENE rs BETA
1 ENSG00000154803 rs2605134 0.0360182
2 ENSG00000154803 rs7405677 0.0525463
3 ENSG00000154803 rs7211573 0.0525531
4 ENSG00000154803 rs2746026 0.0466392
5 ENSG00000141030 rs2605134 0.0806140
6 ENSG00000141030 rs7405677 0.0251654
7 ENSG00000141030 rs7211573 0.0252775
8 ENSG00000141030 rs2746026 0.0976396
9 ENSG00000205309 rs2605134 0.0838975
10 ENSG00000205309 rs7405677 -0.2148500
11 ENSG00000205309 rs7211573 -0.2148170
12 ENSG00000205309 rs2746026 0.1013920
13 ENSG00000215030 rs2605134 0.1261050
14 ENSG00000215030 rs7405677 0.0165236
15 ENSG00000215030 rs7211573 0.0163509
16 ENSG00000215030 rs2746026 0.1201180
17 ENSG00000141026 rs2605134 0.0485897
18 ENSG00000141026 rs7405677 -0.0929964
19 ENSG00000141026 rs7211573 -0.0930321
20 ENSG00000141026 rs2746026 0.0623033",
header=TRUE,stringsAsFactors=FALSE)
b<-read.table(text="rs GWAS
1 rs2605134 0.0315177
2 rs7405677 -0.0816389
3 rs7211573 -0.0797796
4 rs2746026 0.0199350
5 rs11658521 0.0728377
6 rs9914107 0.0720096
7 rs56964223 0.0723903",
header=TRUE,stringsAsFactors=FALSE)
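# merge() is an inner join by default: only rs present in both a and b are kept
ab<-merge(a,b,by="rs")
library(prettyR)
# stretch_df() reshapes ab to one row per rs, spreading the GENE and BETA values of each matching gene across extra columns
abc<-stretch_df(ab,idvar="rs",to.stretch=c("GENE","BETA"))
Jim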
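Because merge() defaults to an inner join (all = FALSE), any rs in b that has no row in a is dropped before stretch_df() runs, which matters for the 7-row target. A small sketch of the difference, using a few rows copied from a and b above; inner and right are illustrative names.

```r
a <- read.table(text = "GENE rs BETA
ENSG00000154803 rs2605134 0.0360182
ENSG00000141030 rs2605134 0.0806140
ENSG00000154803 rs7405677 0.0525463",
  header = TRUE, stringsAsFactors = FALSE)
b <- read.table(text = "rs GWAS
rs2605134 0.0315177
rs7405677 -0.0816389
rs11658521 0.0728377",
  header = TRUE, stringsAsFactors = FALSE)

# Default merge() is an inner join: rs11658521 has no row in a, so it is dropped
inner <- merge(a, b, by = "rs")
# all.y = TRUE keeps every row of b; GENE and BETA are NA where a has no match
right <- merge(a, b, by = "rs", all.y = TRUE)

setdiff(b$rs, inner$rs)  # "rs11658521"
```

Whether the extra NA rows are wanted depends on the goal; a table with one row for every rs in b needs them, and the NAs can then be replaced by 0.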
On Mon, Dec 9, 2019 at 11:10 AM Ana Marija <sokovic.anamarija at gmail.com> wrote:
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
Thanks for getting back to me. I resolved my problem with this:

library(reshape2)
# One row per rs, one column per GENE, cells filled with BETA
c <- dcast(a, rs ~ GENE, value.var = "BETA")
# Add GWAS; all.y = TRUE keeps every rs from b, even ones with no row in a
d <- merge(c, b, by = "rs", all.y = TRUE)
# rs/GENE pairs with no BETA come out as NA; set them to 0
d[is.na(d)] <- 0

On Sun, Dec 8, 2019 at 11:03 PM Jim Lemon <drjimlemon at gmail.com> wrote:
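The dcast()/merge() solution ends with a data frame, while the original goal was a txt file. A minimal sketch of that last step with base R's write.table(), assuming a tab-separated file with a header row, no quotes and no row-name column is wanted. The two-row d below is a stand-in built from values shown in the thread (rs11658521 has no row in the part of a shown, so its BETA is 0), and tempfile() stands in for a real output path.

```r
# Two-row stand-in for d from the dcast()/merge() solution
d <- data.frame(rs = c("rs2605134", "rs11658521"),
                ENSG00000154803 = c(0.0360182, 0),
                GWAS = c(0.0315177, 0.0728377),
                stringsAsFactors = FALSE)

out_file <- tempfile(fileext = ".txt")  # replace with the real output path

# Tab-separated, header row kept, no quotes around rs IDs, no row-name column
write.table(d, file = out_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back to check the layout survived the round trip
back <- read.delim(out_file, stringsAsFactors = FALSE, check.names = FALSE)
```

quote = FALSE keeps the rs IDs and ENSG headers bare, which is usually what downstream tools expect from a plain txt table; row.names = FALSE avoids an extra unnamed first column.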