Hello,

I have two data frames:

head(a)
              GENE        rs       BETA
1  ENSG00000154803 rs2605134  0.0360182
2  ENSG00000154803 rs7405677  0.0525463
3  ENSG00000154803 rs7211573  0.0525531
4  ENSG00000154803 rs2746026  0.0466392
5  ENSG00000141030 rs2605134  0.0806140
6  ENSG00000141030 rs7405677  0.0251654
7  ENSG00000141030 rs7211573  0.0252775
8  ENSG00000141030 rs2746026  0.0976396
9  ENSG00000205309 rs2605134  0.0838975
10 ENSG00000205309 rs7405677 -0.2148500
11 ENSG00000205309 rs7211573 -0.2148170
12 ENSG00000205309 rs2746026  0.1013920
13 ENSG00000215030 rs2605134  0.1261050
14 ENSG00000215030 rs7405677  0.0165236
15 ENSG00000215030 rs7211573  0.0163509
16 ENSG00000215030 rs2746026  0.1201180
17 ENSG00000141026 rs2605134  0.0485897
18 ENSG00000141026 rs7405677 -0.0929964
19 ENSG00000141026 rs7211573 -0.0930321
20 ENSG00000141026 rs2746026  0.0623033

head(b)
          rs       GWAS
1  rs2605134  0.0315177
2  rs7405677 -0.0816389
3  rs7211573 -0.0797796
4  rs2746026  0.0199350
5 rs11658521  0.0728377
6  rs9914107  0.0720096
7 rs56964223  0.0723903

Data frame a has:
> length(unique(a$GENE))
[1] 51
> dim(a)
[1] 287   3

and the whole data frame b is shown.

I would like to create a txt file which would have rs match for each
ENSG from data frame b. If a particular ENSG does not have matching rs
from data frame b the value under it would be zero. So the txt file
would have 7 rows (for all those unique rs from data frame b) and 53
columns (for 51 ENSGs and one for unique rs and one for GWAS)

So one row of that txt file would look like this.

GENES     ENSG00000154803 ENSG00000141030 ENSG00000205309 ENSG00000215030 ENSG00000141026 GWAS
rs2605134 0.0360182       0.0806140       0.0838975       0.1261050       0.0485897       0.0315177
?

Please advise,
Ana

Hi Ana,
Is this what you want?

# rebuild the example data; the leading numbers are read as row names
a<-read.table(text="GENE rs BETA
1 ENSG00000154803 rs2605134 0.0360182
2 ENSG00000154803 rs7405677 0.0525463
3 ENSG00000154803 rs7211573 0.0525531
4 ENSG00000154803 rs2746026 0.0466392
5 ENSG00000141030 rs2605134 0.0806140
6 ENSG00000141030 rs7405677 0.0251654
7 ENSG00000141030 rs7211573 0.0252775
8 ENSG00000141030 rs2746026 0.0976396
9 ENSG00000205309 rs2605134 0.0838975
10 ENSG00000205309 rs7405677 -0.2148500
11 ENSG00000205309 rs7211573 -0.2148170
12 ENSG00000205309 rs2746026 0.1013920
13 ENSG00000215030 rs2605134 0.1261050
14 ENSG00000215030 rs7405677 0.0165236
15 ENSG00000215030 rs7211573 0.0163509
16 ENSG00000215030 rs2746026 0.1201180
17 ENSG00000141026 rs2605134 0.0485897
18 ENSG00000141026 rs7405677 -0.0929964
19 ENSG00000141026 rs7211573 -0.0930321
20 ENSG00000141026 rs2746026 0.0623033",
header=TRUE,stringsAsFactors=FALSE)
b<-read.table(text="rs GWAS
1 rs2605134 0.0315177
2 rs7405677 -0.0816389
3 rs7211573 -0.0797796
4 rs2746026 0.0199350
5 rs11658521 0.0728377
6 rs9914107 0.0720096
7 rs56964223 0.0723903",
header=TRUE,stringsAsFactors=FALSE)
# attach the GWAS value to every row of a that shares an rs with b
ab<-merge(a,b,by="rs")
library(prettyR)
# spread GENE and BETA into one row per rs
abc<-stretch_df(ab,idvar="rs",to.stretch=c("GENE","BETA"))

Jim

On Mon, Dec 9, 2019 at 11:10 AM Ana Marija <sokovic.anamarija at gmail.com> wrote:
> Hello,
>
> I have two data frames:
> [...]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Thanks for getting back to me. I resolved my problem with this:

library(reshape2)
# one row per rs, one column per GENE, filled with BETA
c <- dcast(a, rs ~ GENE, value.var = "BETA")
# add the GWAS column from b
d <- merge(c, b, by = "rs")
# genes with no value for a given rs get zero
d[is.na(d)] <- 0

On Sun, Dec 8, 2019 at 11:03 PM Jim Lemon <drjimlemon at gmail.com> wrote:
> Hi Ana,
> Is this what you want?
> [...]
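
The original question asks for one row per rs in b and a txt file on disk, and neither reply writes a file. The sketch below is not from the thread: it swaps dcast() for base R reshape() so that it runs without extra packages, uses a four-row subset of a and three rows of b taken from the data above, and writes to a hypothetical file name (beta_by_gene.txt) in the session's temporary directory. It assumes that rs values present in b but absent from a should be kept, which is why merge() is called with all.y = TRUE; with the default arguments such rows would be dropped.

```r
# Small subset of the thread's data, for illustration only
a <- data.frame(
  GENE = rep(c("ENSG00000154803", "ENSG00000141030"), each = 2),
  rs   = rep(c("rs2605134", "rs7405677"), times = 2),
  BETA = c(0.0360182, 0.0525463, 0.0806140, 0.0251654),
  stringsAsFactors = FALSE
)
b <- data.frame(
  rs   = c("rs2605134", "rs7405677", "rs11658521"),
  GWAS = c(0.0315177, -0.0816389, 0.0728377),
  stringsAsFactors = FALSE
)

# Long to wide: one row per rs, one BETA column per GENE
wide <- reshape(a, idvar = "rs", timevar = "GENE", direction = "wide")
# reshape() names the new columns "BETA.<GENE>"; keep only the gene ID
names(wide) <- sub("^BETA\\.", "", names(wide))

# all.y = TRUE keeps every rs from b, including those with no match in a
out <- merge(wide, b, by = "rs", all.y = TRUE)
# Missing gene/rs combinations become zero
out[is.na(out)] <- 0

# Hypothetical output path; adjust as needed
out_file <- file.path(tempdir(), "beta_by_gene.txt")
write.table(out, file = out_file, sep = "\t", quote = FALSE, row.names = FALSE)
```

With the full data from the question, the same steps would be expected to give one row for each rs in b, with the gene columns followed by GWAS.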