Hi,
Using the example code without removing CEBPA:
gset<- read.table("Names.txt",header=TRUE,stringsAsFactors=FALSE)
temp1<- read.table("Data.txt",header=TRUE,stringsAsFactors=FALSE)
lst1<-split(temp1,temp1$Names)
mat1<-combn(gset[,1],2)
library(plyr)
lst2<-lapply(split(mat1,col(mat1)),function(x){lst1[x][all(lapply(lst1[x],length)==2)]})
lst3<-lapply(lst2[lapply(lst2,length)==2],function(x) {
  x1<-join_all(x,by="patient_id",type="inner")
  x2<-x1["patient_id"]
  row.names(x2)<-if(nrow(x1)!=0) paste(x1[,1],x1[,3],1:nrow(x1),sep="_") else NULL
  x2 })
Reduce(rbind,lst3)
#                   patient_id
#DNMT3A_FLT3_1 LAML-AB-2811-TB
#DNMT3A_FLT3_2 LAML-AB-2816-TB
#DNMT3A_FLT3_3 LAML-AB-2818-TB
#DNMT3A_IDH1_1 LAML-AB-2802-TB
#DNMT3A_IDH1_2 LAML-AB-2822-TB
#DNMT3A_NPM1_1 LAML-AB-2802-TB
#DNMT3A_NPM1_2 LAML-AB-2809-TB
#DNMT3A_NPM1_3 LAML-AB-2811-TB
#DNMT3A_NPM1_4 LAML-AB-2816-TB
#DNMT3A_NRAS_1 LAML-AB-2816-TB
#FLT3_NPM1_1   LAML-AB-2811-TB
#FLT3_NPM1_2   LAML-AB-2812-TB
#FLT3_NPM1_3   LAML-AB-2816-TB
#FLT3_NRAS_1   LAML-AB-2816-TB
#IDH1_NPM1_1   LAML-AB-2802-TB
#NPM1_NRAS_1   LAML-AB-2816-TB
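For readers without the data files, here is a minimal base-R sketch of the same idea on made-up data. It swaps `join_all` for a plain `intersect` on patient IDs, so it is an illustration rather than the code above. The object names (`genes`, `dat`, `by_gene`, `patients_of`) and the patient IDs are hypothetical.

```r
# Toy data (hypothetical values, not taken from Names.txt / Data.txt)
genes <- c("DNMT3A", "FLT3", "NPM1", "CEBPA")   # CEBPA has no rows in dat
dat <- data.frame(
  Names      = c("DNMT3A", "DNMT3A", "FLT3", "FLT3", "NPM1"),
  patient_id = c("P1", "P2", "P2", "P3", "P2"),
  stringsAsFactors = FALSE
)

# Patient IDs per gene; a gene absent from the data yields character(0)
by_gene <- split(dat$patient_id, dat$Names)
patients_of <- function(g) {
  if (g %in% names(by_gene)) unique(by_gene[[g]]) else character(0)
}

# All gene pairs, and the patients shared by each pair
pairs <- combn(genes, 2)
shared <- lapply(seq_len(ncol(pairs)), function(i) {
  intersect(patients_of(pairs[1, i]), patients_of(pairs[2, i]))
})
names(shared) <- apply(pairs, 2, paste, collapse = "_")

# Keep only pairs with at least one shared patient
shared_nonempty <- shared[lengths(shared) > 0]
shared_nonempty
```

Because a missing gene simply contributes an empty set here, no pair has to be filtered out before the comparison.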
########From your original dataset:
gset<-read.table("SampleGenes.txt",header=TRUE,stringsAsFactors=FALSE)
temp0<-read.table("LAML-TB.final_analysis_set.txt",header=TRUE,stringsAsFactors=FALSE,sep="\t")
temp1<- temp0[,c("Hugo_Symbol","firehose_patient_id")]
str(temp1)
#'data.frame':    2221 obs. of  2 variables:
# $ Hugo_Symbol        : chr  "TBX15" "TCHHL1" "DNMT3A" "IDH1" ...
# $ firehose_patient_id: chr  "LAML-AB-2802-TB" "LAML-AB-2802-TB" "LAML-AB-2802-TB" "LAML-AB-2802-TB" ...
lst1<-split(temp1,temp1$Hugo_Symbol)
length(lst1)
#[1] 1607
mat1<-combn(gset[,1],2) # Generate all pairwise combinations of the gene names
lst2<-lapply(split(mat1,col(mat1)),function(x){lst1[x][all(lapply(lst1[x],length)==2)]})
length(lst2)
#[1] 105
lst3<-lapply(lst2[lapply(lst2,length)==2],function(x) {
  x1<-join_all(x,by="firehose_patient_id",type="inner")
  x2<-x1["firehose_patient_id"]
  row.names(x2)<-if(nrow(x1)!=0) paste(x1[,1],x1[,3],1:nrow(x1),sep="_") else NULL
  x2 })
res<-Reduce(rbind,lst3)
nrow(res)
#[1] 234
head(res)
#            firehose_patient_id
#NPM1_FLT3_1     LAML-AB-2811-TB
#NPM1_FLT3_2     LAML-AB-2812-TB
#NPM1_FLT3_3     LAML-AB-2816-TB
#NPM1_FLT3_4     LAML-AB-2818-TB
#NPM1_FLT3_5     LAML-AB-2825-TB
#NPM1_FLT3_6     LAML-AB-2836-TB
Regarding your second question:
setdiff(gset[,1],unique(temp1[,1])) # CEBPA was not found in the temp1[,1]
#[1] "CEBPA"
mat2<- combn(gset[-5,1],2)
vec1<- apply(mat2,2,paste,collapse="_")
vec2<-unique(gsub("(.*\\_.*)\\_.*","\\1",row.names(res)))
setdiff(vec1,vec2)
# [1] "NPM1_TP53"   "NPM1_EZH2"   "NPM1_RUNX1"  "NPM1_ASXL1"  "NPM1_KDM6A"
# [6] "FLT3_TP53"   "FLT3_EZH2"   "FLT3_KRAS"   "FLT3_ASXL1"  "FLT3_KDM6A"
#[11] "IDH1_TP53"   "IDH1_KRAS"   "NRAS_IDH2"   "NRAS_KRAS"   "NRAS_ASXL1"
#[16] "NRAS_KDM6A"  "TP53_EZH2"   "TP53_IDH2"   "TP53_RUNX1"  "TP53_KRAS"
#[21] "TP53_WT1"    "TP53_ASXL1"  "TP53_KDM6A"  "EZH2_IDH2"   "EZH2_WT1"
#[26] "EZH2_ASXL1"  "EZH2_KDM6A"  "IDH2_TET2"   "IDH2_KDM6A"  "RUNX1_KDM6A"
#[31] "KRAS_WT1"    "KRAS_KDM6A"  "WT1_ASXL1"   "WT1_TET2"    "WT1_KDM6A"
#[36] "ASXL1_TET2"  "ASXL1_KDM6A" "TET2_KDM6A"
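As an alternative to comparing pair labels with `setdiff`, the pairs without a common patient can also be found by counting the shared IDs directly. This is a rough sketch on made-up data, assuming every gene in the list occurs in the data at least once. The names `genes`, `dat`, `by_gene`, `n_shared` and `no_overlap` are hypothetical, as are the patient IDs.

```r
# Toy data (hypothetical values, not the LAML file)
genes <- c("NPM1", "FLT3", "TP53")
dat <- data.frame(
  Hugo_Symbol         = c("NPM1", "NPM1", "FLT3", "TP53"),
  firehose_patient_id = c("P1", "P2", "P2", "P3"),
  stringsAsFactors = FALSE
)

# Patient IDs per gene
by_gene <- split(dat$firehose_patient_id, dat$Hugo_Symbol)

# Number of shared patients for every gene pair
pairs <- combn(genes, 2)
n_shared <- apply(pairs, 2, function(p) {
  length(intersect(by_gene[[p[1]]], by_gene[[p[2]]]))
})
names(n_shared) <- apply(pairs, 2, paste, collapse = "_")

# Pairs with no common patient
no_overlap <- names(n_shared)[n_shared == 0]
no_overlap
```

Counting first keeps both answers in one object: `n_shared > 0` gives the overlapping pairs and `n_shared == 0` the rest.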
A.K.
----- Original Message -----
From: arun <smartpink111 at yahoo.com>
To: Manisha <manishabh77 at gmail.com>
Cc: R help <r-help at r-project.org>
Sent: Friday, July 26, 2013 11:18 AM
Subject: Re: Pairwise comparison between columns, logic
Hi Manisha,
I didn't run your dataset as I am on the way to college. But from the
error reported, I think it is due to some combinations missing from one of
the datasets. For example, if you run my previous code without removing CEBPA,
i.e.:
mat1<- combn(gset[,1],2)
lst2<-lapply(split(mat1,col(mat1)),function(x)
{x1<-join_all(lst1[x],by="patient_id",type="inner");x1["patient_id"]
} )
#Error: All inputs to rbind.fill must be data.frames
So, check whether all the combinations are available in the `lst1`.
2. I will get back to you once I run it.
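The check suggested in point 1 could look roughly like this. It is a sketch on stand-in objects: `gene_names` plays the role of `gset[,1]` and `lst1_toy` the role of `lst1`, and both are hypothetical. Indexing a list by a name it does not contain returns a NULL element, which is not a data frame, and that matches the error message.

```r
# Stand-ins for gset[,1] and lst1 (hypothetical toy values)
gene_names <- c("DNMT3A", "FLT3", "CEBPA")
lst1_toy <- list(
  DNMT3A = data.frame(Names = "DNMT3A", patient_id = "P1", stringsAsFactors = FALSE),
  FLT3   = data.frame(Names = "FLT3",   patient_id = "P1", stringsAsFactors = FALSE)
)

# Genes requested but absent from the split list
missing_genes <- setdiff(gene_names, names(lst1_toy))
missing_genes

# Build pairs only from genes that are actually present
present  <- intersect(gene_names, names(lst1_toy))
pairs_ok <- combn(present, 2)
```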
A.K.
________________________________
From: Manisha <manishabh77 at gmail.com>
To: arun <smartpink111 at yahoo.com>
Sent: Friday, July 26, 2013 11:09 AM
Subject: Re: Pairwise comparison between columns, logic
Hi Arun,
I ran the script on a larger dataset and I seem to be running into the
following error:
Error: All inputs to rbind.fill must be data.frames
after the step:
lst2<-lapply(split(mat1,col(mat1)),function(x)
{x1<-join_all(lst1[x],by="firehose_patient_id",type="inner");x1["firehose_patient_id"]})
I tried a few things to solve the issue but was not able to. The format of
the input files and data is the same as in the code you posted.
Could you suggest something?
I have attached my input files on which I am trying to run the code. See
attached code as well. Minor changes have been made by me.
2. I have another question. With your code, how do I also capture those pairs of
names that do not have any common patient ID?
Thanks again,
-M
On Fri, Jul 26, 2013 at 9:29 AM, arun <smartpink111 at yahoo.com> wrote:
>Hi M,
>No problem.
>Regards,
>Arun
>
>
>
>
>----- Original Message -----
>From: "manishabh77 at gmail.com" <manishabh77 at gmail.com>
>To: smartpink111 at yahoo.com
>Cc:
>Sent: Friday, July 26, 2013 9:27 AM
>Subject: Re: Pairwise comparison between columns, logic
>
>Thanks for the code. It is elegant and does what I need. Learnt some new things.
>-M
>
>
>_____________________________________
>Sent from http://r.789695.n4.nabble.com