thr3ads.net - R help - [R] Pairwise comparison between columns, logic [Jul 2013]

arun
2013-Jul-25 23:16 UTC
[R] Pairwise comparison between columns, logic

Hi,
I am not sure what your expected output would be. Also, 'CEBPA' was not
present in Data.txt.

gset<- read.table("Names.txt",header=TRUE,stringsAsFactors=FALSE)
temp1<- read.table("Data.txt",header=TRUE,stringsAsFactors=FALSE)
lst1<-split(temp1,temp1$Names) #one data frame of patient_ids per name
mat1<-combn(gset[-1,1],2) #removed CEBPA; each column is one pair of names
library(plyr)

#an inner join on patient_id keeps only the ids common to both names of a pair
lst2<-lapply(split(mat1,col(mat1)),function(x)
{x1<-join_all(lst1[x],by="patient_id",type="inner");x1["patient_id"]
})
names(lst2)<-apply(mat1,2,paste,collapse="_")
do.call(rbind,lst2)
#                   patient_id
#DNMT3A_FLT3.1 LAML-AB-2811-TB #common ids between DNMT3A and FLT3
#DNMT3A_FLT3.2 LAML-AB-2816-TB
#DNMT3A_FLT3.3 LAML-AB-2818-TB
#DNMT3A_IDH1.1 LAML-AB-2802-TB #common ids between DNMT3A and IDH1
#DNMT3A_IDH1.2 LAML-AB-2822-TB
#DNMT3A_NPM1.1 LAML-AB-2802-TB
#DNMT3A_NPM1.2 LAML-AB-2809-TB
#DNMT3A_NPM1.3 LAML-AB-2811-TB
#DNMT3A_NPM1.4 LAML-AB-2816-TB
#DNMT3A_NRAS   LAML-AB-2816-TB
#FLT3_NPM1.1   LAML-AB-2811-TB
#FLT3_NPM1.2   LAML-AB-2812-TB
#FLT3_NPM1.3   LAML-AB-2816-TB
#FLT3_NRAS     LAML-AB-2816-TB
#IDH1_NPM1     LAML-AB-2802-TB
#NPM1_NRAS     LAML-AB-2816-TB

If you want the results as separate data frames, use `lst2`.
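For comparison, here is a minimal base-R sketch of the same idea without plyr: split the patient_ids by name and intersect() each pair. It uses only the sample rows shown in the question (column names assumed to be `Names` and `patient_id`), so its result is smaller than the output above, which came from the full attachment. The objects `ids`, `pairs`, `common` and `res` are illustrative names.

```r
# Sample rows from the question (Data.txt header assumed to be Names, patient_id)
temp1 <- data.frame(
  Names = rep(c("DNMT3A", "FLT3", "IDH1"), times = c(7, 8, 2)),
  patient_id = paste0("LAML-AB-",
                      c(2802, 2809, 2811, 2816, 2818, 2822, 2824,
                        2811, 2812, 2814, 2816, 2818, 2825, 2830, 2834,
                        2802, 2821), "-TB"),
  stringsAsFactors = FALSE)
genes <- c("CEBPA", "DNMT3A", "FLT3", "IDH1", "NPM1", "NRAS")  # Names.txt, column 1

ids <- split(temp1$patient_id, temp1$Names)       # patient_ids per name
pairs <- combn(intersect(genes, names(ids)), 2)   # only names present in the data

# Common patient_ids for each pair (one list element per column of `pairs`)
common <- lapply(seq_len(ncol(pairs)), function(k)
  intersect(ids[[pairs[1, k]]], ids[[pairs[2, k]]]))
names(common) <- apply(pairs, 2, paste, collapse = "_")

# Long format, one row per shared patient_id, similar to do.call(rbind, lst2)
res <- data.frame(pair = rep(names(common), lengths(common)),
                  patient_id = unlist(common, use.names = FALSE),
                  stringsAsFactors = FALSE)
res
```

Pairs with no common patient (FLT3_IDH1 in this sample) contribute no rows to `res`, but remain in `common` as empty vectors.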
A.K.



Hello R experts, 

I am trying to implement the following logic.
I have two input files. The first file (Names.txt) has two columns:
Column1	Column2 
CEBPA	CEBPA 
DNMT3A	DNMT3A 
FLT3	FLT3 
IDH1	IDH1 
NPM1	NPM1 
NRAS	NRAS 
The second input file, Data.txt, has two columns, Names and patient_id:
Name	patient_id 
DNMT3A	LAML-AB-2802-TB 
DNMT3A	LAML-AB-2809-TB 
DNMT3A	LAML-AB-2811-TB 
DNMT3A	LAML-AB-2816-TB 
DNMT3A	LAML-AB-2818-TB 
DNMT3A	LAML-AB-2822-TB 
DNMT3A	LAML-AB-2824-TB 
FLT3	LAML-AB-2811-TB 
FLT3	LAML-AB-2812-TB 
FLT3	LAML-AB-2814-TB 
FLT3	LAML-AB-2816-TB 
FLT3	LAML-AB-2818-TB 
FLT3	LAML-AB-2825-TB 
FLT3	LAML-AB-2830-TB 
FLT3	LAML-AB-2834-TB 
IDH1	LAML-AB-2802-TB 
IDH1	LAML-AB-2821-TB 
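To make the example reproducible without the attachments, the rows shown above can be read directly from text with base R's `read.table(text = ...)`. This is a sketch under one assumption: the Data.txt header is taken to be `Names patient_id`, as described in the text and used in the reply's code, although the table itself shows `Name`.

```r
# Data.txt rows from the question, read from a string instead of a file
temp1 <- read.table(text = "
Names  patient_id
DNMT3A LAML-AB-2802-TB
DNMT3A LAML-AB-2809-TB
DNMT3A LAML-AB-2811-TB
DNMT3A LAML-AB-2816-TB
DNMT3A LAML-AB-2818-TB
DNMT3A LAML-AB-2822-TB
DNMT3A LAML-AB-2824-TB
FLT3   LAML-AB-2811-TB
FLT3   LAML-AB-2812-TB
FLT3   LAML-AB-2814-TB
FLT3   LAML-AB-2816-TB
FLT3   LAML-AB-2818-TB
FLT3   LAML-AB-2825-TB
FLT3   LAML-AB-2830-TB
FLT3   LAML-AB-2834-TB
IDH1   LAML-AB-2802-TB
IDH1   LAML-AB-2821-TB
", header = TRUE, stringsAsFactors = FALSE)

# Names.txt from the question
gset <- read.table(text = "
Column1 Column2
CEBPA   CEBPA
DNMT3A  DNMT3A
FLT3    FLT3
IDH1    IDH1
NPM1    NPM1
NRAS    NRAS
", header = TRUE, stringsAsFactors = FALSE)

str(temp1)
table(temp1$Names)  # number of patient_ids per name
```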

What I am attempting to do is, for each name in the first column of
Names.txt, a pairwise comparison with the other names in the second
column, based on which patient_ids they have in common.
To explain in detail, as an example: I extract the patient_ids for CEBPA
and DNMT3A and see which are common, then I do the same for CEBPA and
FLT3, and so on for CEBPA and each subsequent name in column 2.
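If only the number of shared patients per pair is needed (what the script below calls `num.both`), all pairs can be counted at once. This is a sketch of a different technique from either the loop or the join: build a gene-by-patient 0/1 matrix and multiply it by its own transpose, so entry [a, b] counts the patients listed under both gene a and gene b. The data are the sample rows from the question (header assumed to be `Names`, `patient_id`); `inc` and `shared` are hypothetical names.

```r
# Sample rows from the question (Data.txt header assumed to be Names, patient_id)
temp1 <- data.frame(
  Names = rep(c("DNMT3A", "FLT3", "IDH1"), times = c(7, 8, 2)),
  patient_id = paste0("LAML-AB-",
                      c(2802, 2809, 2811, 2816, 2818, 2822, 2824,
                        2811, 2812, 2814, 2816, 2818, 2825, 2830, 2834,
                        2802, 2821), "-TB"),
  stringsAsFactors = FALSE)

# Gene-by-patient 0/1 incidence matrix (rows: genes, columns: patient_ids)
inc <- 1 * (unclass(table(temp1$Names, temp1$patient_id)) > 0)

# shared[a, b] = number of patients carrying both gene a and gene b;
# the diagonal holds the number of patients per gene
shared <- tcrossprod(inc)
shared
```

CEBPA, NPM1 and NRAS do not appear because the sample has no rows for them; converting `temp1$Names` to a factor whose levels are all the names from Names.txt would add them as all-zero rows and columns.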
So far, the script I have written only does the comparison with the
first name in the list, i.e. essentially with itself. I am not sure why
this logic does not work for all the names in column 2 for a single
name in column 1.

Below is my script: 

gset<-read.table("Names.txt",header=F,na.strings = ".",
as.is=T) # reading in the genes
temp<-read.table("Data.txt",header=T,sep="\t") 


#################################################

all<-length(unique(temp$patient_id))
final<-c()

both.ab <- list()
both <- list()
temp.b <- matrix()

for(i in 1:nrow(gset))  # Loop for genes in the first column
{
  temp2<-temp[which(temp$Column1 %in% gset[i,]),]
  num.mut<-length(unique(temp2$patient_id))

  temp.a <-temp[which(temp$Column1 == gset[i,1]),]

  for(j in 1:nrow(gset))  # Loop for genes in the second column
  {
    temp.b <-temp[which(temp$Column2 == gset[j,2]),]
    # See which patient_ids of temp.a are in temp.b
    both.ab[[i]]<-temp.a[which(temp.a$patient_id %in% temp.b$patient_id),]
  }

  both[[i]]<-both.ab[[i]]

  num.both<-length(unique(both[[i]]$patient_id))

  line<-c(paste(gset[i, which(!(is.na(gset[i,])))],collapse="/"),
          num.mut, all, num.mut/all, num.both)
  final<-rbind(final,line)
}
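One thing that stands out in the posted loop: `both.ab[[i]]` is assigned inside the inner `j` loop, so each pass overwrites the previous one and at most one comparison per `i` can survive. (It also filters on `temp$Column1`/`temp$Column2`, which Data.txt, as described, does not have.) A minimal sketch that keeps one result per (i, j) pair, assuming Data.txt has the columns `Names` and `patient_id`; the objects `genes`, `both` and `num.both` here are illustrative, not the original script.

```r
# Sample rows from the question (Data.txt header assumed to be Names, patient_id)
temp <- data.frame(
  Names = rep(c("DNMT3A", "FLT3", "IDH1"), times = c(7, 8, 2)),
  patient_id = paste0("LAML-AB-",
                      c(2802, 2809, 2811, 2816, 2818, 2822, 2824,
                        2811, 2812, 2814, 2816, 2818, 2825, 2830, 2834,
                        2802, 2821), "-TB"),
  stringsAsFactors = FALSE)
genes <- c("CEBPA", "DNMT3A", "FLT3", "IDH1", "NPM1", "NRAS")  # Names.txt, column 1

both <- list()  # one element per (i, j) pair, named "GENE1_GENE2"
num.both <- matrix(0L, length(genes), length(genes),
                   dimnames = list(genes, genes))

for (i in seq_along(genes)) {        # genes in the first column
  ids.a <- temp$patient_id[temp$Names == genes[i]]
  for (j in seq_along(genes)) {      # genes in the second column
    if (j == i) next                 # skip comparing a gene with itself
    ids.b <- temp$patient_id[temp$Names == genes[j]]
    common <- intersect(ids.a, ids.b)
    # A separate slot per pair, so later j's no longer overwrite earlier ones
    both[[paste(genes[i], genes[j], sep = "_")]] <- common
    num.both[i, j] <- length(common)
  }
}

num.both
```

Because intersect() finds the same set in either order, `num.both` is symmetric; looping only over `j > i` would halve the work if ordered pairs are not needed.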
Attachments: Names.txt, Data.txt, Script.txt