Hi,
"By the way, could this program be modified so that it takes its input directly from my
input.txt file?"
Lines1 <- readLines("utpalmtbl.txt")
Lines1
#[1] "OG1: or10|1345 or10|387 or10|474 or11|1203 or11|182 or10|2158 or12|637"
#[2] "OG2: or10|1562 or10|1584 or10|1977 or11|2263 or11|43"
#[3] "OG3: or12|2400 or12|2401 or13|2697 or13|2698 or16|2 or16|914 or27|1355"
#[4] "OG4: or10|108 or20|2713 or25|2315 or25|2754 or2|1411"
library(stringr)
# Keep the group label (text up to the colon) and only the "or10|<digits>" members
res <- paste(gsub("(.*\\:).*", "\\1", Lines1),
             unlist(lapply(str_match_all(Lines1, "or10\\|\\d+"), paste, collapse = " ")),
             sep = " ")
res
#[1] "OG1: or10|1345 or10|387 or10|474 or10|2158"
#[2] "OG2: or10|1562 or10|1584 or10|1977"
#[3] "OG3: "
#[4] "OG4: or10|108"
write.table(res,"res.txt",row.names=FALSE,col.names=FALSE,quote=FALSE)
Lines2<- readLines("res.txt")
Lines2
#[1] "OG1: or10|1345 or10|387 or10|474 or10|2158"
#[2] "OG2: or10|1562 or10|1584 or10|1977"
#[3] "OG3: "
#[4] "OG4: or10|108"
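In case stringr is not available, the same extraction can probably be done in base R. This is only a minimal sketch on two made-up lines in the same layout; the variable names (lines, labels, hits, members) are illustrative, and it takes the label up to the first colon, which is assumed to be the only colon on each line.

```r
# Example lines in the same "group: members" layout as the input file
lines <- c("OG1: or10|1345 or11|1203 or10|2158",
           "OG3: or12|2400 or13|2697")

# Group label: everything up to and including the first colon
labels <- sub(":.*", ":", lines)

# All "or10|<digits>" tokens on each line (a list with one character vector per line)
hits <- regmatches(lines, gregexpr("or10\\|[0-9]+", lines))

# Join the tokens of each line with spaces; lines without a match give ""
members <- vapply(hits, paste, character(1), collapse = " ")

res <- paste(labels, members)
res
```

Groups with no or10 member are kept with an empty member list, as in the output above.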
I hope this helps.
A.K.
Hi Utpal,
You can use the same script from my previous email.
Lines1 <- readLines("groups.txt")
library(stringr)
res <- paste(gsub("(.*\\:).*", "\\1", Lines1),
             unlist(lapply(str_match_all(Lines1, "or10\\|\\d+"), paste, collapse = " ")),
             sep = " ")
write.table(res,"res1.txt",row.names=FALSE,col.names=FALSE,quote=FALSE)
Lines2 <- readLines("res1.txt")
length(Lines2)
#[1] 4633
head(Lines2)
#[1] "OG1: or10|1345 or10|387 or10|474"
#[2] "OG2: or10|1562 or10|1584 or10|1977"
#[3] "OG3: or10|1636 or10|1990 or10|2257 or10|2258 or10|2499"
#[4] "OG4: or10|600"
#[5] "OG5: or10|1053 or10|2869"
#[6] "OG6: or10|2798 or10|568"
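If you need to repeat this for other files or other prefixes, the steps could be wrapped in a small function. The following is only a sketch: extract_members() is a hypothetical helper name, it uses base R instead of stringr, and it assumes the prefix contains only letters and digits. Temporary files stand in for groups.txt and res1.txt.

```r
# Hypothetical helper: read a groups file, keep only the members with a given
# prefix on each line, and write the result to another file.
extract_members <- function(infile, outfile, prefix = "or10") {
  lines <- readLines(infile)
  # Group label: everything up to and including the first colon
  labels <- sub(":.*", ":", lines)
  # Assumes `prefix` has only letters and digits, so it needs no regex escaping
  pattern <- paste0(prefix, "\\|[0-9]+")
  hits <- regmatches(lines, gregexpr(pattern, lines))
  # One space-separated string per line; "" when nothing matched
  members <- vapply(hits, paste, character(1), collapse = " ")
  res <- paste(labels, members)
  writeLines(res, outfile)
  invisible(res)
}

# Usage with temporary files in place of the real input and output files
infile <- tempfile(fileext = ".txt")
outfile <- tempfile(fileext = ".txt")
writeLines(c("OG1: or10|1345 or11|1203 or10|2158",
             "OG3: or12|2400 or13|2697"), infile)
out <- extract_members(infile, outfile, prefix = "or11")
readLines(outfile)
```

With your own data the call would be along the lines of extract_members("groups.txt", "res1.txt").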
A.K.
________________________________
From: Utpal Bakshi <utpalb4u22 at gmail.com>
To: smartpink111 <smartpink111 at yahoo.com>
Sent: Friday, April 5, 2013 1:36 PM
Subject:
Of course...
The groups.txt file is my input file.
I have also attached the core genome program. The scripts in the myscript file use the
function written by Andreas Sjodin, which is in the coregenome.R file.