Hi all,
I have a large table mapping thousands of COGs (groups of genes) to
pathways.
# Ex
COG0001 patha pathb pathc
COG0002 pathd pathe
COG0003 pathe pathf pathg pathh
##
I would like to combine this information into a big list such as below
COG2PATHWAY<-list(COG0001=c("patha","pathb","pathc"),COG0002=c("pathd","pathe"),COG0003=c("pathf","pathg","pathh"))
I am stuck and have tried various methods involving (probably mangled)
versions of lapply and loops.
Any suggestions on the most efficient way to do this would be great.
Thanks,
Alison
Here is my latest attempt.
#####
line_num<-length(scan(file="/g/bork8/waller/test_COGtoPath.txt",what="character",sep="\n"))
COG2Path<-vector("list",line_num)
COG2Path<-lapply(1:(line_num-1),function(x) scan(file="/g/bork8/waller/test_COGtopath.txt",skip=x,nlines=1,quiet=T,what='character',sep="\t"))
#####
I am getting an error
#####
> COG2Path<-lapply(1:(line_num-1),function(x) scan(file="/g/bork8/waller/test_COGtopath.txt",skip=x,nlines=1,quiet=T,what='character',sep="\t"))
Error in file(file, "r") : cannot open the connection
In addition: Warning message:
In file(file, "r") :
But if I do scan alone I don't get an error
# then I suppose it looks like the easiest way to name the list
variables is using unix to cut the first column out and then read that
in.
names(COG2Path)<-scan(file="/g/bork8/waller/test_col_names.txt",sep="\t",what="character")
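
One thing worth ruling out for the "cannot open the connection" error: the two scan() calls in the post spell the file name differently (test_COGtoPath.txt in the first, test_COGtopath.txt inside lapply). On a case-sensitive file system these are two different files. A minimal sketch of the check, using only the two paths as they appear in the post (whether either file exists on a given machine is of course not known here):

```r
# The two spellings used in the post; they differ only in the case of one letter.
path_a <- "/g/bork8/waller/test_COGtoPath.txt"
path_b <- "/g/bork8/waller/test_COGtopath.txt"

# Are the strings literally the same? Are they the same ignoring case?
same_exact      <- identical(path_a, path_b)
same_ignorecase <- identical(tolower(path_a), tolower(path_b))

# file.exists() reports, per path, whether R can see the file at all.
exists_flags <- file.exists(c(path_a, path_b))
print(data.frame(path = c(path_a, path_b), exists = exists_flags))
```

If only one of the two paths exists, using that spelling in both calls should make the error go away.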
On Sun, Oct 10, 2010 at 11:40 AM, Alison Waller <alison.waller at embl.de> wrote:
> Hi all,
>
> I have a large table mapping thousands of COGs(groups of genes) to pathways.
> # Ex
> COG0001 patha pathb pathc
> COG0002 pathd pathe
> COG0003 pathe pathf pathg pathh
> ##
>
> I would like to combine this information into a big list such as below
> COG2PATHWAY<-list(COG0001=c("patha","pathb","pathc"),COG0002=c("pathd","pathe"),COG0003=c("pathf","pathg","pathh"))
>
> I am stuck and have tried various methods involving (probably mangled)
> versions of lapply and loops.
>
> Any suggestions on the most efficient way to do this would be great.
>

Try this:

Lines <- "COG0001 patha pathb pathc
COG0002 pathd pathe
COG0003 pathe pathf pathg pathh"

DF <- read.table(textConnection(Lines), header = FALSE, fill = TRUE,
    as.is = TRUE, na.strings = "")

library(reshape2)
m <- na.omit(melt(DF, 1))
result <- unstack(m, value ~ V1)

giving

> result
$COG0001
[1] "patha" "pathb" "pathc"

$COG0002
[1] "pathd" "pathe"

$COG0003
[1] "pathe" "pathf" "pathg" "pathh"

or

> acast(m, value ~ V1)
      COG0001 COG0002 COG0003
patha patha   <NA>    <NA>
pathb pathb   <NA>    <NA>
pathc pathc   <NA>    <NA>
pathd <NA>    pathd   <NA>
pathe <NA>    pathe   pathe
pathf <NA>    <NA>    pathf
pathg <NA>    <NA>    pathg
pathh <NA>    <NA>    pathh
Levels: patha pathb pathc pathd pathe pathf pathg pathh

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

To get just the list you wanted, Gabor's solution is more elegant, but
here's another using the apply family. First, your data:
dat <-
scan(file="/g/bork8/waller/test_COGtoPath.txt",what="character",sep="\n")
I expect dat to be a vector of strings where each string is a line of
values separated by tabs, which I think, by looking at your other
code, is what you get.
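COG2Path <- sapply(dat, function(x) {
  # split one line into its tab-separated fields
  tmp <- unlist(strsplit(x, "\t", fixed = TRUE))
  # everything after the first field is a pathway
  out <- list(tmp[-1])
  # the first field is the COG name
  names(out) <- tmp[1]
  out
}, USE.NAMES = FALSE)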
The one difference between the two is that if you have a COG with no
pathways (might not be realistic or that big of a deal), this solution
will have the COG name in the list with a value of character(0) where
Gabor's will omit the COG completely. Again, probably not a big deal.
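
To make that difference concrete, here is a small sketch on made-up data. The vector below stands in for what scan() would return, and COG0004 is a hypothetical COG with no pathways, added only for illustration. It uses lapply on pre-split fields rather than the sapply call above, and takes the names from the first field of each line, so no separate file of column names is needed.

```r
# Hypothetical stand-in for the lines returned by scan(..., sep = "\n")
dat <- c("COG0001\tpatha\tpathb\tpathc",
         "COG0002\tpathd\tpathe",
         "COG0004")  # made-up COG with no pathways

# Split every line into its tab-separated fields (one character vector per line)
fields <- strsplit(dat, "\t", fixed = TRUE)

# Pathways are everything after the first field; the first field is the name
COG2Path <- lapply(fields, function(f) f[-1])
names(COG2Path) <- vapply(fields, function(f) f[1], character(1))

# COG0004 is kept, with an empty character vector as its value
str(COG2Path)

# To mimic the other solution, drop COGs that have no pathways
COG2Path_nonempty <- COG2Path[lengths(COG2Path) > 0]
```

Note that lengths() needs a reasonably recent R; on older versions sapply(COG2Path, length) should do the same job.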
Cheers,
Jeff.