Hi all,

I have a large table mapping thousands of COGs (groups of genes) to pathways.

# Ex
COG0001 patha pathb pathc
COG0002 pathd pathe
COG0003 pathe pathf pathg pathh
##

I would like to combine this information into a big list such as the one below:

COG2PATHWAY <- list(COG0001 = c("patha", "pathb", "pathc"),
                    COG0002 = c("pathd", "pathe"),
                    COG0003 = c("pathe", "pathf", "pathg", "pathh"))

I am stuck and have tried various methods involving (probably mangled) versions of lapply and loops.

Any suggestions on the most efficient way to do this would be great.

Thanks,

Alison

Here is my latest attempt.

#####
line_num <- length(scan(file="/g/bork8/waller/test_COGtoPath.txt", what="character", sep="\n"))
COG2Path <- vector("list", line_num)
COG2Path <- lapply(1:(line_num-1), function(x)
    scan(file="/g/bork8/waller/test_COGtopath.txt", skip=x, nlines=1, quiet=T, what='character', sep="\t"))
#####

I am getting an error

#####
> COG2Path <- lapply(1:(line_num-1), function(x)
+     scan(file="/g/bork8/waller/test_COGtopath.txt", skip=x, nlines=1, quiet=T, what='character', sep="\t"))
Error in file(file, "r") : cannot open the connection
In addition: Warning message:
In file(file, "r") :

But if I run scan alone I don't get an error.

# Then I suppose the easiest way to name the list elements is to use
# unix to cut the first column out and then read that in.
names(COG2Path) <- scan(file="/g/bork8/waller/test_col_names.txt", sep="\t", what="character")
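
For reference, here is a minimal sketch of how the line-by-line scan() idea in the attempt above could be made to run. Two things stand out in the posted code: the two scan() calls spell the file name differently (test_COGtoPath.txt vs. test_COGtopath.txt), which on a case-sensitive file system could plausibly explain "cannot open the connection", and skip=x starting at 1 never reads the first line. The temporary file and the variable names path and rows are hypothetical stand-ins, not part of the original post.

```r
# Hypothetical stand-in for the real file: the three example rows,
# tab-separated, written to a temporary file.
path <- tempfile(fileext = ".txt")
writeLines(c("COG0001\tpatha\tpathb\tpathc",
             "COG0002\tpathd\tpathe",
             "COG0003\tpathe\tpathf\tpathg\tpathh"), path)

# Count the lines once, using the same path as every later read.
line_num <- length(readLines(path))

# skip = 0 reads the first line, so the skips run from 0 to line_num - 1.
rows <- lapply(seq_len(line_num) - 1, function(i)
  scan(file = path, skip = i, nlines = 1, quiet = TRUE,
       what = "character", sep = "\t"))

# The first field of each row is the COG name; the rest are its pathways.
COG2Path <- lapply(rows, function(r) r[-1])
names(COG2Path) <- vapply(rows, function(r) r[1], character(1))

str(COG2Path)
```

Note that this reopens the file once per line, so for thousands of COGs a single read followed by splitting each line is likely to be much faster.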
On Sun, Oct 10, 2010 at 11:40 AM, Alison Waller <alison.waller at embl.de> wrote:
> Hi all,
>
> I have a large table mapping thousands of COGs (groups of genes) to pathways.
> # Ex
> COG0001 patha pathb pathc
> COG0002 pathd pathe
> COG0003 pathe pathf pathg pathh
> ##
>
> I would like to combine this information into a big list such as below
> COG2PATHWAY<-list(COG0001=c("patha","pathb","pathc"),COG0002=c("pathd","pathe"),COG0003=c("pathe","pathf","pathg","pathh"))
>
> I am stuck and have tried various methods involving (probably mangled)
> versions of lapply and loops.
>
> Any suggestions on the most efficient way to do this would be great.
>

Try this:

Lines <- "COG0001 patha pathb pathc
COG0002 pathd pathe
COG0003 pathe pathf pathg pathh"

DF <- read.table(textConnection(Lines), header = FALSE, fill = TRUE,
    as.is = TRUE, na.strings = "")

library(reshape2)
m <- na.omit(melt(DF, 1))
result <- unstack(m, value ~ V1)

giving

> result
$COG0001
[1] "patha" "pathb" "pathc"

$COG0002
[1] "pathd" "pathe"

$COG0003
[1] "pathe" "pathf" "pathg" "pathh"

or

> acast(m, value ~ V1)
      COG0001 COG0002 COG0003
patha patha   <NA>    <NA>
pathb pathb   <NA>    <NA>
pathc pathc   <NA>    <NA>
pathd <NA>    pathd   <NA>
pathe <NA>    pathe   pathe
pathf <NA>    <NA>    pathf
pathg <NA>    <NA>    pathg
pathh <NA>    <NA>    pathh
Levels: patha pathb pathc pathd pathe pathf pathg pathh

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
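
If adding a dependency on reshape2 is not convenient, the same wide-to-list reshaping can be sketched in base R. This is only an illustration of the melt / na.omit / unstack idea above, not Gabor's code: the names vals and keep are made up for the example, and read.table(text = ...) is assumed to be available (it is in current versions of R).

```r
Lines <- "COG0001 patha pathb pathc
COG0002 pathd pathe
COG0003 pathe pathf pathg pathh"

# Same ragged read as above: short rows are padded and the padding
# is turned into NA.
DF <- read.table(text = Lines, header = FALSE, fill = TRUE,
                 as.is = TRUE, na.strings = "")

# All pathway columns as one character matrix (the "melt" step).
vals <- as.matrix(DF[-1])

# Drop the padding cells (the "na.omit" step); the nzchar() check is
# a safeguard in case padding is read as "" rather than NA.
keep <- !is.na(vals) & nzchar(vals)

# Group the remaining pathways by the COG of the row they came from
# (the "unstack" step).
result <- split(vals[keep], DF$V1[row(vals)[keep]])

str(result)
```

As with the reshape2 version, a COG whose row contains no pathways contributes no cells and therefore does not appear in the result.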
To get just the list you wanted, Gabor's solution is more elegant, but here's another using the apply family. First, your data:

dat <- scan(file="/g/bork8/waller/test_COGtoPath.txt", what="character", sep="\n")

I expect dat to be a vector of strings where each string is a line of values separated by tabs, which, judging by your other code, is what you get.

sapply(dat, function(x) {
    # split the line on tabs; the first field is the COG name
    tmp <- unlist(strsplit(x, '\t', fixed=TRUE))
    # everything after the first field is a pathway
    out <- list(tmp[seq_along(tmp)[-1]])
    names(out) <- tmp[1]
    out
}, USE.NAMES=FALSE)

The one difference between the two is that if you have a COG with no pathways (which might not be realistic, or that big of a deal), this solution will keep the COG name in the list with a value of character(0), whereas Gabor's will omit the COG completely. Again, probably not a big deal.

Cheers,
Jeff.

On Sun, Oct 10, 2010 at 11:40 AM, Alison Waller <alison.waller at embl.de> wrote:
> Hi all,
>
> I have a large table mapping thousands of COGs (groups of genes) to pathways.
> # Ex
> COG0001 patha pathb pathc
> COG0002 pathd pathe
> COG0003 pathe pathf pathg pathh
> ##
>
> I would like to combine this information into a big list such as below
> COG2PATHWAY<-list(COG0001=c("patha","pathb","pathc"),COG0002=c("pathd","pathe"),COG0003=c("pathe","pathf","pathg","pathh"))
>
> I am stuck and have tried various methods involving (probably mangled)
> versions of lapply and loops.
>
> Any suggestions on the most efficient way to do this would be great.
>
> Thanks,
>
> Alison
>
> Here is my latest attempt.
>
> #####
>
> line_num<-length(scan(file="/g/bork8/waller/test_COGtoPath.txt",what="character",sep="\n"))
> COG2Path<-vector("list",line_num)
> COG2Path<-lapply(1:(line_num-1),function(x)
> scan(file="/g/bork8/waller/test_COGtopath.txt",skip=x,nlines=1,quiet=T,what='character',sep="\t"))
>
> #####
>
> I am getting an error
>
> #####
>
>> COG2Path<-lapply(1:(line_num-1),function(x)
>> scan(file="/g/bork8/waller/test_COGtopath.txt",skip=x,nlines=1,quiet=T,what='character',sep="\t"))
> Error in file(file, "r") : cannot open the connection
> In addition: Warning message:
> In file(file, "r") :
>
> But if I do scan alone I don't get an error
>
> # then I suppose it looks like the easiest way to name the list variables
> is using unix to cut the first column out and then read that in.
> names(COG2Path)<-scan(file="/g/bork8/waller/test_col_names.txt",sep="\t",what="character")
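
Jeff's remark about COGs with no pathways can be checked on a small in-memory example. The sketch below is a variation on his strsplit() approach rather than his exact code: it splits every line in one call and names the result afterwards. The fourth entry, COG0004, is a hypothetical COG with no pathways added only to show the character(0) behaviour, and fields is a name made up for the example.

```r
# In-memory stand-in for the lines that scan(..., sep = "\n") would return.
dat <- c("COG0001\tpatha\tpathb\tpathc",
         "COG0002\tpathd\tpathe",
         "COG0003\tpathe\tpathf\tpathg\tpathh",
         "COG0004")  # hypothetical COG with no pathways

# Split every line on tabs in one vectorised call.
fields <- strsplit(dat, "\t", fixed = TRUE)

# Drop the first field (the COG name) to keep only the pathways ...
COG2PATHWAY <- lapply(fields, function(f) f[-1])

# ... and use that first field as the element name.
names(COG2PATHWAY) <- vapply(fields, function(f) f[1], character(1))

str(COG2PATHWAY)
```

Here COG0004 stays in the list with a zero-length character vector, which is the behaviour Jeff describes for the line-splitting approach.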