Hello,

I've been running several programs in the unix shell, and it's time to combine results from several different pipelines. I've been writing shell scripts with heavy use of awk and grep to make big text files, but I'm thinking it would be better to have all my data in one big structure in R so that I can query whatever attributes I like, and print several corresponding tables to separate files.

I haven't used R in years, so I was hoping somebody might be able to suggest a solution or combination of functions that could help me get oriented.

Right now, I can import my data into a data frame that looks like this:

df <- data.frame(case=c("case_1","case_1","case_2","case_3"),
                 gene=c("gene1","gene1","gene1","gene2"),
                 issue=c("nsyn","amp","del","UTR"))

> df
    case  gene issue
1 case_1 gene1  nsyn
2 case_1 gene1   amp
3 case_2 gene1   del
4 case_3 gene2   UTR

I'd like to cook up some combination of functions/scripting that can convert a table like df into a list or a data frame/matrix that looks like df2:

> df2
        case_1 case_2 case_3
gene1 nsyn,amp    del      0
gene2        0      0    UTR

I can build df2 manually, like this:

df2 <- data.frame(case_1=c("nsyn,amp","0"),
                  case_2=c("del","0"),
                  case_3=c("0","UTR"))
rownames(df2) <- c("gene1","gene2")

but obviously I do not want to do this by hand; I want R to generate df2 from df.

Any pointers/ideas would be most welcome!

Thanks,
Jonathan

[[alternative HTML version deleted]]
On Oct 23, 2013, at 4:36 PM, Jon BR wrote:

> Hello,
> I've been running several programs in the unix shell, and it's time to
> combine results from several different pipelines. I've been writing shell
> scripts with heavy use of awk and grep to make big text files, but I'm
> thinking it would be better to have all my data in one big structure in R
> so that I can query whatever attributes I like, and print several
> corresponding tables to separate files.
>
> I haven't used R in years, so I was hoping somebody might be able to
> suggest a solution or combination of functions that could help me get
> oriented.
>
> Right now, I can import my data into a data frame that looks like this:
>
> df <-
> data.frame(case=c("case_1","case_1","case_2","case_3"),gene=c("gene1","gene1","gene1","gene2"),issue=c("nsyn","amp","del","UTR"))
> > df
>     case  gene issue
> 1 case_1 gene1  nsyn
> 2 case_1 gene1   amp
> 3 case_2 gene1   del
> 4 case_3 gene2   UTR
>
> I'd like to cook up some combination of functions/scripting that can
> convert a table like df to produce a list or a data frame/matrix that
> looks like df2:
>
> > df2
>         case_1 case_2 case_3
> gene1 nsyn,amp    del      0
> gene2        0      0    UTR
>
> I can build df2 manually, like this:
> df2
> <-data.frame(case_1=c("nsyn,amp","0"),case_2=c("del","0"),case_3=c("0","UTR"))
> rownames(df2)<-c("gene1","gene2")

Factors will be a hassle, so build the data frame with stringsAsFactors=FALSE:

df <- data.frame(case=c("case_1","case_1","case_2","case_3"),
                 gene=c("gene1","gene1","gene1","gene2"),
                 issue=c("nsyn","amp","del","UTR"),
                 stringsAsFactors=FALSE)
df

# Each cell of the gene-by-case matrix holds the issues for that pair.
dmat <- with( df, matrix( tapply(issue, list(gene, case), list),
                          nrow=length(unique(gene)),
                          ncol=length(unique(case)) ) )
> dmat
     [,1]        [,2]  [,3]
[1,] Character,2 "del" NA
[2,] NA          NA    "UTR"
> dmat[1,1]
[[1]]
[1] "nsyn" "amp"

> as.data.frame(dmat)
         V1  V2  V3
1 nsyn, amp del  NA
2        NA  NA UTR

> but obviously do not want to do this by hand; I want R to generate df2 from
> df.
>
> Any pointers/ideas would be most welcome!
>
> Thanks,
> Jonathan
>
> [[alternative HTML version deleted]]

R-help is a plain text mailing list. Old school, admittedly, but much better for coding questions. Surely an awk user can appreciate the wisdom of that request?

--
David Winsemius
Alameda, CA, USA
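If the goal is the exact df2 layout from the question (comma-separated strings, "0" for empty cells, genes as row names), one possible variation on the tapply() idea is to collapse each cell with paste() instead of list(). This is only a sketch in base R, assuming the three-column df from the question; the names m and df2 are just illustrative.

```r
df <- data.frame(case  = c("case_1", "case_1", "case_2", "case_3"),
                 gene  = c("gene1", "gene1", "gene1", "gene2"),
                 issue = c("nsyn", "amp", "del", "UTR"),
                 stringsAsFactors = FALSE)

# Collapse all issues for each gene/case pair into one comma-separated string.
# tapply() keeps the gene and case labels as the dimnames of the matrix.
m <- with(df, tapply(issue, list(gene, case), paste, collapse = ","))

# Gene/case pairs with no rows come back as NA; use "0" as in the question.
m[is.na(m)] <- "0"

# Convert the character matrix to a data frame (row names are the genes).
df2 <- as.data.frame(m, stringsAsFactors = FALSE)
df2
```

From there, something like write.table(df2, file = "genes_by_case.txt", sep = "\t", quote = FALSE, col.names = NA) should write the table to a tab-delimited file; the file name is made up for the example.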
Hi,

You may try:

library(reshape2)
df <- data.frame(case=c("case_1","case_1","case_2","case_3"),
                 gene=c("gene1","gene1","gene1","gene2"),
                 issue=c("nsyn","amp","del","UTR"),
                 stringsAsFactors=FALSE)
res <- dcast(df, gene~case, value.var="issue", fun.aggregate=list)
res
#   gene    case_1 case_2 case_3
#1 gene1 nsyn, amp    del
#2 gene2                     UTR

A.K.

On Wednesday, October 23, 2013 7:38 PM, Jon BR <jonsleepy at gmail.com> wrote:

Hello,

I've been running several programs in the unix shell, and it's time to combine results from several different pipelines. I've been writing shell scripts with heavy use of awk and grep to make big text files, but I'm thinking it would be better to have all my data in one big structure in R so that I can query whatever attributes I like, and print several corresponding tables to separate files.

I haven't used R in years, so I was hoping somebody might be able to suggest a solution or combination of functions that could help me get oriented.

Right now, I can import my data into a data frame that looks like this:

df <- data.frame(case=c("case_1","case_1","case_2","case_3"),gene=c("gene1","gene1","gene1","gene2"),issue=c("nsyn","amp","del","UTR"))

> df
    case  gene issue
1 case_1 gene1  nsyn
2 case_1 gene1   amp
3 case_2 gene1   del
4 case_3 gene2   UTR

I'd like to cook up some combination of functions/scripting that can convert a table like df to produce a list or a data frame/matrix that looks like df2:

> df2
        case_1 case_2 case_3
gene1 nsyn,amp    del      0
gene2        0      0    UTR

I can build df2 manually, like this:

df2 <-data.frame(case_1=c("nsyn,amp","0"),case_2=c("del","0"),case_3=c("0","UTR"))
rownames(df2)<-c("gene1","gene2")

but obviously do not want to do this by hand; I want R to generate df2 from df.

Any pointers/ideas would be most welcome!

Thanks,
Jonathan
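As a possible follow-up to the dcast() suggestion: passing a paste()-based aggregation function together with fill = "0" should give plain character columns that match the df2 layout in the question, rather than list columns. This is a sketch that assumes the reshape2 package is installed; the names collapse_issues and res2 are made up for the example.

```r
library(reshape2)

df <- data.frame(case  = c("case_1", "case_1", "case_2", "case_3"),
                 gene  = c("gene1", "gene1", "gene1", "gene2"),
                 issue = c("nsyn", "amp", "del", "UTR"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: join all issues in one cell into a single string.
collapse_issues <- function(x) paste(x, collapse = ",")

# One row per gene, one column per case; empty cells are filled with "0".
res2 <- dcast(df, gene ~ case, value.var = "issue",
              fun.aggregate = collapse_issues, fill = "0")

# Move the gene column into the row names, as in the hand-built df2.
rownames(res2) <- res2$gene
res2$gene <- NULL
res2
```

Because every cell is now an ordinary string, the result can be written out with write.table() without any special handling of list columns.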
[[alternative HTML version deleted]]

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.