I'm trying to find a more elegant way of doing this. What I'm trying to accomplish is to count the frequency of letters (major/minor alleles) in a string, grouped by the factor levels in another column of my data frame.

Ex.
> DF<-data.frame(c("CC", "CC", NA, "CG", "GG", "GC"), c("L", "U", "L", "U", "L", NA))
> colnames(DF)<-c("X", "Y")
> DF
     X    Y
1   CC    L
2   CC    U
3 <NA>    L
4   CG    U
5   GG    L
6   GC <NA>

I have an ugly solution, which works if you know the factor levels of Y in advance.

> ans<-rbind(table(unlist(strsplit(as.character(DF[DF[ ,'Y'] == 'L', 1]), ""))),
+ table(unlist(strsplit(as.character(DF[DF[ ,'Y'] == 'U', 1]), ""))))
> rownames(ans)<-c("L", "U")
> ans
  C G
L 2 2
U 3 1

I've played with table, xtabs, tabulate, aggregate, tapply, etc., but haven't found a combination that gives a more general solution to this problem.

Any ideas?

Brian
try: ?ftable
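ftable() on its own only lays out an existing contingency table, so the genotype strings first have to become one row per letter. A minimal base-R sketch of that route, assuming this expansion step is acceptable (the names `long`, `allele` and `counts` are illustrative, not from the thread) and R >= 3.2.0 for lengths():

```r
# Example data from the original post (column names given directly)
DF <- data.frame(X = c("CC", "CC", NA, "CG", "GG", "GC"),
                 Y = c("L", "U", "L", "U", "L", NA))

# Split every genotype string into single letters;
# a missing string becomes a single NA element
chars <- strsplit(as.character(DF$X), "")

# Long format: one row per letter, carrying the Y value of its original row
long <- data.frame(Y = rep(DF$Y, times = lengths(chars)),
                   allele = unlist(chars))

# xtabs() drops rows with NA in Y or allele by default;
# ftable() prints the result as a flat table
counts <- xtabs(~ Y + allele, data = long)
ftable(counts)
```

Because the strings are expanded rather than cut at fixed positions, this does not depend on the number of letters per genotype or on knowing the levels of Y in advance. For the example data the counts match the "ans" table from the original post.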
I fiddled around and found this solution, which is far from elegant, but it doesn't require you to know the factor levels in advance.

# per-level tables of the genotype strings in X (NA values of X and Y are dropped)
tabs <- with(DF, tapply(as.character(X), Y, table))
# rep() repeats each genotype by its count before the letters are split out
lapply(tabs, function(x) table(strsplit(paste(rep(names(x), x), collapse = ""), split = "")))

Darin

On Fri, Sep 10, 2010 at 02:40:50PM -0500, Davis, Brian wrote:
> [...]
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
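A variant of the same tapply() idea, sketched under the assumption that tabulating the letters directly within each group is acceptable. It skips the intermediate table of whole genotypes, so a genotype that occurs more than once in a group is still counted every time. `per_level` is an illustrative name.

```r
# Example data from the original post
DF <- data.frame(X = c("CC", "CC", NA, "CG", "GG", "GC"),
                 Y = c("L", "U", "L", "U", "L", NA))

# For each level of Y, split that group's strings into letters and tabulate them.
# tapply() ignores rows whose Y is NA; table() ignores the NA produced by strsplit(NA, "").
per_level <- tapply(as.character(DF$X), DF$Y,
                    function(x) table(unlist(strsplit(x, ""))))
per_level
```

The result is a list-like array with one table per level of Y, indexed by level name, e.g. per_level[["U"]].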
In my quest for brevity I think I sacrificed too much clarity. I'll try to be a little less brief in the hope of being clearer. Say I have a data frame like this, as before:

> DF<-data.frame(c("CC", "CC", NA, "CG", "GG", "GC"), c("L", "U", "L", "U", "L", NA))
> colnames(DF)<-c("X", "Y")
> DF
     X    Y
1   CC    L
2   CC    U
3 <NA>    L
4   CG    U
5   GG    L
6   GC <NA>

I need to count the frequency of the unique individual characters in DF$X at each factor level in DF$Y. So for DF$Y == "L" there are 2 "C"s and 2 "G"s, and for DF$Y == "U" there are 3 "C"s and 1 "G". The NAs should not contribute to the counts.

If I had an individual character in DF$X instead of a string, like:

> DF2<-data.frame(c("C", "C", NA, "C", "G", "G"), c("L", "U", "L", "U", "L", NA))
> colnames(DF2)<-c("X", "Y")
> DF2
     X    Y
1    C    L
2    C    U
3 <NA>    L
4    C    U
5    G    L
6    G <NA>

then table gives me exactly what I need:

> table(DF2)
   Y
X   L U
  C 1 2
  G 1 0

Hopefully this makes it a little clearer what I'm trying to accomplish.

Brian

-----Original Message-----
From: Phil Spector [mailto:spector at stat.berkeley.edu]
Sent: Friday, September 10, 2010 2:52 PM
To: Davis, Brian
Subject: Re: [R] Counting occurances of a letter by a factor

Brian -
   Here's the only thing I can come up with to give the same result as your "ans", but it doesn't seem to correspond with your description of the problem.

> DF1 = DF
> DF1$X = sapply(strsplit(as.character(DF$X),''),'[',1)
> DF2 = DF
> DF2$X = sapply(strsplit(as.character(DF$X),''),'[',2)
> newDF = rbind(DF1,DF2)
> table(newDF$Y,newDF$X)

    C G
  L 2 2
  U 3 1

                                        - Phil Spector
                                         Statistical Computing Facility
                                         Department of Statistics
                                         UC Berkeley
                                         spector at stat.berkeley.edu

On Fri, 10 Sep 2010, Davis, Brian wrote:
> [...]
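Phil's version stacks the first and second characters, which assumes every genotype string has exactly two letters. A sketch that generalizes the same stacking idea to strings of any length by taking substr() position by position; `x`, `maxlen`, `stacked`, `keep` and `pos_counts` are illustrative names, not from the thread.

```r
# Example data from the original post
DF <- data.frame(X = c("CC", "CC", NA, "CG", "GG", "GC"),
                 Y = c("L", "U", "L", "U", "L", NA))
x <- as.character(DF$X)

# Longest genotype string; nchar() can return NA for a missing string, hence na.rm
maxlen <- max(nchar(x), na.rm = TRUE)

# One copy of the rows per character position, holding only that position's letter
stacked <- do.call(rbind, lapply(seq_len(maxlen), function(i)
  data.frame(Y = DF$Y, X = substr(x, i, i), stringsAsFactors = FALSE)))

# substr() gives "" past the end of a shorter string and NA for a missing one;
# keep only real letters (table() then also drops rows whose Y is NA)
keep <- !is.na(stacked$X) & nzchar(stacked$X)
pos_counts <- table(stacked$Y[keep], stacked$X[keep])
pos_counts
```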
On 9/10/2010 12:40 PM, Davis, Brian wrote:
> [...]

You are almost there. The "plyr" package gets you the rest of the way. You already have something that will, for a group of cases with the same "Y" value, tabulate the "X" values the way you want. ddply will split the data frame up by "Y" values and run that on each part.

library("plyr")
tab <- ddply(DF, .(Y), function(x) {table(unlist(strsplit(as.character(x$X), "")))})
tab
#      Y C G
#1     L 2 2
#2     U 3 1
#3  <NA> 1 1

It is almost what you asked for. If you really want it as a matrix with named rows:

tab2 <- as.matrix(tab[,-1])
rownames(tab2) <- tab[,1]

It still has an entry for the NA value of "Y", but that can be filtered out at whatever step you like.

--
Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health & Science University
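For readers without plyr installed, a base-R sketch of the same split-apply-combine idea, assuming split() plus do.call(rbind, ...) is an acceptable substitute for ddply. It returns a matrix with named rows and, because split() drops NA groups, no NA row. Fixing the factor levels keeps the columns aligned even if a letter is absent from some group. `alleles`, `groups` and `allele_counts` are illustrative names.

```r
# Example data from the original post
DF <- data.frame(X = c("CC", "CC", NA, "CG", "GG", "GC"),
                 Y = c("L", "U", "L", "U", "L", NA))
x <- as.character(DF$X)

# Every letter seen anywhere in X, used as fixed table levels
alleles <- sort(unique(unlist(strsplit(x[!is.na(x)], ""))))

# split() drops elements whose Y is NA, so no NA group is created
groups <- split(x, DF$Y)

# One row per level of Y; factor() with fixed levels keeps the columns aligned,
# and table() ignores the NA letters coming from missing strings
allele_counts <- do.call(rbind, lapply(groups, function(g)
  table(factor(unlist(strsplit(g, "")), levels = alleles))))
allele_counts
```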