thr3ads.net - R help - [R] Counting occurances of a letter by a factor [Sep 2010]

Davis, Brian

2010-Sep-10 19:40 UTC

[R] Counting occurances of a letter by a factor

I'm trying to find a more elegant way of doing this.  What I'm trying to
accomplish is to count the frequency of letters (major / minor alleles)  in  a
string grouped by the factor levels in another column of my data frame.

Ex.
> DF<-data.frame(c("CC", "CC", NA, "CG", "GG", "GC"), c("L", "U", "L", "U", "L", NA))
> colnames(DF)<-c("X", "Y")
> DF
     X    Y
1   CC    L
2   CC    U
3 <NA>    L
4   CG    U
5   GG    L
6   GC <NA>

I have an ugly solution, which works if you know the factor levels of Y in
advance.
> ans<-rbind(table(unlist(strsplit(as.character(DF[DF[ ,'Y'] == 'L', 1]), ""))),
+ table(unlist(strsplit(as.character(DF[DF[ ,'Y']  == 'U', 1]), ""))))
> rownames(ans)<-c("L", "U")
> ans
  C G
L 2 2
U 3 1


I've played with table, xtabs, tabulate, aggregate, tapply, etc., but
haven't found a combination that gives a more general solution to this
problem.

Any ideas?

Brian

Peng, C

2010-Sep-10 19:47 UTC

[R] Counting occurances of a letter by a factor

try:

?ftable

Darin A. England

2010-Sep-10 20:11 UTC

[R] Counting occurances of a letter by a factor

I fiddled around and found this solution, which is far from elegant,
but it doesn't require you to know the factor levels in advance.

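t <- with(DF, tapply(as.character(X), Y, table))
# rep() repeats each genotype by its count, so a genotype that occurs
# more than once within a group is tallied every time
lapply(t, function(x)
    table(strsplit(paste(rep(names(x), as.vector(x)),collapse=""),split="")))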

Darin
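
A variation on the tapply idea, sketched under the assumption that a single count matrix is the desired result: fixing the set of letters up front keeps every group's table the same width, so the pieces can be combined into one matrix. The names `alleles`, `count_letters`, and `res` are made up for illustration, and `stringsAsFactors = FALSE` is used only to keep the example self-contained.

```r
DF <- data.frame(X = c("CC", "CC", NA, "CG", "GG", "GC"),
                 Y = c("L", "U", "L", "U", "L", NA),
                 stringsAsFactors = FALSE)

x <- as.character(DF$X)

# All letters that occur anywhere in X (NA strings excluded)
alleles <- sort(unique(unlist(strsplit(x[!is.na(x)], ""))))

# Count the letters in one group's strings; the fixed levels keep
# zero counts, and table() drops the NA produced by an NA string
count_letters <- function(s) {
  table(factor(unlist(strsplit(s, "")), levels = alleles))
}

# split() drops rows whose Y is NA; sapply() returns letters x groups,
# so t() puts the groups in rows
res <- t(sapply(split(x, DF$Y), count_letters))
res
```

The levels of Y are discovered by `split()`, so they do not need to be known in advance.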



Davis, Brian

2010-Sep-10 20:11 UTC

[R] Counting occurances of a letter by a factor

In my quest for brevity I think I sacrificed too much clarity.

I'll try to be a little less brief in the hopes of being more clear.

Say I have a data frame like this, as before:
> DF<-data.frame(c("CC", "CC", NA, "CG", "GG", "GC"), c("L", "U", "L", "U", "L", NA))
> colnames(DF)<-c("X", "Y")
> DF
     X    Y
1   CC    L
2   CC    U
3 <NA>    L
4   CG    U
5   GG    L
6   GC <NA>

I need to count the frequency of the unique individual characters in DF$X at
each factor level in DF$Y

So for DF$Y == "L" there are 2 "C"'s and 2 "G"'s
and for DF$Y == "U" there are 3 "C"'s and 1 "G"

The NA's should not contribute to the counts.

If I had an individual character in DF$X instead of a string, like:
> DF2<-data.frame(c("C", "C", NA, "C", "G", "G"), c("L", "U", "L", "U", "L", NA))
> colnames(DF2)<-c("X", "Y")
> DF2
     X    Y
1    C    L
2    C    U
3 <NA>    L
4    C    U
5    G    L
6    G <NA>

Then table gives me exactly what I need. 
> table(DF2)
   Y
X   L U
  C 1 2
  G 1 0



Hopefully this is a little bit clearer what I'm trying to accomplish.

Brian

-----Original Message-----
From: Phil Spector [mailto:spector at stat.berkeley.edu] 
Sent: Friday, September 10, 2010 2:52 PM
To: Davis, Brian
Subject: Re: [R] Counting occurances of a letter by a factor

Brian -
    Here's the only thing I can come up with to give the 
same result as your "ans", but it doesn't seem to correspond
with your description of the problem.
> DF1 = DF
> DF1$X = sapply(strsplit(as.character(DF$X),''),'[',1)
> DF2 = DF
> DF2$X = sapply(strsplit(as.character(DF$X),''),'[',2)
> newDF = rbind(DF1,DF2)
> table(newDF$Y,newDF$X)
     C G
   L 2 2
   U 3 1

 					- Phil Spector
 					 Statistical Computing Facility
 					 Department of Statistics
 					 UC Berkeley
 					 spector at stat.berkeley.edu
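
The stacking idea above takes the first and second letter separately, so it assumes two-letter strings. A possible generalization to strings of any length, offered as a sketch rather than as Phil's method: split every string, repeat each Y value once per letter, and cross-tabulate the result. `chars`, `long`, and `counts` are illustrative names; `lengths()` needs R 3.2.0 or later.

```r
DF <- data.frame(X = c("CC", "CC", NA, "CG", "GG", "GC"),
                 Y = c("L", "U", "L", "U", "L", NA),
                 stringsAsFactors = FALSE)

# One character vector per row; an NA string yields a single NA
chars <- strsplit(as.character(DF$X), "")

# Long format: one row per letter, with Y repeated to match
long <- data.frame(Y = rep(DF$Y, lengths(chars)),
                   X = unlist(chars),
                   stringsAsFactors = FALSE)

# table() drops NA in either column by default
counts <- table(long$Y, long$X)
counts
```

Once the data are one letter per row, this is the same single-character case that `table(DF2)` already handles.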




Brian Diggs

2010-Sep-10 20:19 UTC

[R] Counting occurances of a letter by a factor

You are almost there.  The "plyr" package gets you the rest of the way.
You already have something that will, for a group of cases with the
same "Y" value, tabulate the "X" values the way you want.  ddply will
split the dataframe up by "Y" values and run that on each part.

library("plyr")

tab <- ddply(DF, .(Y),
function(x) {table(unlist(strsplit(as.character(x$X),"")))})
tab

#     Y C G
#1    L 2 2
#2    U 3 1
#3 <NA> 1 1

It is almost what you asked for.  If you really want it as a matrix with 
named rows:

tab2 <- as.matrix(tab[,-1])
rownames(tab2) <- tab[,1]

It still has an entry for the NA value of "Y", but that can be
filtered out at whatever step you like.
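
One way the filtering could look, as a sketch: `tab` is rebuilt by hand here from the printed output above so the snippet runs without plyr, and `keep` is an illustrative name.

```r
# Stand-in for the ddply() result, copied from the printed output
tab <- data.frame(Y = c("L", "U", NA),
                  C = c(2, 3, 1),
                  G = c(2, 1, 1),
                  stringsAsFactors = FALSE)

# Keep only the rows whose grouping value is not NA
keep <- !is.na(tab$Y)

# Convert the count columns to a matrix with named rows
tab2 <- as.matrix(tab[keep, -1])
rownames(tab2) <- tab$Y[keep]
tab2
```

Dropping the rows with a missing "Y" from DF before calling ddply would presumably work as well.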

--
Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health & Science University
