Hi all,
I have some data as follows.
Cat1 Cat2 Cat3 COG  Counts
A    B    C    COG1 10
B    D         COG2 20
C              COG3 30
D              COG4 40
I would like to sum all the counts for each category:
A  B  C  D
10 30 40 60
> CAT2COG <- list(A="COG1", B=c("COG1","COG2"), C=c("COG1","COG3"), D=c("COG2","COG4"))
> COG2CAT <- list(COG1=c("A","B","C"), COG2=c("B","D"), COG3=c("C"), COG4="D")
> df <- data.frame(COGs=c("COG1","COG2","COG3","COG4"), counts=c(10,20,30,40))
I've been trying various versions of apply as well as some crazy loops
(e.g. below).
Any help would be appreciated.
Thanks,
Alison
> CATS<-names(CAT2COG)
> Catcounts<-rep(0,length(CATS))
> counter<-1
> for (i in CATS){
+ Catcounts[counter]<-CatCounts+df$counts[df[1,]=CAT2COG[i],]
Error: syntax error
> counter<-counter+1
> }
Gabor Grothendieck
2010-Oct-23 22:50 UTC
[R] Summarizing For Values with Multiple categories
On Sat, Oct 23, 2010 at 6:15 PM, Alison Waller <alison.waller at embl.de> wrote:
> Hi all,
>
> I have some data as follows.
>
> Cat1 Cat2 Cat3 COG  Counts
> A    B    C    COG1 10
> B    D         COG2 20
> C              COG3 30
> D              COG4 40
>
> I would like to sum all the counts for each category:
> A  B  C  D
> 10 30 40 60
>
>> CAT2COG<-list(A="COG1",B=c("COG1","COG2"),C=c("COG1","COG3"),D=c("COG2","COG4"))
>> COG2CAT<-list(COG1=c("A","B","C"),COG2=c("B","D"),COG3=c("C"),COG4="D")
>> df<-data.frame(COGs=c("COG1","COG2","COG3","COG4"),counts=c(10,20,30,40))
>

Try this:

> aggregate(counts ~ ind, merge(stack(CAT2COG), df, by = 1), sum)
  ind counts
1   A     10
2   B     30
3   C     40
4   D     60

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
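
For readers following along, the one-liner above can be unpacked into its three steps. This is only a sketch using the data from the original post; the intermediate names stacked, merged, result and by_cat are made up here for illustration and do not appear in the thread. The last statement shows one possible base R alternative with sapply and %in%, which is closer to what the loop in the question seems to be attempting; it is a swapped-in technique, not the method suggested in the reply.

```r
# Data as given in the original post
CAT2COG <- list(A = "COG1", B = c("COG1", "COG2"),
                C = c("COG1", "COG3"), D = c("COG2", "COG4"))
df <- data.frame(COGs = c("COG1", "COG2", "COG3", "COG4"),
                 counts = c(10, 20, 30, 40))

# Step 1: stack() turns the list into a long data frame with one row
# per (COG, category) pair; columns are 'values' (COG) and 'ind' (category).
stacked <- stack(CAT2COG)

# Step 2: merge on the first column of each data frame ('values' and 'COGs'),
# which attaches the count of each COG to every category it belongs to.
merged <- merge(stacked, df, by = 1)

# Step 3: sum the counts within each category.
result <- aggregate(counts ~ ind, merged, sum)

# Alternative sketch (illustrative, not from the thread): for each category,
# pick the rows of df whose COG is in that category and sum their counts.
by_cat <- sapply(CAT2COG, function(cogs) sum(df$counts[df$COGs %in% cogs]))
```

Printing result should reproduce the table in the reply, and by_cat should hold the same sums as a named vector, which may be handier if a plain vector is wanted rather than a data frame.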