Hi all,

I have some data as follows.

Cat1 Cat2 Cat3  COG Counts
   A    B    C COG1     10
   B    D      COG2     20
   C           COG3     30
   D           COG4     40

I would like to sum all the counts for each category:

A       B       C       D
10      30      40      60

> CAT2COG<- list(A="COG1",B=c("COG1","COG2"),C=c("COG1","COG3"),D=c("COG2","COG4"))
> COG2CAT<- list(COG1=c("A","B","C"),COG2=c("B","D"),COG3=c("C"),COG4="D")
> df<- data.frame(COGs=c("COG1","COG2","COG3","COG4"),counts=c(10,20,30,40))

I've been trying various versions of apply as well as some crazy loops (e.g. below).

Any help would be appreciated.

Thanks,
Alison

> CATS<-names(CAT2COG)
> Catcounts<-rep(0,length(CATS))
> counter<-1
> for (i in CATS){
+ Catcounts[counter]<-CatCounts+df$counts[df[1,]=CAT2COG[i],]
Error: syntax error
> counter<-counter+1
> }
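A note on the loop above: the syntax error most likely comes from the single `=` inside the subscript, where the comparison `==` was probably intended. Two further issues would surface once that is fixed: `CatCounts` and `Catcounts` are different names (R is case-sensitive), and `df[1,]` selects the first row rather than the COG column. The following is a minimal sketch of a working loop, assuming the intent is to sum `counts` over every COG listed under each category; the helper name `in_cat` is made up for illustration.

```r
CAT2COG <- list(A = "COG1", B = c("COG1", "COG2"),
                C = c("COG1", "COG3"), D = c("COG2", "COG4"))
df <- data.frame(COGs = c("COG1", "COG2", "COG3", "COG4"),
                 counts = c(10, 20, 30, 40))

CATS <- names(CAT2COG)
# One zero-initialised slot per category, addressed by category name.
Catcounts <- setNames(rep(0, length(CATS)), CATS)

for (i in CATS) {
  # in_cat (hypothetical helper): TRUE for rows whose COG belongs to category i.
  # [[i]] extracts the character vector; [i] would return a one-element list.
  in_cat <- df$COGs %in% CAT2COG[[i]]
  Catcounts[i] <- sum(df$counts[in_cat])
}

Catcounts
```

Indexing `Catcounts` by name removes the need for a separate `counter` variable.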
Gabor Grothendieck
2010-Oct-23 22:50 UTC
[R] Summarizing For Values with Multiple categories
On Sat, Oct 23, 2010 at 6:15 PM, Alison Waller <alison.waller at embl.de> wrote:
> Hi all,
>
> I have some data as follows.
>
> Cat1 Cat2 Cat3  COG Counts
>    A    B    C COG1     10
>    B    D      COG2     20
>    C           COG3     30
>    D           COG4     40
>
> I would like to sum all the counts for each category:
> A       B       C       D
> 10      30      40      60
>
>> CAT2COG<-list(A="COG1",B=c("COG1","COG2"),C=c("COG1","COG3"),D=c("COG2","COG4"))
>> COG2CAT<-list(COG1=c("A","B","C"),COG2=c("B","D"),COG3=c("C"),COG4="D")
>> df<-data.frame(COGs=c("COG1","COG2","COG3","COG4"),counts=c(10,20,30,40))
>

Try this:

> aggregate(counts ~ ind, merge(stack(CAT2COG), df, by = 1), sum)
  ind counts
1   A     10
2   B     30
3   C     40
4   D     60

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
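For readers who want to see what the one-liner above does, here is the same computation split into its three steps. This is only an unpacking of the posted call, not a different method; the intermediate names `long`, `merged` and `result` are introduced here for illustration.

```r
CAT2COG <- list(A = "COG1", B = c("COG1", "COG2"),
                C = c("COG1", "COG3"), D = c("COG2", "COG4"))
df <- data.frame(COGs = c("COG1", "COG2", "COG3", "COG4"),
                 counts = c(10, 20, 30, 40))

# Step 1: stack() flattens the named list into a two-column data frame.
# 'values' holds the COG ids, 'ind' holds the category each id came from.
long <- stack(CAT2COG)

# Step 2: by = 1 merges on the first column of each data frame,
# i.e. long$values against df$COGs, so every (COG, category) pair gets its count.
merged <- merge(long, df, by = 1)

# Step 3: sum the counts within each category.
result <- aggregate(counts ~ ind, data = merged, FUN = sum)
result
```

A COG that belongs to several categories appears once per category in `long`, which is why its count is added to each of them.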