Hello all,

I'm new to R, and I have a data frame like this (dput output below):

      Specie           Fooditem Occurrence Volume
1  Schizodon            vegetal          1   0.05
2  Schizodon           sediment          1   0.60
3  Schizodon            vegetal          1   0.15
4  Schizodon               alga          1   0.05
5  Schizodon           sediment          1   0.90
6  Schizodon           sediment          1   0.30
7  Schizodon           sediment          1   0.90
8   Astyanax terrestrial_insect          1   0.10
9   Astyanax            vegetal          1   0.85
10  Astyanax   aquatical_insect          1   0.05
11  Astyanax            vegetal          1   0.90
12  Astyanax          un_insect          1   0.85

For each species, I have to calculate a food item importance index:

Fi x Vi / Sum(Fi x Vi)

Fi = percentage frequency of occurrence of a food item
Vi = percentage volume of a food item

Using ddply() from plyr, I was able to calculate the total frequency of occurrence and the total volume of each food item:

Frequency = ddply(dieta, c('Specie', 'Fooditem'), summarise, Frequency = sum(Occurrence))

Volume = ddply(dieta, c('Specie', 'Fooditem'), summarise, Volume = sum(Volume))

and the total frequency and total volume for each species:

TFrequency = ddply(Frequency, 'Specie', summarise, TF = sum(Frequency))

TVolume = ddply(dieta, 'Specie', summarise, Volume = sum(Volume))

But since the per-species totals have fewer rows than the per-item tables, I could not combine them to compute the percentages my formula needs.

Any suggestions?
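Written out, here is one reading of the index. It assumes, as both replies in this thread do, that Fi and Vi are percentages of the totals of the species being analyzed, and it additionally assumes that the sum in the denominator runs only over that species' food items. Here n_ij is the summed Occurrence and v_ij the summed Volume of food item i in species j, and k runs over the food items of species j:

```latex
F_{ij} = 100\,\frac{n_{ij}}{\sum_k n_{kj}}, \qquad
V_{ij} = 100\,\frac{v_{ij}}{\sum_k v_{kj}}, \qquad
I_{ij} = \frac{F_{ij}\,V_{ij}}{\sum_k F_{kj}\,V_{kj}}
       = \frac{n_{ij}\,v_{ij}}{\sum_k n_{kj}\,v_{kj}}
```

Under this per-species reading, the factors of 100 and the species totals all cancel. For example, for vegetal in Astyanax: 2 x 1.75 / (1 x 0.10 + 2 x 1.75 + 1 x 0.05 + 1 x 0.85) = 3.5 / 4.5, or about 0.778. If the denominator instead sums over all species at once, the species totals no longer cancel.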
Thanks in advance for your help and attention,

Raoni

dput(diet)

structure(list(Specie = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L,
1L, 1L, 1L, 1L, 1L), .Label = c("Astyanax", "Schizodon"), class = "factor"),
    Fooditem = structure(c(6L, 3L, 6L, 1L, 3L, 3L, 3L, 4L, 6L,
    2L, 6L, 5L), .Label = c("alga", "aquatical_insect", "sediment",
    "terrestrial_insect", "un_insect", "vegetal"), class = "factor"),
    Occurrence = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L), Volume = c(0.05, 0.6, 0.15, 0.05, 0.9, 0.3, 0.9, 0.1,
    0.85, 0.05, 0.9, 0.85)), .Names = c("Specie", "Fooditem",
"Occurrence", "Volume"), class = "data.frame", row.names = c(NA,
-12L))

sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: i386-pc-mingw32/i386 (32-bit)
Windows XP

-- 
Raoni Rosa Rodrigues
Research Associate of Fish Transposition Center CTPeixes
Universidade Federal de Minas Gerais - UFMG
Brasil
rodrigues.raoni at gmail.com
Hello,

1) Instead of computing TFrequency and TVolume the way you did, try the following.

# ave() repeats each species total on every row of that species
TF <- with(Frequency, ave(Frequency, Specie, FUN = sum))
TV <- with(Volume, ave(Volume, Specie, FUN = sum))
Fi <- with(Frequency, Frequency/TF)
Vi <- with(Volume, Volume/TV)
# note: this sum() runs over both species together
Importance <- Fi*Vi/sum(Fi*Vi)

2) Using TFrequency and TVolume, you can solve the different-nrows problem with merge() (see ?merge):

m1 <- merge(Frequency, Volume)
m2 <- merge(m1, TFrequency)
m3 <- merge(m2, TVolume, by = 'Specie')
# Volume.x is the per-item volume, Volume.y the species total
Fi <- with(m3, Frequency / TF)
Vi <- with(m3, Volume.x / Volume.y)
Importance <- Fi*Vi/sum(Fi*Vi)

3) Maybe you can combine both approaches and find a use for the data.frame 'm1', and have

m1$Importance <- ...etc...

Hope this helps,

Rui Barradas

On 18-09-2012 05:48, Raoni Rodrigues wrote:
> [...]
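Below is a self-contained base R sketch of Rui's first approach. It rebuilds the data from the dput output under the name `dieta` used in the question's ddply() calls, and it uses aggregate() in place of plyr's ddply() so it runs without plyr. Keeping every column in one data frame means Fi and Vi cannot fall out of row alignment. One assumption differs from the code above: the final denominator is summed within each species with ave(), on the reading that the index is computed per species.

```r
# Data rebuilt from the dput() output in the question.
dieta <- data.frame(
  Specie = rep(c("Schizodon", "Astyanax"), times = c(7, 5)),
  Fooditem = c("vegetal", "sediment", "vegetal", "alga", "sediment",
               "sediment", "sediment", "terrestrial_insect", "vegetal",
               "aquatical_insect", "vegetal", "un_insect"),
  Occurrence = rep(1L, 12),
  Volume = c(0.05, 0.60, 0.15, 0.05, 0.90, 0.30, 0.90,
             0.10, 0.85, 0.05, 0.90, 0.85),
  stringsAsFactors = TRUE
)

# One row per species/food item with summed occurrence and volume
# (a base R stand-in for the two ddply() calls in the question).
items <- aggregate(cbind(Occurrence, Volume) ~ Specie + Fooditem,
                   data = dieta, FUN = sum)

# ave() returns each species total repeated on every row of that species,
# so the totals have the same length as the per-item columns.
items$Fi <- 100 * items$Occurrence / ave(items$Occurrence, items$Specie, FUN = sum)
items$Vi <- 100 * items$Volume / ave(items$Volume, items$Specie, FUN = sum)

# Assumption: the denominator sums Fi * Vi within each species only.
FV <- items$Fi * items$Vi
items$Importance <- FV / ave(FV, items$Specie, FUN = sum)

# Most important food items first within each species.
items[order(items$Specie, -items$Importance), ]
```

If the index is instead meant to be normalized over the whole data frame, as in the replies, replacing `ave(FV, items$Specie, FUN = sum)` with `sum(FV)` gives that variant.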
Hi,

Maybe this helps.

TF1 <- aggregate(Frequency ~ Specie, data = Frequency, FUN = sum)
TV1 <- aggregate(Volume ~ Specie, data = Volume, FUN = sum)
# merged columns: Specie, species total (.x), Fooditem, per-item value (.y)
new1 <- merge(TF1, Frequency, by = "Specie")
new2 <- merge(TV1, Volume, by = "Specie")
new1$Fi <- new1$Frequency.y/new1$Frequency.x
new2$Vi <- new2$Volume.y/new2$Volume.x
# column 5 is Fi in new1 and Vi in new2
res <- new1[,5]*new2[,5]/sum(new1[,5]*new2[,5])
res
#[1] 0.004169822 0.008339644 0.070886971 0.291887526 0.002776516 0.599727397
#[7] 0.022212126

A.K.

----- Original Message -----
From: Raoni Rodrigues <caciquesamurai at gmail.com>
To: r-help at r-project.org
Cc:
Sent: Tuesday, September 18, 2012 12:48 AM
Subject: [R] Formula in a data-frame

[...]

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
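The vector A.K. printed sums to about 1 over all seven rows, so that code, like Rui's `sum(Fi*Vi)`, normalizes across both species together. If the index is meant to sum to 1 within each species (an assumption, not something the thread states), the printed values only need rescaling per species. Within a species, the globally normalized values differ from the per-species ones by a constant factor. The sketch below assumes that the grouping vector `specie` (a hypothetical name) matches the row order merge() produces here: merge() sorts by Specie, so the four Astyanax items come first, followed by the three Schizodon items.

```r
# A.K.'s printed result, copied from the reply.
res <- c(0.004169822, 0.008339644, 0.070886971, 0.291887526,
         0.002776516, 0.599727397, 0.022212126)

# Hypothetical grouping vector; assumes merge(..., by = "Specie") put the
# four Astyanax rows first and the three Schizodon rows after them.
specie <- rep(c("Astyanax", "Schizodon"), times = c(4, 3))

sum(res)  # about 1 over both species together

# Divide by each species' own sum so the index sums to 1 per species.
res_by_species <- res / ave(res, specie, FUN = sum)
round(res_by_species, 4)
```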