Dear List,

I have a question of convenience: I am looking to sum the values of one column based on another column. An example may help explain better!

ED          ECOCODE
21.809467   AA0101
36.229566   PA1201
51.861284   PA1201
11.36232    PA1201
27.264634   PA1201
12.261986   PA1201
46.519313   PA1201
7.815376    PA1201
2.810428    PA1201
13.478372   PA1201
35.670182   PA1301
27.128715   AT0801
19.010294   AT1201
15.475368   AT1201
18.597983   AT0101
29.292615   AT0101
6.749846    AT0101
14.981488   AT0101
14.93511    AT0101
14.93511    AT0101
21.040785   AT0101
8.271615    AT0101
12.94232    AT0101
6.749846    AT0101
15.484412   AT0101
29.644494   AT0101
43.211212   AT0101

So for AA0101 it would be = 21.809467
   for AT1201 it would be = 19.010294+15.475368
etc.

I would then like to be able to output a table with ECOCODE in one column and the sum of ED in the other.

This is stored in a data frame called ecoregion. I understand people like having code to change, but I have none as I am a relative beginner! Sorry in advance!

Thanks,

Peter
Hi Peter,

R has some fairly flexible ways of passing values of some variable (X) by another (the INDEX) to different FUNctions. Here is an example using your data:

## your email data, in convenient form
dat <- structure(list(ED = c(21.809467, 36.229566, 51.861284, 11.36232,
    27.264634, 12.261986, 46.519313, 7.815376, 2.810428, 13.478372,
    35.670182, 27.128715, 19.010294, 15.475368, 18.597983, 29.292615,
    6.749846, 14.981488, 14.93511, 14.93511, 21.040785, 8.271615,
    12.94232, 6.749846, 15.484412, 29.644494, 43.211212),
    ECOCODE = structure(c(1L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L,
    3L, 4L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
    .Label = c("AA0101", "AT0101", "AT0801", "AT1201", "PA1201",
    "PA1301"), class = "factor")), .Names = c("ED", "ECOCODE"),
    class = "data.frame", row.names = c(NA, -27L))

## look at the structure of the data
str(dat)

## inside of "dat" (to avoid typing its name repeatedly)
## find the sum of ED at each level of ECOCODE
with(dat, tapply(X = ED, INDEX = ECOCODE, FUN = sum, na.rm = TRUE))

## should give something like
   AA0101    AT0101    AT0801    AT1201    PA1201    PA1301
 21.80947 236.83684  27.12871  34.48566 209.60328  35.67018

For documentation, look at:

?tapply
## similar in many ways though sometimes slightly more/less convenient
?by

Hope that helps,

Josh

On Wed, Jan 12, 2011 at 2:38 AM, Peter Francis <peterfrancis at me.com> wrote:
> I am looking to sum the values of one column based on another column - a example may help explain better!
> [...]
> I would then like to be able to output a table with ECOCODE in one column and the sum of ED in the other.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
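
tapply() returns a named vector rather than the two-column table Peter asked for. A minimal sketch of one way to reshape it follows, assuming the data frame is called ecoregion as in the original post; the column name ED_sum and the object names sums and result are illustrative only and do not come from the thread.

```r
## Rebuild the posted data under the name used in the original question.
ecoregion <- data.frame(
  ED = c(21.809467, 36.229566, 51.861284, 11.36232, 27.264634,
         12.261986, 46.519313, 7.815376, 2.810428, 13.478372,
         35.670182, 27.128715, 19.010294, 15.475368, 18.597983,
         29.292615, 6.749846, 14.981488, 14.93511, 14.93511,
         21.040785, 8.271615, 12.94232, 6.749846, 15.484412,
         29.644494, 43.211212),
  ECOCODE = c("AA0101", rep("PA1201", 9), "PA1301", "AT0801",
              rep("AT1201", 2), rep("AT0101", 13))
)

## Sum ED within each ECOCODE; the result is a named 1-d array.
sums <- tapply(ecoregion$ED, ecoregion$ECOCODE, sum)

## Turn the names and values into a two-column data frame.
## "ED_sum" is a hypothetical column name chosen for this sketch.
result <- data.frame(ECOCODE = names(sums),
                     ED_sum  = as.vector(sums),
                     row.names = NULL)
print(result)
```

If the table needs to go to a file, something like write.csv(result, "ecoregion_sums.csv", row.names = FALSE) should work; the file name is only a placeholder.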
There are two functions you need to become familiar with:

?tapply
?ave

If you wanted these summed values to be placed in another column of the same data frame, you would use ave. If you wanted a new structure (somewhat shorter), you would use tapply with sum as the function. E.g.:

tapply(ecoregion$ED, ecoregion$ECOCODE, sum)

--
David.

On Jan 12, 2011, at 5:38 AM, Peter Francis wrote:
> I am looking to sum the values of one column based on another column
> - a example may help explain better!
> [...]
> I would then like to be able to output a table with ECOCODE in one
> column and the sum of ED in the other.

David Winsemius, MD
West Hartford, CT
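
The ave route mentioned above can be sketched as follows. This is only an illustration on four rows taken from the posted data, and the column name ED_total is an assumption made for the example. Note that FUN has to be passed by name, because ave() treats unnamed extra arguments as grouping variables.

```r
## A small subset of the posted data, enough to show one single-row
## group and one two-row group.
ecoregion <- data.frame(
  ED      = c(21.809467, 27.128715, 19.010294, 15.475368),
  ECOCODE = c("AA0101", "AT0801", "AT1201", "AT1201")
)

## ave() returns a vector as long as its input, so every row receives
## the sum for its own ECOCODE group.
## "ED_total" is a hypothetical column name chosen for this sketch.
ecoregion$ED_total <- ave(ecoregion$ED, ecoregion$ECOCODE, FUN = sum)
print(ecoregion)
```

Unlike tapply, this keeps one row per original observation, which suits cases where each ED value is later compared with its group total.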
David and Josh,

Thanks very much for your help, it is much appreciated.

Peter

On 12 Jan 2011, at 14:28, David Winsemius wrote:
> There are two functions you need to become familiar with:
>
> ?tapply
> ?ave
> [...]
> tapply(ecoregion$ED, ecoregion$ECOCODE, sum)
Or with ddply:

library(plyr)
dat <- structure(list(ED = c(21.809467, 36.229566, 51.861284, 11.36232,
    27.264634, 12.261986, 46.519313, 7.815376, 2.810428, 13.478372,
    35.670182, 27.128715, 19.010294, 15.475368, 18.597983, 29.292615,
    6.749846, 14.981488, 14.93511, 14.93511, 21.040785, 8.271615,
    12.94232, 6.749846, 15.484412, 29.644494, 43.211212),
    ECOCODE = structure(c(1L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L,
    3L, 4L, 4L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
    .Label = c("AA0101", "AT0101", "AT0801", "AT1201", "PA1201",
    "PA1301"), class = "factor")), .Names = c("ED", "ECOCODE"),
    class = "data.frame", row.names = c(NA, -27L))
dat
ddply(dat, "ECOCODE", summarise, EDsummed = sum(ED))

Felipe D. Carrillo
Supervisory Fishery Biologist
Department of the Interior
US Fish & Wildlife Service
California, USA
http://www.fws.gov/redbluff/rbdd_jsmp.aspx

----- Original Message ----
> From: Peter Francis <peterfrancis at me.com>
> To: r-help at r-project.org
> Sent: Wed, January 12, 2011 2:38:19 AM
> Subject: [R] Sum by column
>
> I am looking to sum the values of one column based on another column - a
> example may help explain better!
> [...]
> I would then like to be able to output a table with ECOCODE in one column and
> the sum of ED in the other.
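
For readers without plyr installed, base R's aggregate() gives a comparable data frame. The sketch below is not from the thread: it uses five rows of the posted data, and it renames the result column to EDsummed only to mirror the ddply call above.

```r
## Five rows from the posted data, covering four ECOCODE groups.
dat <- data.frame(
  ED      = c(21.809467, 35.670182, 27.128715, 19.010294, 15.475368),
  ECOCODE = c("AA0101", "PA1301", "AT0801", "AT1201", "AT1201")
)

## Formula interface: sum ED within each level of ECOCODE.
agg <- aggregate(ED ~ ECOCODE, data = dat, FUN = sum)

## Rename the summed column to match the ddply example (optional).
names(agg)[2] <- "EDsummed"
print(agg)
```

The result has one row per ECOCODE with the group label in the first column and the sum in the second, which is the layout requested in the original question.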