Dear List, I have a question of convenience, I am looking to sum the values of one column based on another column - a example may help explain better! ED ECOCODE 21.809467 AA0101 36.229566 PA1201 51.861284 PA1201 11.36232 PA1201 27.264634 PA1201 12.261986 PA1201 46.519313 PA1201 7.815376 PA1201 2.810428 PA1201 13.478372 PA1201 35.670182 PA1301 27.128715 AT0801 19.010294 AT1201 15.475368 AT1201 18.597983 AT0101 29.292615 AT0101 6.749846 AT0101 14.981488 AT0101 14.93511 AT0101 14.93511 AT0101 21.040785 AT0101 8.271615 AT0101 12.94232 AT0101 6.749846 AT0101 15.484412 AT0101 29.644494 AT0101 43.211212 AT0101 So for AA0101 it would be = 21.809467 AT1201 it would be = 19.010294+15.475368 etc I would then like to be able to output a table with ECOCODE in one column and the sum of ED in the other. This is stored in a dataframe called ecoregion, i understand people like having code to change but i have none as i am a relative beginner! Sorry in advance! Thanks Peter
Hi Peter,
R has some fairly flexible ways of passing values of some variable (X)
by another (the INDEX) to different FUNctions.  Here is an example
using your data:
## your email data, in convenient form
dat <- structure(list(ED = c(21.809467, 36.229566, 51.861284, 11.36232,
27.264634, 12.261986, 46.519313, 7.815376, 2.810428, 13.478372,
35.670182, 27.128715, 19.010294, 15.475368, 18.597983, 29.292615,
6.749846, 14.981488, 14.93511, 14.93511, 21.040785, 8.271615,
12.94232, 6.749846, 15.484412, 29.644494, 43.211212), ECOCODE = structure(c(1L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 3L, 4L, 4L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("AA0101",
"AT0101", "AT0801", "AT1201", "PA1201",
"PA1301"), class = "factor")),
.Names = c("ED",
"ECOCODE"), class = "data.frame", row.names = c(NA, -27L))
## look at the structure of the data
str(dat)
## inside of "dat" (to avoid typing its name repeatedly)
## find the sum of ED at each level of ECOCODE
with(dat, tapply(X = ED, INDEX = ECOCODE, FUN = sum, na.rm = TRUE))
## should give something like
   AA0101    AT0101    AT0801    AT1201    PA1201    PA1301
 21.80947 236.83684  27.12871  34.48566 209.60328  35.67018
For documentation, look at:
?tapply
## similar in many ways though sometimes slightly more/less convenient
?by
Hope that helps,
Josh
On Wed, Jan 12, 2011 at 2:38 AM, Peter Francis <peterfrancis at me.com>
wrote:> Dear List,
>
> I have a question of convenience,
>
> I am looking to sum the values of one column based on another column - a
example may help explain better!
>
> ED ? ? ? ? ? ? ? ? ? ? ?ECOCODE
> 21.809467 ? ? ? AA0101
> 36.229566 ? ? ? PA1201
> 51.861284 ? ? ? PA1201
> 11.36232 ? ? ? ?PA1201
> 27.264634 ? ? ? PA1201
> 12.261986 ? ? ? PA1201
> 46.519313 ? ? ? PA1201
> 7.815376 ? ? ? ?PA1201
> 2.810428 ? ? ? ?PA1201
> 13.478372 ? ? ? PA1201
> 35.670182 ? ? ? PA1301
> 27.128715 ? ? ? AT0801
> 19.010294 ? ? ? AT1201
> 15.475368 ? ? ? AT1201
> 18.597983 ? ? ? AT0101
> 29.292615 ? ? ? AT0101
> 6.749846 ? ? ? ?AT0101
> 14.981488 ? ? ? AT0101
> 14.93511 ? ? ? ?AT0101
> 14.93511 ? ? ? ?AT0101
> 21.040785 ? ? ? AT0101
> 8.271615 ? ? ? ?AT0101
> 12.94232 ? ? ? ?AT0101
> 6.749846 ? ? ? ?AT0101
> 15.484412 ? ? ? AT0101
> 29.644494 ? ? ? AT0101
> 43.211212 ? ? ? AT0101
>
> So for AA0101 it would be = 21.809467
> ? ? ? ? ? ?AT1201 it would be = 19.010294+15.475368
>
> etc
>
> I would then like to be able to output a table with ECOCODE in one column
and the sum of ED in the other.
>
> This is stored in a dataframe called ecoregion, i understand people like
having code to change but i have none as i am a relative beginner! Sorry in
advance!
>
> Thanks
>
> Peter
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
There are two functions you need to become familiar with: ?tapply ?ave If you wanted these summed values to be placed in another column of the same dataframe, you would use ave. If you wanted a new structure (somewhat shorter) you would use tapply with sum as the function. E. g: tapply(ecoregion$ED, ecoregion$ECOCODE, sum) -- David. On Jan 12, 2011, at 5:38 AM, Peter Francis wrote:> Dear List, > > I have a question of convenience, > > I am looking to sum the values of one column based on another column > - a example may help explain better! > > ED ECOCODE > 21.809467 AA0101 > 36.229566 PA1201 > 51.861284 PA1201 > 11.36232 PA1201 > 27.264634 PA1201 > 12.261986 PA1201 > 46.519313 PA1201 > 7.815376 PA1201 > 2.810428 PA1201 > 13.478372 PA1201 > 35.670182 PA1301 > 27.128715 AT0801 > 19.010294 AT1201 > 15.475368 AT1201 > 18.597983 AT0101 > 29.292615 AT0101 > 6.749846 AT0101 > 14.981488 AT0101 > 14.93511 AT0101 > 14.93511 AT0101 > 21.040785 AT0101 > 8.271615 AT0101 > 12.94232 AT0101 > 6.749846 AT0101 > 15.484412 AT0101 > 29.644494 AT0101 > 43.211212 AT0101 > > So for AA0101 it would be = 21.809467 > AT1201 it would be = 19.010294+15.475368 > > etc > > I would then like to be able to output a table with ECOCODE in one > column and the sum of ED in the other. > > This is stored in a dataframe called ecoregion, i understand people > like having code to change but i have none as i am a relative > beginner! Sorry in advance! > > Thanks > > Peter > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.David Winsemius, MD West Hartford, CT
David and Josh, Thanks very much for your help, it is much appreciated. Peter On 12 Jan 2011, at 14:28, David Winsemius wrote: There are two functions you need to become familiar with: ?tapply ?ave If you wanted these summed values to be placed in another column of the same dataframe, you would use ave. If you wanted a new structure (somewhat shorter) you would use tapply with sum as the function. E. g: tapply(ecoregion$ED, ecoregion$ECOCODE, sum) -- David. On Jan 12, 2011, at 5:38 AM, Peter Francis wrote:> Dear List, > > I have a question of convenience, > > I am looking to sum the values of one column based on another column - a example may help explain better! > > ED ECOCODE > 21.809467 AA0101 > 36.229566 PA1201 > 51.861284 PA1201 > 11.36232 PA1201 > 27.264634 PA1201 > 12.261986 PA1201 > 46.519313 PA1201 > 7.815376 PA1201 > 2.810428 PA1201 > 13.478372 PA1201 > 35.670182 PA1301 > 27.128715 AT0801 > 19.010294 AT1201 > 15.475368 AT1201 > 18.597983 AT0101 > 29.292615 AT0101 > 6.749846 AT0101 > 14.981488 AT0101 > 14.93511 AT0101 > 14.93511 AT0101 > 21.040785 AT0101 > 8.271615 AT0101 > 12.94232 AT0101 > 6.749846 AT0101 > 15.484412 AT0101 > 29.644494 AT0101 > 43.211212 AT0101 > > So for AA0101 it would be = 21.809467 > AT1201 it would be = 19.010294+15.475368 > > etc > > I would then like to be able to output a table with ECOCODE in one column and the sum of ED in the other. > > This is stored in a dataframe called ecoregion, i understand people like having code to change but i have none as i am a relative beginner! Sorry in advance! > > Thanks > > Peter > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.David Winsemius, MD West Hartford, CT
Or with ddply :
library(plyr)
dat <- structure(list(ED = c(21.809467, 36.229566, 51.861284, 11.36232,
27.264634, 12.261986, 46.519313, 7.815376, 2.810428, 13.478372,
35.670182, 27.128715, 19.010294, 15.475368, 18.597983, 29.292615,
6.749846, 14.981488, 14.93511, 14.93511, 21.040785, 8.271615,
12.94232, 6.749846, 15.484412, 29.644494, 43.211212), ECOCODE = structure(c(1L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 3L, 4L, 4L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("AA0101",
"AT0101", "AT0801", "AT1201", "PA1201",
"PA1301"), class = "factor")),
.Names = c("ED",
"ECOCODE"), class = "data.frame", row.names = c(NA, -27L))
dat
ddply(dat,"ECOCODE",summarise,EDsummed=sum(ED))
?
Felipe D. Carrillo
Supervisory Fishery Biologist
Department of the Interior
US Fish & Wildlife Service
California, USA
http://www.fws.gov/redbluff/rbdd_jsmp.aspx
----- Original Message ----> From: Peter Francis <peterfrancis at me.com>
> To: r-help at r-project.org
> Sent: Wed, January 12, 2011 2:38:19 AM
> Subject: [R] Sum by column
> 
> Dear List,
> 
> I have a question of convenience,
> 
> I am looking to sum the values of one column based on another column - a 
>example may help explain better!
> 
> ED??? ??? ??? ECOCODE
> 21.809467??? AA0101
> 36.229566??? PA1201
> 51.861284??? PA1201
> 11.36232??? PA1201
> 27.264634??? PA1201
> 12.261986??? PA1201
> 46.519313??? PA1201
> 7.815376??? PA1201
> 2.810428??? PA1201
> 13.478372??? PA1201
> 35.670182??? PA1301
> 27.128715??? AT0801
> 19.010294??? AT1201
> 15.475368??? AT1201
> 18.597983??? AT0101
> 29.292615??? AT0101
> 6.749846??? AT0101
> 14.981488??? AT0101
> 14.93511??? AT0101
> 14.93511??? AT0101
> 21.040785??? AT0101
> 8.271615??? AT0101
> 12.94232??? AT0101
> 6.749846??? AT0101
> 15.484412??? AT0101
> 29.644494??? AT0101
> 43.211212??? AT0101
> 
> So for AA0101 it would be = 21.809467
> ??? ? ? AT1201 it would be = 19.010294+15.475368
> 
> etc
> 
> I would then like to be able to output a table with ECOCODE in one column
and
>the sum of ED in the other.
> 
> This is stored in a dataframe called ecoregion, i understand people like
having
>code to change but i have none as i am a relative beginner! Sorry in
advance!
> 
> Thanks 
> 
> Peter
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>