thr3ads.net - R help - [R] Summing data frame columns on identical data [Jan 2011]

Steve Murray

2011-Jan-17 18:42 UTC

[R] Summing data frame columns on identical data

Dear all,

I have 9 data frames, and I'm simply trying to sum the values of column 3
(on a row-by-row basis). However, each data frame has a slightly different
number of rows, so I'm receiving the following error: "Error in
Ops.data.frame(mrunoff_207101[3], mrunoff_207102[3]) :
  + only defined for equally-sized data frames".

Here is what I'm attempting to do:
> arunoff_2071 <- cbind(mrunoff_207101[1:2], (mrunoff_207101[3] +
mrunoff_207102[3] + mrunoff_207103[3] + mrunoff_207104[3] + mrunoff_207105[3] +
mrunoff_207106[3] + mrunoff_207107[3] + mrunoff_207108[3] + mrunoff_207109[3]))

Is there an easy way of summing based on congruent values in columns 1 and 2?
The only way I can think of would be to use merge, but this would involve doing
this for every pair of data frames.

The data for each data frame look like this:
> head(mrunoff_207101)
  Latitude Longitude          FPC
1     5.75      0.25 0.0112384744
2     6.25      0.25 0.0019959067
3     6.75      0.25 0.0003245941
4     7.25      0.25 0.0011973676
5     7.75      0.25 0.0001062602
6     8.25      0.25 0.0451578423


Any suggestions on how to achieve this easily will be very welcome.

Many thanks,

Steve

Dennis Murphy

2011-Jan-17 22:08 UTC

Hi:

Try this based on the following toy example:

############### Generate a list of named data frames
# There are more efficient ways to do this with replicate, but I forgot :)
# A function to generate a data frame
dmake <- function() data.frame(A = factor(rep(1:5, each = 10)),
                                B = rep(rep(c(0.25, 0.5), each = 5), 5),
                                y = rnorm(50))
# Create an empty list
dflist <- vector('list', 5)
# populate it
for(i in 1:5) dflist[[i]] <- dmake()
# Give names to the list components:
names(dflist) <- paste('df', 1:5, sep = '')
################

library(plyr)
# Function to sum y by A-B combinations for a generic data frame
dsum <- function(d) ddply(d, .(A, B), summarise, sumY = sum(y))

# Apply it to each component of the list:

# Returns a list
summlist <- llply(dflist, dsum)
# Returns a data frame
summdf <- ldply(dflist, dsum)
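
Note that dsum() sums y within each data frame separately, so summdf still has one row per data frame per A-B combination. If one total per A-B combination across all the data frames is wanted, a second pass over summdf should do it. A minimal sketch, assuming plyr is installed; the two small data frames and the names totals and total are made up for illustration:

```r
library(plyr)

# Hypothetical toy input: two data frames with different numbers of rows
dflist <- list(
  df1 = data.frame(A = c(1, 1, 2), B = c(0.25, 0.50, 0.25), y = c(1, 2, 3)),
  df2 = data.frame(A = c(1, 2),    B = c(0.25, 0.25),       y = c(10, 20))
)

# Per-data-frame sums, using the same dsum() as above
dsum <- function(d) ddply(d, .(A, B), summarise, sumY = sum(y))
summdf <- ldply(dflist, dsum)   # one row per data frame per A-B pair, plus .id

# Second pass: collapse the per-data-frame sums into one total per A-B pair
totals <- ddply(summdf, .(A, B), summarise, total = sum(sumY))
```

A combination that is missing from some data frames simply contributes fewer terms to its total, so the differing row counts are not a problem here.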

Since you state that the individual data frames have different lengths, you
may want to add another variable to dsum to return length, perhaps something
like

dsum2 <- function(d) ddply(d, .(A, B), summarise,
                           sumY = sum(y), n = length(y))

and apply either or both of the llply/ldply calls with dsum2 substituted for
dsum.

If you want to combine certain groups together, you can create a new factor
that merges levels. The following post in the archives provides a clue:
http://r.789695.n4.nabble.com/Documentation-detail-was-Merging-factor-levels-td911547.html

Since you already have the data frames, you can do something like

# Names of existing data frames in the workspace
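filelist <- c('mydf1', 'mydf2', 'anotherdf',
              'what_more', 'oyvey')
# mget() returns a named list with one data frame per name; sapply(filelist, get)
# may simplify the result into a matrix of columns instead of a list of data frames
dflist <- mget(filelist, envir = globalenv())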

and then move on to the summarization stage.

HTH,
Dennis
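
For the data layout in the original question (Latitude, Longitude, FPC), the same idea can be sketched in base R without any packages: stack the data frames, then sum FPC within each Latitude-Longitude pair. This is only an illustration under assumptions. The two small data frames mrunoff_a and mrunoff_b and the result name arunoff are hypothetical stand-ins for the nine monthly data frames.

```r
# Hypothetical stand-ins for the monthly data frames; row counts differ on purpose
mrunoff_a <- data.frame(Latitude  = c(5.75, 6.25, 6.75),
                        Longitude = 0.25,
                        FPC       = c(0.1, 0.2, 0.3))
mrunoff_b <- data.frame(Latitude  = c(5.75, 6.75),
                        Longitude = 0.25,
                        FPC       = c(1, 3))

# Collect the data frames by name into a named list
filelist <- c("mrunoff_a", "mrunoff_b")
dflist <- mget(filelist, envir = environment())

# Stack all rows, then sum FPC for each Latitude-Longitude pair
stacked <- do.call(rbind, dflist)
arunoff <- aggregate(FPC ~ Latitude + Longitude, data = stacked, FUN = sum)
```

Grid cells that appear in only some of the data frames are kept, with a sum over the data frames that contain them. Matching is on the exact numeric values of Latitude and Longitude, so coordinates that differ by rounding error would end up in separate groups.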


Hadley Wickham

2011-Jan-17 23:10 UTC

> library(plyr)
> # Function to sum y by A-B combinations for a generic data frame
> dsum <- function(d) ddply(d, .(A, B), summarise, sumY = sum(y))

See count in plyr 1.4 for a much, much faster way of doing this.

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/
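
One reading of the count hint, offered as an assumption rather than as what was necessarily meant: plyr's count() accepts a wt_var argument, and with it the freq column holds the sum of that variable within each group instead of the number of rows. A minimal sketch on made-up data (the data frame d and the name sums are hypothetical):

```r
library(plyr)

# Hypothetical toy data with repeated A-B combinations
d <- data.frame(A = c(1, 1, 2, 2),
                B = c(0.25, 0.25, 0.25, 0.50),
                y = c(1, 2, 3, 4))

# With wt_var, 'freq' is the sum of y per A-B combination, not a row count
sums <- count(d, vars = c("A", "B"), wt_var = "y")
```

The result has the grouping columns plus freq, so it plays the same role as the sumY column produced by dsum.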
