On 8/30/2014 2:11 PM, Felipe Carrillo wrote:
> library(plyr)
> b <- structure(list(SampleDate = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
> 1L, 1L, 1L), .Label = "5/8/1996", class = "factor"), TotalCount = c(1L,
> 2L, 1L, 1L, 4L, 3L, 1L, 10L, 3L), ForkLength = c(61L, 22L, NA,
> NA, 72L, 34L, 100L, 23L, 25L), TotalSalvage = c(12L, 24L, 12L,
> 12L, 17L, 23L, 31L, 12L, 15L), Age = c(1L, 0L, NA, NA, 1L, 0L,
> 1L, 0L, 0L)), .Names = c("SampleDate", "TotalCount", "ForkLength",
> "TotalSalvage", "Age"), class = "data.frame", row.names = c(NA,
> -9L))
> b
> ddply(b,.(SampleDate,Age),summarise,salvage=sum(TotalSalvage),pct=TotalCount/sum(TotalCount))
> Error: expecting result of length one, got : 4

I get a slightly different error:

Error: length(rows) == 1 is not TRUE

but the problem is the same. sum() returns a single value, while the
computation for pct returns a vector the same length as TotalCount (the
number of rows in that particular piece of b). summarise is designed to
take a data frame and reduce its number of rows by
aggregating/summarizing (some of) the columns. Since your two
computations return different numbers of rows, it errors out. It
seems you don't want to reduce the number of rows, so replace summarise
with mutate. That function can handle return vectors of different
lengths and recycles the shorter ones as needed.
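
The length mismatch described above can be seen in base R, without
plyr. This is only a minimal sketch: the four values are the TotalCount
entries for the rows of b where Age is 0, and the names total and pct
are just illustrative.

```r
# TotalCount for the four rows of b where Age == 0 (taken from the data above)
TotalCount <- c(2L, 3L, 10L, 3L)

total <- sum(TotalCount)      # one value for the whole piece
pct   <- TotalCount / total   # one value per row of the piece

length(total)  # 1
length(pct)    # 4
```

summarise has no sensible way to put a length-1 result and a length-4
result into the same summary row, which is what the error is
complaining about.
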
(The other difference between summarise and mutate is that mutate keeps
the original columns while summarise drops all original columns and
returns only the computed ones; this makes sense given that summarise
expects to return fewer rows than in the original data.)
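
For illustration, here is a sketch of the mutate version of the call.
It assumes plyr is installed, and it rebuilds b with data.frame()
instead of structure() purely for readability; the values are the same
as in the original post. The name res is only a placeholder.

```r
library(plyr)  # assumption: plyr is installed

# Same data as the original b, written out with data.frame()
b <- data.frame(
  SampleDate   = factor(rep("5/8/1996", 9)),
  TotalCount   = c(1L, 2L, 1L, 1L, 4L, 3L, 1L, 10L, 3L),
  ForkLength   = c(61L, 22L, NA, NA, 72L, 34L, 100L, 23L, 25L),
  TotalSalvage = c(12L, 24L, 12L, 12L, 17L, 23L, 31L, 12L, 15L),
  Age          = c(1L, 0L, NA, NA, 1L, 0L, 1L, 0L, 0L)
)

# mutate adds the new columns to each piece instead of collapsing it;
# the single value from sum(TotalSalvage) is recycled down the piece
res <- ddply(b, .(SampleDate, Age), mutate,
             salvage = sum(TotalSalvage),
             pct     = TotalCount / sum(TotalCount))

nrow(res)   # same number of rows as b
names(res)  # the original columns plus salvage and pct
```

Within each SampleDate/Age piece, pct should sum to 1, which is a quick
way to check the result.
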
> #Computing TotalCount inside ddply works but the pct seems wrong...
> ddply(b,.(SampleDate,Age),summarise,salvage=sum(TotalSalvage),Count=sum(TotalCount),pct=Count/sum(Count))
--
Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health & Science University