Spencer Graves
2013-May-12 20:54 UTC
[R] aggregate.data.frame with NAs and different types
Hello:

Do you have suggestions for how to aggregate a data.frame using different functions on different columns? Consider the following example:

df2aggregate <- data.frame(id=rep(letters[1:4], each=2),
                           x=c(1:6, NA, NA),
                           y=c(NA, 1:6, NA),
                           a=c(NA, NA, LETTERS[1:6]),
                           stringsAsFactors=FALSE)

# Desired output:
ag1.2 <- data.frame(id=letters[1:4],
                    x=c(3, 7, 11, NA),
                    y=c(NA, 2.5, 4.5, NA),
                    a=c(NA, 'A', 'C', 'E'),
                    stringsAsFactors=FALSE)

I'm thinking of writing a function Aggregate(x, by, FUN, ...), where x = data.frame, by = vector of names of columns of x, and FUN = function that would accept as input a data.frame subset of x and would return a data.frame FUNout, which would be combined using cbind(x[, by], FUNout), then rbind over all such subset data.frames. However, before I write this, I'd like to make sure it doesn't already exist. My current plan is to add it to the Ecdat package.

Suggestions? Should I study "plyr"? fortune(298) ;-)

Thanks,
Spencer

p.s. library(sos); findFn('aggregate.data.frame') returned 4 matches, none of which seemed to solve this problem. findFn('aggregate data.frame') returned 133 matches in 71 packages. findFn('aggregate') returned 734 matches in 282 packages. I failed to find anything useful in the latter two and with other attempts using RSiteSearch, except for a reference to plyr.

--
Spencer Graves, PE, PhD
President and Chief Technology Officer
Structure Inspection and Monitoring, Inc.
751 Emerson Ct.
San José, CA 95126
ph: 408-655-4567
web: www.structuremonitoring.com
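For readers who want to see the proposal concretely, here is a minimal base-R sketch of the Aggregate(x, by, FUN, ...) function described above. It is only an illustration under stated assumptions, not an existing function from Ecdat or any other package: it assumes FUN returns a one-row data.frame for each subset, and that rows with NA in a `by` column can be dropped (which is what split() does).

```r
# Hypothetical sketch of the Aggregate() proposed in the post above.
# Assumption: FUN returns a one-row data.frame for each subset of x.
Aggregate <- function(x, by, FUN, ...) {
  # One data.frame per observed combination of the 'by' columns;
  # rows with NA in a 'by' column are dropped by split().
  pieces <- split(x, x[by], drop = TRUE)
  out <- lapply(pieces, function(piece) {
    FUNout <- FUN(piece, ...)
    # Prepend the grouping values, taken from the first row of the subset
    cbind(piece[1, by, drop = FALSE], FUNout)
  })
  res <- do.call(rbind, out)
  rownames(res) <- NULL
  res
}

# Example data from the post
df2aggregate <- data.frame(id = rep(letters[1:4], each = 2),
                           x = c(1:6, NA, NA),
                           y = c(NA, 1:6, NA),
                           a = c(NA, NA, LETTERS[1:6]),
                           stringsAsFactors = FALSE)

# A different summary for each column: sum, mean, first element
ag <- Aggregate(df2aggregate, "id", function(d)
  data.frame(x = sum(d$x), y = mean(d$y), a = head(d$a, 1),
             stringsAsFactors = FALSE))
```

Note that sum() of an integer column returns an integer, so ag$x is integer here, while the desired ag1.2$x is numeric; a call such as ag$x <- as.numeric(ag$x) would be needed before comparing the two with identical().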
Hi,

Try:

library(plyr)
res1 <- ddply(df2aggregate, .(id), summarize, x=sum(x), y=mean(y), a=head(a,1))
res1
#  id  x   y    a
#1  a  3  NA <NA>
#2  b  7 2.5    A
#3  c 11 4.5    C
#4  d NA  NA    E
res1$x <- as.numeric(res1$x)
identical(ag1.2, res1)
#[1] TRUE

A.K.

----- Original Message -----
From: Spencer Graves <spencer.graves at structuremonitoring.com>
To: R list <R-help at r-project.org>
Sent: Sunday, May 12, 2013 4:54 PM
Subject: [R] aggregate.data.frame with NAs and different types