Can someone please tell me what is up with na.action in aggregate?

My (somewhat) reproducible example:
(I say somewhat because some lines wouldn't run in a separate session, more below)

set.seed(100)
dat=data.frame(
        x1=sample(c(NA,'m','f'), 100, replace=TRUE),
        x2=sample(c(NA, 1:10), 100, replace=TRUE),
        x3=sample(c(NA,letters[1:5]), 100, replace=TRUE),
        x4=sample(c(NA,T,F), 100, replace=TRUE),
        y=sample(c(rep(NA,5), rnorm(95))))
dat
## The total from dat:
sum(dat$y, na.rm=T)
## The total from aggregate:
sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T)$x)
sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T)$y)  ## <--- This line gave an error in a separate R instance
## The aggregate formula is excluding NA

## So, let's try to include NAs
sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action='na.pass')$y)
sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action=na.pass)$y)
## The aggregate formula is STILL excluding NA
## In fact, the formula doesn't seem to notice the na.action
sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action='foo man chew')$y)
## Hmmmm... that error surprised me (since the previous two things ran)

## So, let's try to change the global options
## (not mentioned in the help, but after reading the help
##  100 times, I thought I would go above and beyond to avoid
##  any r list flames from people complaining
##  that I didn't read the help... but that's a separate topic)
options(na.action ="na.pass")
sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T)$x)
sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T)$y)
sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action='na.pass')$y)
sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T, na.action=na.pass)$y)
## (NAs are still omitted)

## Even more frustrating...
## Why don't any of these work???
sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action='na.pass')$x)
sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action=na.pass)$x)
sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action='na.omit')$x)
sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T, na.action=na.omit)$x)

## This does work, but in my real data set, I want NA to really be NA
for(j in 1:4)
    dat[is.na(dat[,j]),j] = 'NA'
sum(aggregate(dat$y, dat[,1:4], sum, na.rm=T)$x)
sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.rm=T)$y)

## My first session info
#
#> sessionInfo()
#R version 2.12.0 (2010-10-15)
#Platform: i386-pc-mingw32/i386 (32-bit)
#
#locale:
#[1] LC_COLLATE=English_United States.1252
#[2] LC_CTYPE=English_United States.1252
#[3] LC_MONETARY=English_United States.1252
#[4] LC_NUMERIC=C
#[5] LC_TIME=English_United States.1252
#
#attached base packages:
#[1] stats graphics grDevices utils datasets methods base
#
#other attached packages:
#[1] plyr_1.2.1 zoo_1.6-4 gdata_2.8.1 rj_0.5.0-5
#
#loaded via a namespace (and not attached):
#[1] grid_2.12.0 gtools_2.6.2 lattice_0.19-13 rJava_0.8-8
#[5] tools_2.12.0

I tried running that example in a different version of R, and I got completely different results.

The other version of R wouldn't recognize the formula at all.

My other version of R:

# My second session info
#> sessionInfo()
#R version 2.10.1 (2009-12-14)
#i386-pc-mingw32
#
#locale:
#[1] LC_COLLATE=English_United States.1252
#[2] LC_CTYPE=English_United States.1252
#[3] LC_MONETARY=English_United States.1252
#[4] LC_NUMERIC=C
#[5] LC_TIME=English_United States.1252
#
#attached base packages:
#[1] stats graphics grDevices utils datasets methods base

PS: Also, I have read the help on aggregate, factor, as.factor, and several other topics. If I missed something, please let me know.
Some people like to reply to questions by telling the sender that R has documentation. Please don't. The R help archives are littered with reminders, friendly and otherwise, of R's documentation.
Gene -

Let me try to address your concerns one at a time:

The formula interface to aggregate was introduced pretty recently (I think R-2.11.0, but I might be wrong), so when you try to use it in R-2.10.1 it won't work.

Now let's take a close look at the help page for aggregate. The default method, which will be called if you pass a vector to aggregate, and the data frame method are described like this:

aggregate(x, ...)

## S3 method for class 'data.frame'
aggregate(x, by, FUN, ..., simplify = TRUE)

So if you pass an na.action= argument to aggregate when the first argument is a vector or data frame, it gets picked up by the ... argument and gets passed to your function, so you might see messages like this:

> sum(1:10,na.action=na.omit)
Error in sum(1:10, na.action = na.omit) :
  invalid 'type' (closure) of argument
> sum(1:10,na.action='na.omit')
Error in sum(1:10, na.action = "na.omit") :
  invalid 'type' (character) of argument

(It's sum complaining, not aggregate.)

As far as na.action goes, when you're using the aggregate formula method, the default (na.action=na.omit) will remove all rows from the specified data frame that have any missing values in the variables named in the formula. If you instead pass the data through with na.action=na.pass to a function with the na.rm=TRUE argument, that function will remove the missing values as it should. So the only time you'll see the effect of na.action=na.pass is when you call a function that won't remove the missing values.

(The subtle distinction between na.action=na.omit and na.rm=TRUE in the function you're calling is that na.omit will remove the entire row of data when it encounters a missing value, while the na.rm=TRUE argument will remove missing values separately from each variable.)

Hope this helps.

- Phil Spector
  Statistical Computing Facility
  Department of Statistics
  UC Berkeley
  spector at stat.berkeley.edu
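A small sketch of the point about the ... argument, using made-up data rather than Gene's data frame. The helper extra_arg_names is hypothetical (it is not part of R); it only reports what it was handed, so you can see na.action arriving at FUN instead of being used by aggregate.

```r
# Made-up data, not the poster's data frame.
d <- data.frame(g = c("a", "a", "b"), y = c(1, 2, 3))

# Hypothetical helper: returns the names of any extra arguments it is given.
extra_arg_names <- function(v, ...) paste(names(list(...)), collapse = ",")

# na.action is not an argument of the default or data frame method,
# so it falls into ... and is handed on to FUN for every group.
res <- aggregate(d$y, by = list(g = d$g), FUN = extra_arg_names,
                 na.action = "na.omit")
print(res)

# sum() receives the same stray argument and objects to it.
msg <- tryCatch(sum(1:10, na.action = "na.omit"),
                error = function(e) conditionMessage(e))
print(msg)
```

With the formula method the same call behaves differently, because there na.action is a named argument of aggregate itself and never reaches FUN.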
Hi,

Please see ?na.action

(just kidding!)

So it seems to me the problem is that you are passing na.rm to the sum function. With na.rm=T, sum drops the missing values itself, so the na.action argument makes no visible difference!

Compare

sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.fail)$y)
sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.pass)$y)
sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.omit)$y)

Best,
Ista

-- 
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org
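The three calls above can be tried on a toy data frame where the answer is easy to check by hand. This is only a sketch with made-up data (no missing values in the grouping column, one missing y), not a rerun of Gene's example.

```r
# Made-up data: one missing y, no missing group labels.
d <- data.frame(g = c("a", "a", "b", "b"), y = c(1, NA, 10, 100))

# na.omit (the default): the row with y = NA is dropped before sum() sees it.
omit <- aggregate(y ~ g, data = d, FUN = sum, na.action = na.omit)

# na.pass: the NA is handed to sum(), so group "a" comes back as NA.
pass <- aggregate(y ~ g, data = d, FUN = sum, na.action = na.pass)

# na.fail: model.frame() stops because the data contain a missing value.
fail <- tryCatch(aggregate(y ~ g, data = d, FUN = sum, na.action = na.fail),
                 error = function(e) "error")

print(omit)
print(pass)
print(fail)
```

Note that na.rm is deliberately left out here; adding na.rm=TRUE would make the na.omit and na.pass results identical on this data.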
Hi,

On Fri, Feb 4, 2011 at 6:33 PM, Gene Leynes <gleynes+r at gmail.com> wrote:
> Thank you both for the thoughtful (and funny) replies.
>
> I agree with both of you that sum is the one picking up na.action. Although
> I didn't mention it, I did realize that in the first place.
> Also, thank you Phil for pointing out that aggregate only accepts a formula
> in more recent versions! I actually thought that was an older
> feature, but I must be thinking of other functions.
>
> I still don't see why these two values are not the same!
>
> It seems like a bug to me

No, not a bug (see below).

> > set.seed(100)
> > dat=data.frame(
> +         x1=sample(c(NA,'m','f'), 100, replace=TRUE),
> +         x2=sample(c(NA, 1:10), 100, replace=TRUE),
> +         x3=sample(c(NA,letters[1:5]), 100, replace=TRUE),
> +         x4=sample(c(NA,T,F), 100, replace=TRUE),
> +         y=sample(c(rep(NA,5), rnorm(95))))
> > sum(dat$y, na.rm=T)
> [1] 0.0815244116598
> > sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.pass, na.rm=T)$y)
> [1] -4.45087666247

Because in the first one you are only removing missing data in dat$y. In the second one you are removing all rows that contain missing data in any of the columns: aggregate drops the rows with a missing grouping value, and na.rm=T drops the missing y values.

all.equal(sum(na.omit(dat)$y), sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.pass, na.rm=T)$y))
[1] TRUE

Best,
Ista

> On Fri, Feb 4, 2011 at 4:18 PM, Ista Zahn <izahn at psych.rochester.edu> wrote:
>>
>> Sorry, I didn't see Phil's reply, which is better than mine anyway.
>>
>> -Ista
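The same comparison on a four-row toy data frame, where the totals can be checked by hand. The data are made up for illustration; the only point is that a missing grouping value removes the whole row, even with na.action=na.pass.

```r
# Made-up data: one missing y (row 2) and one missing group label (row 3).
d <- data.frame(g = c("a", "a", NA, "b"), y = c(1, NA, 10, 100))

# Only missing y values are dropped: 1 + 10 + 100.
total_all <- sum(d$y, na.rm = TRUE)

# na.pass keeps every row in the model frame, but the row whose group is NA
# belongs to no group and is dropped, so its y = 10 never gets summed.
agg <- aggregate(y ~ g, data = d, FUN = sum, na.rm = TRUE, na.action = na.pass)
total_agg <- sum(agg$y)

# Same total as summing y over the rows with no NA in any column.
total_complete <- sum(na.omit(d)$y)

print(c(all = total_all, aggregate = total_agg, complete = total_complete))
```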
Hi again,

On Fri, Feb 4, 2011 at 7:18 PM, Gene Leynes <gleynes+r at gmail.com> wrote:
> Ista,
>
> Thank you again.
>
> I had figured that out... and was crafting another message when you replied.
>
> The NAs do come through on the variable that is being aggregated.
> However, they do not come through on the categorical variable(s).
>
> The aggregate function must be converting the data frame variables to
> factors, with the default "exclude=NA" parameter.
>
> The help on "aggregate" says:
> na.action     A function which indicates what should happen when the data
>               contain NA values.
>               The default is to ignore missing values in the given
>               variables.
> By "data" it must only refer to the aggregated variable, and not the
> categorical variables. I thought it referred to both, because I thought it
> referred to the "data" argument, which is the underlying data frame.
>
> I think the proper way to accomplish this would be to recast my x
> (categorical) variables as factors.

Yes, that would work.

> This is not feasible for me due to
> other complications.
> Also, (imho) the help should be more clear about what the na.action
> modifies.
>
> So, unless someone has a better idea, I guess I'm out of luck?

Well, you can use ddply from the plyr package:

library(plyr) # may need to install first.
sum(ddply(dat, .(x1,x2,x3,x4), function(x){data.frame(y.sum=sum(x$y, na.rm=TRUE))})$y.sum)

However, I don't think you've told us what you're actually trying to accomplish...

Best,
Ista
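For the "recast as factors" route mentioned above, here is a minimal base R sketch on made-up data. It assumes it is acceptable to add a factor copy of the grouping column (g_fac is a name invented for this example); addNA() turns NA into an ordinary factor level, so those rows form their own group instead of being dropped.

```r
# Made-up data: one missing y (row 2) and one missing group label (row 3).
d <- data.frame(g = c("a", "a", NA, "b"), y = c(1, NA, 10, 100))

# Factor copy of the grouping column in which NA is a real level.
d$g_fac <- addNA(factor(d$g))

# Data frame method: na.rm goes through ... to sum(); the NA level is a group.
agg <- aggregate(d["y"], by = list(g = d$g_fac), FUN = sum, na.rm = TRUE)
print(agg)

# The grand total should now agree with the plain column sum.
print(all.equal(sum(agg$y), sum(d$y, na.rm = TRUE)))
```

The original column d$g is left untouched, so NA "really is NA" everywhere else in the data frame; only the copy used for grouping treats it as a level.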