Chel Hee Lee
2016-Jan-21 04:08 UTC
[R] strange answer when using 'aggregate()' with a formula
Could you kindly test the following code? I got a strange answer when using 'aggregate()' with a formula.

I am trying to count how many missing entries there are in each group. For this exercise, I created the following data:

> tmp <- data.frame(grp=c(2,3,2,3), y=c(NA, 0.5, 3, 0.5))
> tmp
  grp   y
1   2  NA
2   3 0.5
3   2 3.0
4   3 0.5

The observations (variable y) fall into two groups (variable grp). For group 2, y has NA and 3.0; for group 3, y has 0.5 and 0.5. Hence, the number of missing values is 1 for group 2 and 0 for group 3. This can be computed with 'aggregate()' from the 'stats' package:

> aggregate(x=tmp$y, by=list(grp=tmp$grp), function(x) sum(is.na(x)))
  grp x
1   2 1
2   3 0

A formula can also be used:

> aggregate(y~grp, data=tmp, function(x) sum(is.na(x)))
  grp y
1   2 0
2   3 0

What a surprise! Is this a bug? I would appreciate it if you could share your results after testing the code. Thank you so much in advance for your help!

Chel Hee Lee
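A self-contained version of the two calls above may make testing easier. The sketch below names the counting function `count_na` (a hypothetical helper, not part of the original post) and also prints the size of the model frame that the formula method builds before grouping; inspecting `model.frame()` this way is an assumed diagnostic step, not something from the original message.

```r
# Same data as in the post
tmp <- data.frame(grp = c(2, 3, 2, 3), y = c(NA, 0.5, 3, 0.5))

# Hypothetical helper: number of missing values in a vector
count_na <- function(x) sum(is.na(x))

# Default (data frame) method: every element of tmp$y reaches count_na()
by_list <- aggregate(x = tmp$y, by = list(grp = tmp$grp), FUN = count_na)
print(by_list)

# Formula method: a model frame is built first, with na.action = na.omit by default
by_formula <- aggregate(y ~ grp, data = tmp, FUN = count_na)
print(by_formula)

# The model frame built under na.omit keeps only the 3 complete rows
mf <- model.frame(y ~ grp, data = tmp, na.action = na.omit)
print(nrow(mf))
```

If the two printed tables differ while `nrow(mf)` is 3, the NA row is being removed before `count_na()` is ever called.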
Fox, John
2016-Jan-21 06:52 UTC
[R] strange answer when using 'aggregate()' with a formula
Dear Chel Hee Lee,

With the formula method, the default na.action is na.omit, so the row with y = NA is dropped before your function is applied to each group; thus,

> aggregate(y~grp, data=tmp, function(x) sum(is.na(x)), na.action=na.pass)
  grp y
1   2 1
2   3 0

I hope this helps,
 John

-----------------------------
John Fox, Professor
McMaster University
Hamilton, Ontario
Canada L8S 4M4
Web: socserv.mcmaster.ca/jfox
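The suggested `na.action = na.pass` fix can be sketched as a standalone script, with a cross-check against base R's `tapply()`, which splits the vector directly by group and does not drop NA values in the data before calling the function. As above, `count_na` is a hypothetical helper name used only for illustration.

```r
tmp <- data.frame(grp = c(2, 3, 2, 3), y = c(NA, 0.5, 3, 0.5))

# Hypothetical helper: number of missing values in a vector
count_na <- function(x) sum(is.na(x))

# Fix: na.pass keeps the NA row in the model frame, so count_na() can see it
fixed <- aggregate(y ~ grp, data = tmp, FUN = count_na, na.action = na.pass)
print(fixed)

# Cross-check: tapply() applies count_na() to each group of tmp$y as-is
via_tapply <- tapply(tmp$y, tmp$grp, count_na)
print(via_tapply)
```

Both results should report 1 missing value for group 2 and 0 for group 3, matching the default-method call in the original question.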