I want to use table() to show NA values with factor variables. Using the setup from the help page, I have:

> b <- factor(rep(c("A","B","C"), 10))
> d <- factor(rep(c("A","B","C"), 10), levels=c("A","B","C","D","E"))
> is.na(d) <- 3:4
> table(b, d)
   d
b    A  B  C  D  E
  A  9  0  0  0  0
  B  0 10  0  0  0
  C  0  0  9  0  0

All of which is fine. But how can I get table() (or some other function) to include the observations which are NA for d? This does not do what I want (although I can see how it does what it is documented to do).

> table(b, d, exclude = NULL)
   d
b    A  B  C  D  E
  A  9  0  0  0  0
  B  0 10  0  0  0
  C  0  0  9  0  0

Note that this dilemma only arises with factor variables. With numeric variables, things work differently.

> a <- c(1, 1, 2, 2, NA, 3); b <- c(2, 1, 1, 1, 1, 1); table(a, b)
   b
a   1 2
  1 1 1
  2 2 0
  3 1 0
> table(a, b, exclude = NULL)
      b
a      1 2
  1    1 1
  2    2 0
  3    1 0
  <NA> 1 0

How can I get similar behavior with factor variables?

Thanks,

Dave Kane

> R.version
               _
platform       i686-pc-linux-gnu
arch           i686
os             linux-gnu
system         i686, linux-gnu
status
major          2
minor          5.0
year           2007
month          04
day            23
svn rev        41293
language       R
version.string R version 2.5.0 (2007-04-23)
Hi,

The problem is with the extra empty levels in your d factor. Without them:

> d1 <- factor(d, exclude=NULL)
> d1
 [1] A B <NA> <NA> B C A B C A B C A B C A B C A B C A B C A B C A B C
Levels: A B C <NA>
> table(b, d1)
   d1
b    A  B  C <NA>
  A  9  0  0    1
  B  0 10  0    0
  C  0  0  9    1

Regards,

Petr

r-help-bounces at stat.math.ethz.ch wrote on 23.05.2007 17:39:49:

> I want to use table() to show NA values with factor variables. Using
> the set up from the help page, I have:
> [...]
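Petr's recoding can also be sketched with base R's addNA(), which makes NA an explicit level but, unlike factor(d, exclude = NULL), leaves the unused levels D and E in place. This is only an illustrative variant of the reply above; the names d2 and tab are introduced here and do not appear in the thread.

```r
# Same setup as in the original question.
b <- factor(rep(c("A", "B", "C"), 10))
d <- factor(rep(c("A", "B", "C"), 10), levels = c("A", "B", "C", "D", "E"))
is.na(d) <- 3:4

# addNA() appends NA as an additional level; the existing levels,
# including the empty D and E, are kept.
d2 <- addNA(d)

# The two missing observations now get their own <NA> column.
tab <- table(b, d2)
print(tab)
```

Because the NA level exists only in the copy d2, the original d still treats those observations as missing when it is used in modeling.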
Rephrasing David Kane's example:

> b <- c(1,1,1,1,1, NA, 2,2,2,2)
> d <- factor(c(rep(c("A","B","C"), 3), NA))
> table(b, d, exclude=NULL)
      d
b      A B C
  1    2 2 1
  2    1 1 1
  <NA> 0 0 1

Why are only 9 observations instead of 10 listed in the table?

This is a long-standing bug in Splus and R. Peter Dalgaard suggests recoding the factor variable so that "NA" is a level, rather than a "missing". This works, but it does not address the bug: for most of my factor variables I want missing to be missing, so that omission works as expected in modeling. The exclude argument in table() should do what it says it does, which is to list ALL data in the table when exclude=NULL.

At Mayo, we have replaced the table command to work around this (in place for 5+ years now). It has two additions: a method for factors that correctly propagates the exclude argument, and a change to exclude=NULL as the default. table() is used, 99% of the time, to look at data on screen, and the number of missing values is often the first question I'm asking; so we found the default to be, shall we say, non-intuitive.

We argued these points with Insightful many years ago and got nowhere, the replies being a mix of a) it's not really broken and b) if we change it, it might break something. We had not carried the argument forward to the R community, and just fixed it ourselves. The revised version just works better day to day.

In R, the manual page has been revised to state that the exclude argument is something different for factors, so I expect to remain in the minority. (I can't think of a time I would ever have wanted the actions of the new version of exclude, which for factors is a means only to exclude more things, rather than the usual use of keeping more in the table.)

Terry Therneau
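A minimal sketch of the behavior Terry describes, a table() that counts missing values by default for factors and non-factors alike. This is not the Mayo replacement, whose code is not shown in the thread. The wrapper name table_na is hypothetical, and the sketch assumes a later R release than the 2.5.0 used above, one in which table() accepts the useNA argument.

```r
# Terry's rephrased example.
b <- c(1, 1, 1, 1, 1, NA, 2, 2, 2, 2)
d <- factor(c(rep(c("A", "B", "C"), 3), NA))

# Hypothetical wrapper: same as table(), but NA counts are shown
# whenever any are present, for factor and non-factor arguments.
table_na <- function(..., useNA = "ifany") {
  table(..., useNA = useNA)
}

# All 10 observations are tabulated, including the one with b = 2, d = NA.
tab <- table_na(b, d)
print(tab)
```

The factor d itself is left unchanged, so its NA stays a true missing value for modeling; only the tabulation adds the <NA> row and column.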