Farley, Robert
2009-Jun-03 00:03 UTC
[R] Still can't find missing data - How do I get NA in xtabs with factors?
The problem here is that table() doesn't seem to have a way to weight the data.

> ToyData
    Data1 Data2  Data3 Weight
101   Sam   Red Banana    1.1
102   Sam Green Banana    2.1
103   Sam  Blue Orange    2.1
104  Fred   Red Orange    2.1
105  Fred Green  Guava    2.1
106  Fred  Blue  Guava    2.1
107  <NA>   Red   Pear   50.1
108  <NA> Green   Pear   50.1
109  <NA>  Blue   <NA> 1000.2
> with(ToyData,table(Data1, Data3, useNA = "ifany"))
      Data3
Data1  Banana Guava Orange Pear <NA>
  Fred      0     2      1    0    0
  Sam       2     0      1    0    0
  <NA>      0     0      0    2    1
> xtabs(Weight ~ Data1 + Data3, exclude=NULL, na.action=na.pass, ToyData)
      Data3
Data1  Banana Guava Orange Pear
  Fred    0.0   4.2    2.1  0.0
  Sam     3.2   0.0    2.1  0.0

Robert Farley
Metro
www.Metro.net

-----Original Message-----
From: 3.14david at gmail.com [mailto:3.14david at gmail.com]
Sent: Sunday, May 31, 2009 14:27
To: Farley, Robert
Subject: Re: Still can't find missing data - How do I get NA in xtabs with factors?

you might want to try 'table' - with the exclude option - rather than 'xtabs':

with(data,table(a, b, exclude="NULL"))

I *think* that the problem is that xtabs excludes NAs before it makes
factors from the values

david freedman

Farley, Robert wrote:
>
> Let's see if I understand this.  Do I iterate through
>    x <- factor(x, levels(c(levels(x), NA), exclude=NULL)
> for each of the few hundred variables (x) in my data frame?
>
>
> I tried to do this all at once and failed:
> > ToyData
>     Data1 Data2  Data3 Weight
> 101   Sam   Red Banana    1.1
> 102   Sam Green Banana    2.1
> 103   Sam  Blue Orange    2.1
> 104  Fred   Red Orange    2.1
> 105  Fred Green  Guava    2.1
> 106  Fred  Blue  Guava    2.1
> 107  <NA>   Red   Pear   50.1
> 108  <NA> Green   Pear   50.1
> 109  <NA>  Blue   <NA> 1000.2
> > ToyData <- factor(ToyData, levels(c(levels(ToyData), NA), exclude=NULL, na.action=na.pass))
> Error in levels(c(levels(ToyData), NA), exclude = NULL, na.action = na.pass) :
>   unused argument(s) (exclude = NULL, na.action = function (object, ...)
> > ToyData <- factor(ToyData, levels(c(levels(ToyData), NA)))
> > ToyData
>  Data1  Data2  Data3 Weight
>   <NA>   <NA>   <NA>   <NA>
> Levels:
> >
> But it didn't work.  Don't I need to do this separately for each variable?
>
> Is there a way to get read.spss to insert "NA" levels for each variable
> when I create the data frame?  Is this because SPSS (and STATA) allow "NA"
> as an "undeclared level" and R does not?
>
> Will this be a problem with read.dta as well?
>
> Robert Farley
> Metro
> www.Metro.net
>
> -----Original Message-----
> From: William Dunlap [mailto:wdunlap at tibco.com]
> Sent: Thursday, May 28, 2009 20:39
> To: Farley, Robert
> Subject: RE: [R] Still can't find missing data
>
> In R factors don't save space over character vectors - only
> one copy of any given string is kept in memory in either case.
> Factors do let you order the levels in the way you want and
> that is often important in presentations.
>
> You can add NA to the list of levels of a factor by doing
>    x <- factor(x, levels(c(levels(x), NA), exclude=NULL)
> where 'x' represents each factor in your dataset.  After
> doing that is.na(x) will be all FALSE and you may not
> want that for other situations.
>
> Bill Dunlap
> TIBCO Software Inc - Spotfire Division
> wdunlap tibco.com
>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org
>> [mailto:r-help-bounces at r-project.org] On Behalf Of Farley, Robert
>> Sent: Thursday, May 28, 2009 5:27 PM
>> To: R-help
>> Subject: Re: [R] Still can't find missing data
>>
>> That seems to work for the toy data.  How do I implement this
>> change with my real data, which are read from very large
>> Stata and SPSS files and keep the factor definitions?  Won't
>> I be losing information (and creating a larger dataset) by
>> not using the factor levels?
>>
>>
>> How do I recover the factor values?
>> I read my datafile (read.spss using use.value.labels = FALSE,) and got this:
>>
>>                   connector
>> Mode_orig_only              1            9
>>   1                 17.814338     0.000000
>>   3                 49.128982     0.000000
>>   4                525.978899     0.000000
>>   5                913.295370     0.000000
>>   6                114.302764     0.000000
>>   7                298.151438     0.000000
>>   8                 93.088049     0.000000
>>   9                233.794168     0.000000
>>   10                20.764539     0.000000
>>   11               424.120506     0.000000
>>   12                 8.054528     0.000000
>>   13                 6.010790     0.000000
>>   14              1832.748525     0.000000
>>   15             10191.284139     0.000000
>>   16              2099.771923     0.000000
>>   17              1630.148576     0.000000
>>   <NA>               0.000000  9491.013249
>>
>> which does have the "NA" row, but not the factor labels.  If
>> I read the file with use.value.labels=TRUE I can see what I'm
>> summarizing, but not the NAs.  Can't I have both?
>>
>> The top summary will also omit all 0 value factors (of
>> course) in the variable summarized.
>>
>>
>> The same summary using factors:
>>                                                               connector
>> Mode_orig_only                                                 OD Passenger Connector
>>   Walked/Biked                                                    17.814338  0.000000
>>   I flew in from another a place/connected                         0.000000  0.000000
>>   Amtrak                                                          49.128982  0.000000
>>   Bus - Chartered bus or van                                     525.978899  0.000000
>>   Bus - Hotel Courtesy van                                       913.295370  0.000000
>>   Bus - MTA (Metro) or other public transit bus                  114.302764  0.000000
>>   Bus - Scheduled airport bus or van (e.g. Airport bus or Disn   298.151438  0.000000
>>   Bus - Union Station Flyaway                                     93.088049  0.000000
>>   Bus - Van Nuys Flyaway                                         233.794168  0.000000
>>   Green line/light rail                                           20.764539  0.000000
>>   Limousine/town car                                             424.120506  0.000000
>>   Metrolink                                                        8.054528  0.000000
>>   Motorcycle                                                       6.010790  0.000000
>>   On-call shuttle/van (e.g. Super Shuttle, Prime Time)          1832.748525  0.000000
>>   Car/truck/van - Private                                      10191.284139  0.000000
>>   Car/truck/van - Rental                                        2099.771923  0.000000
>>   Taxi                                                          1630.148576  0.000000
>>   ..Refused                                                        0.000000  0.000000
>>
>>
>> Robert Farley
>> Metro
>> www.Metro.net
>>
>>
>> -----Original Message-----
>> From: William Dunlap [mailto:wdunlap at tibco.com]
>> Sent: Thursday, May 28, 2009 16:26
>> To: Farley, Robert
>> Subject: RE: [R] Still can't find missing data
>>
>> Try reading it in with read.table's argument stringsAsFactors=FALSE.
>>
>> I think the underlying problem is that exclude= is used only if
>> the classifying variables are not already factors.  I haven't studied
>> the help file well enough to see if that is what it is documented
>> to do, but it seems misleading.
>>
>> Bill Dunlap
>> TIBCO Software Inc - Spotfire Division
>> wdunlap tibco.com
>>
>> > -----Original Message-----
>> > From: r-help-bounces at r-project.org
>> > [mailto:r-help-bounces at r-project.org] On Behalf Of Farley, Robert
>> > Sent: Thursday, May 28, 2009 4:10 PM
>> > To: R-help
>> > Subject: Re: [R] Still can't find missing data
>> >
>> > In this toy data, each of the tables should sum to 1111
>> > None of the tables shows NA columns or rows.
>> >
>> >
>> > > ################################
>> > > ToyData <- read.table("C:/Data/R/Toy.csv", header=TRUE, sep=",", na.strings="NA", dec=".", row.names="ID_Num")
>> > > ToyData
>> >     Data1 Data2  Data3 Weight
>> > 101   Sam   Red Banana      1
>> > 102   Sam Green Banana      2
>> > 103   Sam  Blue Orange      2
>> > 104  Fred   Red Orange      2
>> > 105  Fred Green  Guava      2
>> > 106  Fred  Blue  Guava      2
>> > 107  <NA>   Red   Pear     50
>> > 108  <NA> Green   Pear     50
>> > 109  <NA>  Blue   <NA>   1000
>> > > xtabs(Weight ~ Data1 + Data2, exclude=NULL, na.action=na.pass, ToyData)
>> >       Data2
>> > Data1  Blue Green Red
>> >   Fred    2     2   2
>> >   Sam     2     2   1
>> > > xtabs(Weight ~ Data1 + Data2, exclude=NULL, na.action=na.pass,drop.unused.levels = FALSE, ToyData)
>> >       Data2
>> > Data1  Blue Green Red
>> >   Fred    2     2   2
>> >   Sam     2     2   1
>> > > xtabs(Weight ~ Data1 + Data3, exclude=NULL, na.action=na.pass,drop.unused.levels = FALSE, ToyData)
>> >       Data3
>> > Data1  Banana Guava Orange Pear
>> >   Fred      0     4      2    0
>> >   Sam       3     0      2    0
>> > >
>> >
>> >
>> > Robert Farley
>> > Metro
>> > www.Metro.net
>> >
>> >
>> > -----Original Message-----
>> > From: r-help-bounces at r-project.org
>> > [mailto:r-help-bounces at r-project.org] On Behalf Of Dieter Menne
>> > Sent: Thursday, May 28, 2009 05:46
>> > To: r-help at r-project.org
>> > Subject: Re: [R] Still can't find missing data
>> >
>> >
>> > Farley, Robert wrote:
>> > >
>> > > I can't get the syntax that will allow me to show NA values (rows) in the
>> > > xtabs.
>> > >
>> > > lengthy non-reproducible example removed
>> > >
>> >
>> > If you want a reproducible answer, prepare a reproducible result. And check
>> > that the syntax is
>> >
>> > na.action=na.pass
>> >
>> > Dieter
>> >
>> > --
>> > View this message in context:
>> > http://www.nabble.com/Still-can%27t-find-missing-data-tp23730627p23761006.html
>> > Sent from the R help mailing list archive at Nabble.com.
>> >
>> > ______________________________________________
>> > R-help at r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.

Quoted from: http://www.nabble.com/Still-can%27t-find-missing-data-tp23730627p23784989.html
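
On the question above of whether the factor() call has to be repeated for each of a few hundred variables: one possible approach is to apply it to every column in a single lapply() pass. The sketch below assumes the quoted call was meant to read factor(x, levels = c(levels(x), NA), exclude = NULL), with levels passed as a named argument. The names add_na_level, WithNA and wtab are hypothetical, and the toy data are retyped from the message above. Base R's addNA() does much the same job.

```r
# Toy data retyped from the thread. stringsAsFactors = TRUE makes the text
# columns factors regardless of the R version's default.
ToyData <- data.frame(
  Data1  = c("Sam", "Sam", "Sam", "Fred", "Fred", "Fred", NA, NA, NA),
  Data2  = rep(c("Red", "Green", "Blue"), times = 3),
  Data3  = c("Banana", "Banana", "Orange", "Orange", "Guava", "Guava",
             "Pear", "Pear", NA),
  Weight = c(1.1, 2.1, 2.1, 2.1, 2.1, 2.1, 50.1, 50.1, 1000.2),
  row.names = as.character(101:109),
  stringsAsFactors = TRUE
)

# Hypothetical helper: make NA an explicit level of a factor and return
# non-factor columns (such as Weight) unchanged.
add_na_level <- function(x) {
  if (!is.factor(x)) return(x)
  factor(x, levels = c(levels(x), NA), exclude = NULL)
}

# Recode every column in one pass; WithNA[] keeps the data frame structure
# and row names.
WithNA <- ToyData
WithNA[] <- lapply(ToyData, add_na_level)

# Weighted cross-tabulation on the recoded copy. NA is now an ordinary
# level, so it gets its own row and column.
wtab <- xtabs(Weight ~ Data1 + Data3, data = WithNA)
print(wtab)
```

As noted earlier in the thread, is.na() returns FALSE everywhere on the recoded factors, so it is probably safer to keep the recoded copy separate from the original data frame.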
Rolf Turner
2009-Jun-03 00:39 UTC
[R] Still can't find missing data - How do I get NA in xtabs with factors?
On 3/06/2009, at 12:03 PM, Farley, Robert wrote:

> The problem here is that table() doesn't seem to have a way to weight the
> data.
>
> > ToyData
>     Data1 Data2  Data3 Weight
> 101   Sam   Red Banana    1.1
> 102   Sam Green Banana    2.1
> 103   Sam  Blue Orange    2.1
> 104  Fred   Red Orange    2.1
> 105  Fred Green  Guava    2.1
> 106  Fred  Blue  Guava    2.1
> 107  <NA>   Red   Pear   50.1
> 108  <NA> Green   Pear   50.1
> 109  <NA>  Blue   <NA> 1000.2
> > with(ToyData,table(Data1, Data3, useNA = "ifany"))
>       Data3
> Data1  Banana Guava Orange Pear <NA>
>   Fred      0     2      1    0    0
>   Sam       2     0      1    0    0
>   <NA>      0     0      0    2    1
> > xtabs(Weight ~ Data1 + Data3, exclude=NULL, na.action=na.pass, ToyData)
>       Data3
> Data1  Banana Guava Orange Pear
>   Fred    0.0   4.2    2.1  0.0
>   Sam     3.2   0.0    2.1  0.0
>
>       Data3
> Data1  Banana Guava Orange  Pear     NA
>   Fred    0.0   4.2    2.1   0.0    0.0
>   Sam     3.2   0.0    2.1   0.0    0.0
>   NA      0.0   0.0    0.0 100.2 1000.2

Why don't you just re-code your data replacing missing values (<NA>)
in your factors by the literal string "NA"?  E.g.:

revamp <- function(x){
    # leave non-factor columns (e.g. Weight) alone
    if(!is.factor(x)) return(x)
    l <- levels(x)
    x <- as.character(x)
    # turn missing values into the ordinary string "NA"
    x[is.na(x)] <- "NA"
    factor(x,levels=c(l,"NA"))
}

xxx <- as.data.frame(lapply(ToyData,revamp))
xtabs(Weight ~ Data1 + Data3, data=xxx)
      Data3
Data1  Banana Guava Orange  Pear     NA
  Fred    0.0   4.2    2.1   0.0    0.0
  Sam     3.2   0.0    2.1   0.0    0.0
  NA      0.0   0.0    0.0 100.2 1000.2

    cheers,

        Rolf Turner
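
A possible alternative to recoding, assuming R 3.4.0 or later (so not available when this thread was written): xtabs() has an addNA argument that adds NA as a level to the classifying factors and keeps the incomplete rows. The sketch below retypes the toy data from the thread; the object names dropped and kept are illustrative only.

```r
# Toy data retyped from the thread; text columns are made factors explicitly.
ToyData <- data.frame(
  Data1  = c("Sam", "Sam", "Sam", "Fred", "Fred", "Fred", NA, NA, NA),
  Data2  = rep(c("Red", "Green", "Blue"), times = 3),
  Data3  = c("Banana", "Banana", "Orange", "Orange", "Guava", "Guava",
             "Pear", "Pear", NA),
  Weight = c(1.1, 2.1, 2.1, 2.1, 2.1, 2.1, 50.1, 50.1, 1000.2),
  row.names = as.character(101:109),
  stringsAsFactors = TRUE
)

# Default call: rows with a missing Data1 or Data3 are omitted, so their
# weights do not appear anywhere in the table.
dropped <- xtabs(Weight ~ Data1 + Data3, data = ToyData)

# addNA = TRUE (assumes R >= 3.4.0): NA becomes a row and a column, and the
# weights of the incomplete rows are counted.
kept <- xtabs(Weight ~ Data1 + Data3, data = ToyData, addNA = TRUE)

print(dropped)
print(kept)
```

Comparing sum(kept) with sum(ToyData$Weight) is a quick check that no weight has been lost, which is the same check suggested in the thread ("each of the tables should sum to ...").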