I'm trying to prepare some cross tabs, looking at a number of variables against a variable "connector" which has 2 values: "OD Passenger" and " Connector". When I produce an xtabs one way I have observations under "Connector", but against a different variable "Connector" shows all 0 values. What is wrong? I've looked into the na commands and the ?xtabs entry, but I haven't found anything that works.

#########################
> XTTable <- xtabs(wt_annual ~ time_strata + connector, LAWAData)
> XTTable   ## List
                                          connector
time_strata                               OD Passenger Connector
  Morning Peak - 0550 to 0959                 5605.140  1234.933
  Late morning to Mid-Day - 1000 to 1359      4778.503  2516.943
  Evening Peak - 1400 to 1959                 5145.730  3171.348
  Night - 2000 to 0235 (last flight)          2929.085  2567.790
>
> XTTable <- xtabs(wt_annual ~ Mode_orig_only + connector, exclude=NULL, LAWAData)
> XTTable   ## List
                                                               connector
Mode_orig_only                                                 OD Passenger Connector
  Walked/Biked                                                    17.814338  0.000000
  I flew in from another a place/connected                         0.000000  0.000000
  Amtrak                                                          49.128982  0.000000
  Bus - Chartered bus or van                                     525.978899  0.000000
  Bus - Hotel Courtesy van                                       913.295370  0.000000
  Bus - MTA (Metro) or other public transit bus                  114.302764  0.000000
  Bus - Scheduled airport bus or van (e.g. Airport bus or Disn   298.151438  0.000000
  Bus - Union Station Flyaway                                     93.088049  0.000000
  Bus - Van Nuys Flyaway                                         233.794168  0.000000
  Green line/light rail                                           20.764539  0.000000
  Limousine/town car                                             424.120506  0.000000
  Metrolink                                                        8.054528  0.000000
  Motorcycle                                                       6.010790  0.000000
  On-call shuttle/van (e.g. Super Shuttle, Prime Time)          1832.748525  0.000000
  Car/truck/van - Private                                      10191.284139  0.000000
  Car/truck/van - Rental                                        2099.771923  0.000000
  Taxi                                                          1630.148576  0.000000
  ..Refused                                                        0.000000  0.000000
>
> XTTable <- xtabs(wt_annual ~ Mode_orig_only + connector, na.action(na.pass), LAWAData)
Error in eval(expr, envir, enclos) : object "wt_annual" not found
> XTTable   ## List
  (output identical to the Mode_orig_only table above)
>
> XTTable <- xtabs(wt_annual ~ Mode_orig_only + connector, drop.unused.levels = FALSE, LAWAData)
> XTTable   ## List
  (output identical to the Mode_orig_only table above)
>
########################

Robert Farley
Metro
1 Gateway Plaza
Mail Stop 99-23-7
Los Angeles, CA 90012-2952
Voice: (213)922-2532
Fax: (213)922-2868
www.Metro.net
Farley, Robert <FarleyR <at> metro.net> writes:

> What is wrong? I've looked into the na commands and the ?xtabs entry, but I
> haven't found anything that works.

I never understood the logic that exclude=NULL needs na.action in addition.

test <- c(1,2,3,1,2,3,NA,NA,1,2,3)
xtabs(~test, exclude=NULL, na.action=na.pass)

Dieter
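
One way to check the suggestion above is to compare the totals with and without the two arguments. This is only a minimal sketch on Dieter's toy vector, so it says nothing about how xtabs treats variables that are already factors; the names `without_na` and `with_na` are mine, not from the thread.

```r
test <- c(1, 2, 3, 1, 2, 3, NA, NA, 1, 2, 3)

# Default call: the default na.action drops the two NA entries,
# so only the non-missing values are counted.
without_na <- xtabs(~ test)

# na.action = na.pass keeps the NA entries in the model frame, and
# exclude = NULL stops NA from being discarded when the levels are built.
with_na <- xtabs(~ test, exclude = NULL, na.action = na.pass)

sum(without_na)
sum(with_na)

# table() offers a cross-check that does not go through a formula.
table(test, useNA = "ifany")
```

If the two sums differ by the number of missing values, the NA cell is being counted.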
That seems to work for the toy data. How do I implement this change with my real data, which are read from very large Stata and SPSS files and keep the factor definitions? Won't I be losing information (and creating a larger dataset) by not using the factor levels?

How do I recover the factor values? I read my datafile (read.spss using use.value.labels = FALSE) and got this:

              connector
Mode_orig_only            1            9
  1                 17.814338     0.000000
  3                 49.128982     0.000000
  4                525.978899     0.000000
  5                913.295370     0.000000
  6                114.302764     0.000000
  7                298.151438     0.000000
  8                 93.088049     0.000000
  9                233.794168     0.000000
  10                20.764539     0.000000
  11               424.120506     0.000000
  12                 8.054528     0.000000
  13                 6.010790     0.000000
  14              1832.748525     0.000000
  15             10191.284139     0.000000
  16              2099.771923     0.000000
  17              1630.148576     0.000000
  <NA>               0.000000  9491.013249

which does have the "NA" row, but not the factor labels. If I read the file with use.value.labels=TRUE I can see what I'm summarizing, but not the NAs. Can't I have both?

The top summary will also omit all 0 value factors (of course) in the variable summarized.

The same summary using factors:

                                                               connector
Mode_orig_only                                                 OD Passenger Connector
  Walked/Biked                                                    17.814338  0.000000
  I flew in from another a place/connected                         0.000000  0.000000
  Amtrak                                                          49.128982  0.000000
  Bus - Chartered bus or van                                     525.978899  0.000000
  Bus - Hotel Courtesy van                                       913.295370  0.000000
  Bus - MTA (Metro) or other public transit bus                  114.302764  0.000000
  Bus - Scheduled airport bus or van (e.g. Airport bus or Disn   298.151438  0.000000
  Bus - Union Station Flyaway                                     93.088049  0.000000
  Bus - Van Nuys Flyaway                                         233.794168  0.000000
  Green line/light rail                                           20.764539  0.000000
  Limousine/town car                                             424.120506  0.000000
  Metrolink                                                        8.054528  0.000000
  Motorcycle                                                       6.010790  0.000000
  On-call shuttle/van (e.g. Super Shuttle, Prime Time)          1832.748525  0.000000
  Car/truck/van - Private                                      10191.284139  0.000000
  Car/truck/van - Rental                                        2099.771923  0.000000
  Taxi                                                          1630.148576  0.000000
  ..Refused                                                        0.000000  0.000000

Robert Farley
Metro
www.Metro.net

-----Original Message-----
From: William Dunlap [mailto:wdunlap at tibco.com]
Sent: Thursday, May 28, 2009 16:26
To: Farley, Robert
Subject: RE: [R] Still can't find missing data

Try reading it in with read.table's argument stringsAsFactors=FALSE.

I think the underlying problem is that exclude= is used only if the classifying variables are not already factors. I haven't studied the help file well enough to see if that is what it is documented to do, but it seems misleading.

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com

> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of Farley, Robert
> Sent: Thursday, May 28, 2009 4:10 PM
> To: R-help
> Subject: Re: [R] Still can't find missing data
>
> In this toy data, each of the tables should sum to 1111.
> None of the tables shows NA columns or rows.
>
> ################################
> > ToyData <- read.table("C:/Data/R/Toy.csv", header=TRUE,
>     sep=",", na.strings="NA", dec=".", row.names="ID_Num")
> > ToyData
>     Data1 Data2  Data3 Weight
> 101   Sam   Red Banana      1
> 102   Sam Green Banana      2
> 103   Sam  Blue Orange      2
> 104  Fred   Red Orange      2
> 105  Fred Green  Guava      2
> 106  Fred  Blue  Guava      2
> 107  <NA>   Red   Pear     50
> 108  <NA> Green   Pear     50
> 109  <NA>  Blue   <NA>   1000
> > xtabs(Weight ~ Data1 + Data2, exclude=NULL,
>     na.action=na.pass, ToyData)
>       Data2
> Data1  Blue Green Red
>   Fred    2     2   2
>   Sam     2     2   1
> > xtabs(Weight ~ Data1 + Data2, exclude=NULL,
>     na.action=na.pass, drop.unused.levels = FALSE, ToyData)
>       Data2
> Data1  Blue Green Red
>   Fred    2     2   2
>   Sam     2     2   1
> > xtabs(Weight ~ Data1 + Data3, exclude=NULL,
>     na.action=na.pass, drop.unused.levels = FALSE, ToyData)
>       Data3
> Data1  Banana Guava Orange Pear
>   Fred      0     4      2    0
>   Sam       3     0      2    0
>
> Robert Farley
> Metro
> www.Metro.net
>
> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of Dieter Menne
> Sent: Thursday, May 28, 2009 05:46
> To: r-help at r-project.org
> Subject: Re: [R] Still can't find missing data
>
> Farley, Robert wrote:
> >
> > I can't get the syntax that will allow me to show NA values (rows) in the
> > xtabs.
> >
> > lengthy non-reproducible example removed
> >
>
> If you want a reproducible answer, prepare a reproducible result. And check
> that the syntax is
>
> na.action=na.pass
>
> Dieter
>
> --
> View this message in context:
> http://www.nabble.com/Still-can%27t-find-missing-data-tp23730627p23761006.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
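
Bill's hypothesis (exclude= is only consulted when the classifying variables are not already factors) can be checked on the toy data. The sketch below rebuilds ToyData inline instead of reading Toy.csv, and uses base R's addNA() to give the labelled factors an explicit NA level, which is one possible way to have both the labels and the NA rows. The object names `toy`, `toy_na`, `tab_factor` and `tab_addna` are illustrative and do not come from the thread.

```r
# Toy data from the thread, rebuilt inline with factor columns
# (the original was read from Toy.csv).
toy <- data.frame(
  Data1  = factor(c("Sam", "Sam", "Sam", "Fred", "Fred", "Fred", NA, NA, NA)),
  Data2  = factor(rep(c("Red", "Green", "Blue"), times = 3)),
  Data3  = factor(c("Banana", "Banana", "Orange", "Orange", "Guava", "Guava",
                    "Pear", "Pear", NA)),
  Weight = c(1, 2, 2, 2, 2, 2, 50, 50, 1000),
  row.names = as.character(101:109)
)

# Factors without an NA level: the weights of the NA rows are not
# tabulated, even with exclude = NULL and na.action = na.pass.
tab_factor <- xtabs(Weight ~ Data1 + Data3, data = toy,
                    exclude = NULL, na.action = na.pass)

# addNA() appends NA to the levels, so the labels are kept and the
# missing values get a row and a column of their own.
toy_na <- toy
toy_na$Data1 <- addNA(toy_na$Data1)
toy_na$Data3 <- addNA(toy_na$Data3)
tab_addna <- xtabs(Weight ~ Data1 + Data3, data = toy_na)

sum(tab_factor)  # falls short of the total weight
sum(tab_addna)   # all weights accounted for
tab_addna
```

Comparing the two sums against the expected total of 1111 shows whether the missing rows made it into the table.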
Farley, Robert
2009-May-29 18:14 UTC
[R] Still can't find missing data - How do I get NA in xtabs with factors?
Let's see if I understand this. Do I iterate through

    x <- factor(x, levels = c(levels(x), NA), exclude = NULL)

for each of the few hundred variables (x) in my data frame? I tried to do this all at once and failed:

> ToyData
    Data1 Data2  Data3 Weight
101   Sam   Red Banana    1.1
102   Sam Green Banana    2.1
103   Sam  Blue Orange    2.1
104  Fred   Red Orange    2.1
105  Fred Green  Guava    2.1
106  Fred  Blue  Guava    2.1
107  <NA>   Red   Pear   50.1
108  <NA> Green   Pear   50.1
109  <NA>  Blue   <NA> 1000.2
> ToyData <- factor(ToyData, levels(c(levels(ToyData), NA), exclude=NULL, na.action=na.pass))
Error in levels(c(levels(ToyData), NA), exclude = NULL, na.action = na.pass) :
  unused argument(s) (exclude = NULL, na.action = function (object, ...)
> ToyData <- factor(ToyData, levels(c(levels(ToyData), NA)))
> ToyData
 Data1  Data2  Data3 Weight
  <NA>   <NA>   <NA>   <NA>
Levels:
>

But it didn't work. Don't I need to do this separately for each variable? Is there a way to get read.spss to insert "NA" levels for each variable when I create the data frame?

Is this because SPSS (and STATA) allow "NA" as an "undeclared level" and R does not? Will this be a problem with read.dta as well?

Robert Farley
Metro
www.Metro.net

-----Original Message-----
From: William Dunlap [mailto:wdunlap at tibco.com]
Sent: Thursday, May 28, 2009 20:39
To: Farley, Robert
Subject: RE: [R] Still can't find missing data

In R factors don't save space over character vectors - only one copy of any given string is kept in memory in either case. Factors do let you order the levels in the way you want and that is often important in presentations.

You can add NA to the list of levels of a factor by doing

    x <- factor(x, levels = c(levels(x), NA), exclude = NULL)

where 'x' represents each factor in your dataset. After doing that is.na(x) will be all FALSE and you may not want that for other situations.

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com

> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of Farley, Robert
> Sent: Thursday, May 28, 2009 5:27 PM
> To: R-help
> Subject: Re: [R] Still can't find missing data
>
> [snip: earlier messages, quoted in full above]
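
For the "few hundred variables" case, one option is to loop over the columns rather than call factor() on the data frame itself. The helper below is a sketch and not something posted in the thread: `add_na_levels` is a hypothetical name, and it relies on base R's addNA(), which for a factor does essentially the same thing as the factor(x, levels = c(levels(x), NA), exclude = NULL) form. It assumes the data frame already holds labelled factors, as a data frame read with value labels would.

```r
# Hypothetical helper: give every factor column that contains missing
# values an explicit NA level; all other columns are returned unchanged.
add_na_levels <- function(df) {
  is_fac <- vapply(df, is.factor, logical(1))
  df[is_fac] <- lapply(df[is_fac], addNA, ifany = TRUE)
  df
}

# Toy data from the thread, rebuilt inline with factor columns.
toy <- data.frame(
  Data1  = factor(c("Sam", "Sam", "Sam", "Fred", "Fred", "Fred", NA, NA, NA)),
  Data2  = factor(rep(c("Red", "Green", "Blue"), times = 3)),
  Data3  = factor(c("Banana", "Banana", "Orange", "Orange", "Guava", "Guava",
                    "Pear", "Pear", NA)),
  Weight = c(1, 2, 2, 2, 2, 2, 50, 50, 1000),
  row.names = as.character(101:109)
)

toy_na <- add_na_levels(toy)

levels(toy_na$Data1)       # the levels now include NA
any(is.na(toy_na$Data1))   # NA is a level here, so nothing counts as missing
xtabs(Weight ~ Data1 + Data3, data = toy_na)
```

As Bill points out, is.na() no longer flags the converted values, so it may be worth keeping the original data frame around and using the converted copy only for tabulation.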