I need to understand how and why dcast() adds NAs to a data frame that contained no missing values.

The database table of chemical concentrations has all missing values removed because they cannot contribute to data analyses. The structure of the R data frame of these data has no NA values, and neither does the data frame resulting from applying the reshape2 melt() function to it. However, the data frame produced by the dcast() function does contain NAs for all chemicals. I assume this is because of the syntax I used:

  chem.cast <- dcast(chem.melt, site + sampdate + era + ceneq1 + floor + ceiling ~ param)

How should I reshape the data frame from long to wide without adding these spurious NAs?

Rich
Can you provide a reproducible example? See, e.g., http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example for information on how to do so.

My entirely unjustified guess is that the NAs appear for combinations of factor levels that don't exist.

Michael

On Tue, Aug 7, 2012 at 4:45 PM, Rich Shepard <rshepard at appl-ecosys.com> wrote:
> I need to understand how and why dcast() adds NAs to a data frame that
> contained no missing values.
>
> The database table of chemical concentrations has all missing values
> removed because they cannot contribute to data analyses. The structure of
> the R data frame of these data have no NA values, and neither does the data
> frame resulting from applying the reshape2 melt() function to it. However,
> the data frame produced by the dcast() function does contain NAs for all
> chemicals. I assume this is because of the syntax I used:
>
> chem.cast <- dcast(chem.melt, site + sampdate + era + ceneq1 + floor +
> ceiling ~ param)
>
> How should I reshape the data frame from long to wide without adding these
> spurious NAs?
>
> Rich
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
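The guess about non-existent combinations can be illustrated with a small sketch. The data frame below is hypothetical (the names chem.long, site, param and value are made up for illustration and are not Rich's data): the long data contain no NA, but one site/parameter pair has no row, so the wide result needs a cell that nothing can fill.

```r
library(reshape2)

# Hypothetical long-format data: no NA anywhere, but site "B" has no zinc
# measurement, so that (site, param) combination has no row at all.
chem.long <- data.frame(
  site  = c("A", "A", "B"),
  param = c("copper", "zinc", "copper"),
  value = c(1.2, 3.4, 5.6),
  stringsAsFactors = FALSE
)

# Confirm that the long data hold no missing values.
print(any(is.na(chem.long)))

# The wide layout needs one cell per site x param pair;
# the pair that never occurred in the long data becomes NA.
chem.wide <- dcast(chem.long, site ~ param, value.var = "value")
print(chem.wide)

# A count of rows per pair shows which combinations are absent (count 0).
print(table(chem.long$site, chem.long$param))
```

If this is what is happening, the NAs are not spurious: they mark samples for which a given chemical has no row in the long table.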
Need sample data, and code that demos the problem.

---------------------------------------------------------------------------
Jeff Newmiller
DCN: <jdnewmil at dcn.davis.ca.us>
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---------------------------------------------------------------------------
Hi,

It is hard to tell without the data. But a wild guess is that not every combination of levels of your variables occurs in the data, so the missing combinations end up as NA. For example, try these two example datasets.

##### This will end up with NAs
library(reshape2)

md2 <- structure(list(group = structure(c(1L, 1L, 1L, 2L, 2L, 3L, 3L,
    4L, 4L, 4L, 5L, 5L, 6L, 6L, 7L, 7L, 7L, 8L, 8L), .Label = c("X1",
    "X2", "X3", "X4", "X5", "X6", "X7", "X8"), class = "factor"),
    tps = structure(c(7L, 12L, 14L, 4L, 8L, 9L, 16L, 6L, 7L,
    11L, 6L, 15L, 10L, 13L, 3L, 4L, 5L, 1L, 2L), .Label = c("A",
    "C", "D", "E", "G", "I", "L", "M", "N", "P", "Q", "R", "S",
    "T", "V", "Y"), class = "factor"), sum = c(0.914913196595112,
    0.0367565080432513, 0.0483302953616366, 0.982727803634948,
    0.0172721963650521, 0.0483302953616366, 0.951669704638363,
    0.89764100023006, 0.0850868034048879, 0.0172721963650521,
    0.951669704638363, 0.0483302953616366, 0.963243491956749,
    0.0367565080432513, 0.89764100023006, 0.0540287044083034,
    0.0483302953616366, 0.982727803634948, 0.0172721963650521
    )), .Names = c("group", "tps", "sum"), row.names = c(NA, -19L),
    class = "data.frame")

# Each group has only a few of the 16 tps levels, so most cells are NA.
# Note: the argument is value.var, not value.vars.
dcast(md2, group ~ tps, value.var = "sum")

##### With no NAs
# Every group has all three tps levels, so every cell is filled.
md4 <- data.frame(group = c(rep("X1", 3), rep("X2", 3)),
                  tps = c("L", "R", "P", "L", "R", "P"),
                  sum = rnorm(6, 15))
dd <- dcast(md4, group ~ tps, value.var = "sum")

A.K.

----- Original Message -----
From: Rich Shepard <rshepard at appl-ecosys.com>
To: r-help at r-project.org
Sent: Tuesday, August 7, 2012 5:45 PM
Subject: [R] reshape2's dcast() Adds NAs to Data Frame
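If the NAs do come from combinations that never occur, as the guesses in this thread suggest, two ways of dealing with them can be sketched as follows. The data are again hypothetical (chem.long and its columns are made up for illustration). The fill argument of dcast() replaces structurally missing cells with a chosen value; whether a value such as 0 makes sense for a chemical that was never measured is a separate question, so leaving the NAs in place and skipping them in summaries may be the safer choice.

```r
library(reshape2)

# Hypothetical long-format data with one absent (site, param) combination.
chem.long <- data.frame(
  site  = c("A", "A", "B"),
  param = c("copper", "zinc", "copper"),
  value = c(1.2, 3.4, 5.6),
  stringsAsFactors = FALSE
)

# Option 1: let dcast() fill the structurally missing cells with a value.
# Assumption: 0 is used here only to show the mechanism; it claims a
# concentration of zero where nothing was measured.
chem.filled <- dcast(chem.long, site ~ param, value.var = "value", fill = 0)
print(chem.filled)

# Option 2: keep the NAs in the wide data and drop them when summarising.
chem.wide <- dcast(chem.long, site ~ param, value.var = "value")
print(colMeans(chem.wide[-1], na.rm = TRUE))
```

A third possibility is to stay in long format for the analysis, since the long table never contains the absent combinations in the first place.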