Hi All,

I'm fiddling with a program to read a text file containing periods that
SAS uses for missing values. I know that if I had the original SAS data
set instead of a text file, R would handle this conversion for me.

Data frames do not allow missing values in their indices but vectors do.
Why is that? A search of the error message points out the problem and
solution but not why they differ. A simplified program that demonstrates
the issue is below.

Thanks,
Bob

# Here's a data frame that has both periods and NAs.
# I want sex to remain character for now.

sex=c("m","f",".",NA)
x=c(1,2,3,NA)
myDF <- data.frame(sex,x,stringsAsFactors=FALSE)
rm(sex,x)
myDF

# Substituting NA into data frame does not work
# due to NAs in the indices. The error message is:
# missing values are not allowed in subscripted assignments of data frames

myDF[ myDF$sex==".", "sex" ] <- NA
myDF

# This works because myDF$sex is a vector and vectors allow NAs in indexes.
# Why don't data frames allow this?

myDF$sex[ myDF$sex=="." ] <- NA
myDF

========================================================
Bob Muenchen (pronounced Min'-chen), Manager
Statistical Consulting Center
U of TN Office of Information Technology
200 Stokely Management Center, Knoxville, TN 37996-0520
Voice: (865) 974-5230
FAX: (865) 974-4810
Email: muenchen at utk.edu
Web: http://oit.utk.edu/scc,
News: http://listserv.utk.edu/archives/statnews.html

On Sun, 2 Sep 2007, Muenchen, Robert A (Bob) wrote:

> Data frames do not allow missing values in their indices but vectors do.
> Why is that?
> [...]
> myDF[ myDF$sex==".", "sex" ] <- NA
> myDF
> [...]
> myDF$sex[ myDF$sex=="." ] <- NA
> myDF

R version 2.5.1 'allows' it.

> df <- as.data.frame(diag(3)[,-1])
> df[ df[,1]==1 ] <- NA
> df

but the result may not be what you were expecting.

See

	?"[.data.frame"

(esp. Details) for more info on why it does not 'work' as you expected.

Also, since you mention a 'text file' I suggest you look at

	?read.table

or

	?scan

where you will see that

	dots.are.NA <- read.table("my.file", na.strings = '.' )

may help you.

Chuck

Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu             UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
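
A minimal sketch of the na.strings suggestion above, for anyone following along. The file contents are hypothetical and are supplied inline through the text= argument of read.table so the example runs without a real file; with an actual file you would pass its name instead. Adding "NA" to na.strings is an assumption made here so that both markers are treated as missing.

```r
# Hypothetical file contents: periods are SAS-style missing values,
# and one row already uses NA.
raw <- "sex x
m 1
f 2
. 3
NA NA"

# na.strings tells read.table which strings to convert to NA on input,
# so no subscripted assignment is needed afterwards.
myDF <- read.table(text = raw, header = TRUE,
                   na.strings = c(".", "NA"),
                   stringsAsFactors = FALSE)

myDF
is.na(myDF$sex)   # FALSE FALSE TRUE TRUE
```

Converting the periods while reading avoids the indexing question entirely, and sex stays a character column because stringsAsFactors = FALSE.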

Muenchen, Robert A (Bob) wrote:

> # Substituting NA into data frame does not work
> # due to NAs in the indices. The error message is:
> # missing values are not allowed in subscripted assignments of data frames
>
> myDF[ myDF$sex==".", "sex" ] <- NA
> myDF

Hi Bob,

What happens is that you don't get FALSE when you ask if something==NA,
you get NA. However, if you use the "which" function, it cleans up the
NAs for you and the result of that should do what you want.

myDF[which(myDF$sex=="."),"sex"]<-NA

Jim
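
To make the point about comparisons with NA concrete, here is a small sketch that rebuilds the data frame from the original post and shows what the logical index looks like before and after which() is applied. The %in% variant in the comment is an alternative not mentioned in the thread; it is included only because %in% never returns NA.

```r
# Rebuild the example data frame from the original post.
sex <- c("m", "f", ".", NA)
x <- c(1, 2, 3, NA)
myDF <- data.frame(sex, x, stringsAsFactors = FALSE)

# Comparing NA with anything yields NA, not FALSE,
# so the logical index itself contains a missing value.
idx <- myDF$sex == "."
idx          # FALSE FALSE TRUE NA

# which() keeps only the positions that are TRUE and drops the NA,
# leaving a plain integer index that the data frame method accepts.
which(idx)   # 3

myDF[which(idx), "sex"] <- NA
myDF

# Alternative (not from the thread):
# myDF[myDF$sex %in% ".", "sex"] <- NA
```

After the assignment, rows 3 and 4 of sex are both NA and x is unchanged.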