Julie Shoemaker
2012-Jul-19 18:05 UTC
[R] problem replacing NA's in a dataset (10% remain after removal attempt)
Hi all,

I'm attempting to gap-fill a dataset, replacing the missing values with each month's day or night median value.

The problem is that my code results in some, but not all, of the NAs being replaced, and I cannot figure out how this is possible. When I look at the individual lines where the NAs remain, they should have been captured in my code as far as I can tell. Here is an example:

The dataset is 4464x14 and is called hourly.data. I've already replaced all NaN values with NA.

#filPFD is a column of ambient light levels; it has no NA values, all values are real and either 0 or >0
#month is a column with values between 7 and 12 depending on the month the data was collected
#fillCH4 is a column containing CH4 flux data that I am trying to gap-fill
#night_median and day_median are 1x6 vectors with the median flux values for each month

temp<-hourly.data[hourly.data$month==7,]
darkmonth<-(temp$filPFD==0)
daymonth<-(temp$filPFD>0)
temp[is.na(temp[darkmonth,"fillCH4"]),"fillCH4"]<-night_median[1]
temp[is.na(temp[daymonth,"fillCH4"]),"fillCH4"]<-day_median[1]
hourly.data[hourly.data$month==7,"fillCH4"]<-temp$fillCH4

This code replaces the majority of the NAs, but maybe 10% remain. The cases that I have isolated all have values of 7 for the "month" column and real values in the "filPFD" column.

Any thoughts? Am I missing something obvious? Is there any way these values could be coming up as NA but belong to some different classification such that they are not picked up by the is.na function?

Best,
Julie
__________________________________
Julie Shoemaker, PhD
Postdoctoral Research Associate
Harvard University
phone: (617) 384-7237
email: jshoemak at fas.harvard.edu
Peter Ehlers
2012-Jul-19 20:17 UTC
[R] problem replacing NA's in a dataset (10% remain after removal attempt)
On 2012-07-19 11:05, Julie Shoemaker wrote:
> Hi all,
> I'm attempting to gap-fill a dataset, replacing the missing values with
> each month's day or night median value.
>
> The problem is that my code results in some, but not all the NA's being
> replaced and I cannot figure out how this is possible. When I look at
> the individual line's where the NA's remain, they should have been
> captured in my code as far as I can tell. Here is an example:
>
> the dataset is 4464x14 called hourly.data
> I've already replaced all NaN values with NA
>
> #filPFD is a column of ambient light levels, it has no NA values, all
> values are real and either 0 or >0
> #month is a column with values between 7 and 12 depending on the month
> the data was collected
> #fillCH4 is a column containing CH4 flux data that I am trying to gap-fill
> #night_median and day_median are 1x6 vectors with the median flux values
> for each month
>
> temp<-hourly.data[hourly.data$month==7,]
> darkmonth<-(temp$filPFD==0)
> daymonth<-(temp$filPFD>0)
> temp[is.na(temp[darkmonth,"fillCH4"]),"fillCH4"]<-night_median[1]
> temp[is.na(temp[daymonth,"fillCH4"]),"fillCH4"]<-day_median[1]
> hourly.data[hourly.data$month==7,"fillCH4"]<-temp$fillCH4
>
> This code replaces the majority of the NA's, but maybe 10% remain. The
> cases that I have isolated, all have values of 7 for the "month" column
> and real values in the "filPFD" column.
>
> Any thoughts? Am I missing something obvious? Is there any way these
> values could be coming up as NA but belong to some different
> classification such that they are not picked up by the is.na function?

The most obvious thing you're missing is to provide str(hourly.data). Indeed, running that may well lead you to the answer yourself.

Peter Ehlers
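Judging only from the code as posted, one probable cause is the row index itself: is.na(temp[darkmonth,"fillCH4"]) has one element per dark row, not one per row of temp, so its positions do not line up with the rows it is then used to select. The sketch below shows one way to avoid that by building every logical mask at full length and combining them with &. The toy data frame and the median values are made up for illustration, and the assumption that night_median[1] and day_median[1] correspond to July is taken from the original post.

```r
# Toy stand-in for hourly.data (hypothetical values, not the poster's data)
hourly.data <- data.frame(
  month   = c(7, 7, 7, 7, 7, 7, 8, 8),
  filPFD  = c(0, 5, 0, 9, 0, 3, 0, 4),
  fillCH4 = c(1.0, NA, NA, 2.0, 3.0, NA, NA, 4.0)
)

# Assumed layout: one median per month, element 1 = July (month 7)
night_median <- c(10, 20, 30, 40, 50, 60)
day_median   <- c(11, 21, 31, 41, 51, 61)

# Every mask has length nrow(hourly.data), so positions always line up
is_july <- hourly.data$month == 7
is_gap  <- is.na(hourly.data$fillCH4)
is_dark <- hourly.data$filPFD == 0   # filPFD has no NAs, so !is_dark means "day"

# Fill July gaps with the matching night or day median
hourly.data$fillCH4[is_july & is_gap & is_dark]  <- night_median[1]
hourly.data$fillCH4[is_july & is_gap & !is_dark] <- day_median[1]

# Count the gaps left in July
sum(is.na(hourly.data$fillCH4[hourly.data$month == 7]))
```

Because no intermediate subset is created, there is nothing to copy back into hourly.data afterwards. The same three lines could be repeated for months 8 to 12 with the corresponding elements of night_median and day_median.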