Dusterhus, Andre
2014-Feb-06 15:26 UTC
[R] Missval in netCDF and its handling by the ncdf package
Hi all,

I am currently having some trouble with missvals in netCDF files.

I have a netCDF file, written by an unknown program, for which ncdump shows no fill value or missing value attribute on any variable. Within the variables, a missval of 9.96921e+36 is evidently used.

When I read the file into R via the ncdf package, it reports a missval of 1e+30, but the 9.96921e+36 values are still in the dataset.

Since my application is an automated processing of such data, I have the following questions:

Does R report a missval even when none exists in the file itself? (I know that it uses 1e+30 as its own default when it writes files.)
Could it be that R (ncdf) deviates here from the "netCDF standard", which assumes 9.96921e+36 rather than 1e+30 as the standard missval (at least, a lot of software on the market does this)?
How can I find out via R (ncdf) whether the reported missval is actually present in the data?

Many thanks in advance,
André Düsterhus
David W. Pierce
2014-Feb-06 17:45 UTC
[R] Missval in netCDF and its handling by the ncdf package
> Various questions about missing values in the R ncdf package, and how they are
> handled if the file lacks the standard "_FillValue" attribute.

Hi Andre,

It sounds like the fundamental problem is that your data files are using a missval, but that fact is not recorded in the file's metadata as an attribute of the variable (although it should be). The "best" solution is to fix the problematic data files by putting in the correct missval. You can do this with R using the ncdf package. For example, to add this attribute to a file, you could do something like this (untested code, just writing off the top of my head):

    varname = 'Temperature'    # for example
    new_missval = 9.96921e+36
    ncid = open.ncdf( 'filename.nc', write=TRUE )
    att.put.ncdf( ncid, varname, '_FillValue', new_missval )
    close.ncdf( ncid )

Note that the more commonly accepted attribute name nowadays for a missing value is "_FillValue", not "missing_value".

You can also do this outside of R by using the netcdf operators (nco), http://nco.sourceforge.net/.

If fixing the files to be correct is not practical, you can always fix the data after you read it in:

    file_missval = 9.96921e+36
    mvtol = abs(file_missval)*1.e-5    # tolerance relative to the missval
    varname = 'Temperature'    # for example
    ncid = open.ncdf( 'filename.nc' )
    data = get.var.ncdf( ncid, varname )
    is_missing = (abs( data - file_missval ) < mvtol)    # floating comparison
    data[ is_missing ] = NA

To use R to see if the missing_value or _FillValue attribute actually exists in the file:

    varname = 'Temperature'    # for example
    ncid = open.ncdf( 'filename.nc' )
    mv = att.get.ncdf( ncid, varname, '_FillValue' )
    print(mv$hasatt)

This will be TRUE if the variable has the attribute, and FALSE otherwise. Of course, you can repeat this for an attribute named "missing_value" instead of "_FillValue".

See also the R documentation page for the ncdf routine "set.missval.ncdf", which allows you to change the actual missing values that are written in the data file.
Regards,
--Dave
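The tolerance-based comparison in the reply above can be tried without any netCDF file. What follows is a minimal base-R sketch, not part of the ncdf package: the helper names `near_missval`, `count_missval` and `mask_missval` are hypothetical, the relative tolerance of 1e-5 is simply the one used in the reply, and the `values` vector is made-up data standing in for the result of `get.var.ncdf()`. For context, 9.96921e+36 matches the default float fill value of the netCDF C library (`NC_FILL_FLOAT`). Counting matches is one way to check whether a candidate missval actually occurs in the data.

```r
# Hypothetical helpers (not part of ncdf); they work on any numeric vector or array.

# TRUE where a value lies within a relative tolerance of the candidate missval.
# Elements that are already NA are never reported as matches.
near_missval <- function(x, missval, rel_tol = 1e-5) {
  !is.na(x) & abs(x - missval) < abs(missval) * rel_tol
}

# Number of elements that match the candidate missval.
count_missval <- function(x, missval, rel_tol = 1e-5) {
  sum(near_missval(x, missval, rel_tol))
}

# Copy of x with matching elements replaced by NA.
mask_missval <- function(x, missval, rel_tol = 1e-5) {
  x[near_missval(x, missval, rel_tol)] <- NA
  x
}

# Made-up example data standing in for the result of get.var.ncdf().
# One fill value is written with more digits to mimic single-precision rounding.
values <- c(280.5, 9.96921e+36, 275.1, 9.9692099683868690e+36, NA)

n_default <- count_missval(values, 1e+30)        # default missval reported by ncdf
n_fill    <- count_missval(values, 9.96921e+36)  # value actually seen in the data
cleaned   <- mask_missval(values, 9.96921e+36)   # fill values replaced by NA
```

In an automated pipeline, a count of zero for the reported missval combined with a nonzero count for another candidate is a hint that the reported value did not come from the file.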
Dusterhus, Andre
2014-Feb-07 09:25 UTC
[R] Missval in netCDF and its handling by the ncdf package
Hi David,

thanks for this extensive answer. My problem is not really how to get the right missvals into a file. What troubles me is that a missval is apparently reported within the ncdf object even when no such attribute exists in the original file. As a consequence, ncdf does not read the file plainly, but interprets it by setting a default value. Under certain conditions it can make a difference for a user whether a wrong missval (fill value) or none at all is set.

Might it be possible to add this information to the documentation of open.ncdf?

Regards,
André
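The distinction raised here, between a missval declared in the file and one supplied as a default by the reading software, can be made explicit in processing code. The following is a small base-R sketch under stated assumptions: `declared_missval` is a hypothetical helper, not an ncdf function, and it only relies on the `hasatt` and `value` fields of the list returned by `att.get.ncdf()`, as used earlier in the thread. The two mock lists stand in for real attribute lookups, and the `value` of the "no attribute" mock is a placeholder.

```r
# Hypothetical helper (not part of ncdf). Each argument is a list shaped like
# the return value of att.get.ncdf(): list(hasatt = <logical>, value = <number>).
declared_missval <- function(fill_att, missing_att) {
  # Prefer "_FillValue", the more commonly accepted attribute name.
  if (isTRUE(fill_att$hasatt)) {
    return(list(declared = TRUE, source = "_FillValue", value = fill_att$value))
  }
  if (isTRUE(missing_att$hasatt)) {
    return(list(declared = TRUE, source = "missing_value", value = missing_att$value))
  }
  # Neither attribute exists: any missval reported for the variable is a
  # default chosen by the reading software, not information from the file.
  list(declared = FALSE, source = NA_character_, value = NA_real_)
}

# Mock lookups standing in for:
#   att.get.ncdf( ncid, varname, '_FillValue' )
#   att.get.ncdf( ncid, varname, 'missing_value' )
no_att   <- list(hasatt = FALSE, value = 0)            # placeholder value
with_att <- list(hasatt = TRUE,  value = 9.96921e+36)

undeclared <- declared_missval(no_att, no_att)    # file declares nothing
declared   <- declared_missval(no_att, with_att)  # only missing_value is set
```

Returning `NA` rather than a default keeps "no missval declared" distinguishable from "a missval is declared", which is the difference the message above asks to have documented.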