Full_Name: J. R. M. Hosking
Version: 1.9.0
OS: Windows 2000
Submission from: (NULL) (129.34.20.23)


Two problems, perhaps related:

(1) na.strings is not honored when x is non-numeric and as.is=T

> type.convert( c("abc","-"), as.is=T, na.strings="-" )
[1] "abc" "-"

... unless x consists only of NAs

> type.convert( c("abc","-"), as.is=T, na.strings=c("-","abc") )
[1] NA NA

But with x numeric or as.is FALSE (or omitted), it works as advertised:

> type.convert( c("abc","-"), na.strings="-" )
[1] abc <NA>
Levels: abc
> type.convert( c("6","-"), na.strings="-" )
[1] 6 NA

(2) When na.strings is omitted, blank strings in nonnumeric vectors are not
converted into NAs (regardless of the value of as.is).

> type.convert(c("6",""," "))  # OK: gives 6 NA NA
[1] 6 NA NA

> type.convert(c("A",""," "))  # gives a factor with 3 levels and no NAs
[1] A
Levels: A

> type.convert(c("A",""," "),as.is=T)  # gives a char vector with no NAs
[1] "A" "" " "

Rider: it would be nice if type.convert had a strip.white argument, so that
  type.convert(c(" 6"," -"),na.strings="-",strip.white=T)
would return a numeric vector. Stripping leading and trailing blanks can be
time-consuming, and could presumably be done more quickly by an .Internal
function such as the one called by type.convert.

(R 1.9.0, Windows binary from CRAN)
On Fri, 16 Apr 2004 hosking@watson.ibm.com wrote:

[...]

> Rider: it would be nice if type.convert had a strip.white argument, so that
>   type.convert(c(" 6"," -"),na.strings="-",strip.white=T)
> would return a numeric vector. Stripping leading and trailing blanks can be
> time-consuming, and could presumably be done more quickly by an .Internal
> function such as the one called by type.convert.

It can be done very rapidly by sub(), for example. Given that type.convert
is documented as a helper function for read.table, we will not be
encumbering it with features read.table does not need.

It would be helpful if you did not encumber bug reports with other issues,
too.

--
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
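The sub() approach mentioned in the reply above might look roughly like the following. This is only a sketch: strip_white is a hypothetical helper name, not an R function, and as.is=TRUE is passed explicitly because recent R versions expect that argument to be specified.

```r
# Hypothetical helper (not part of R): remove leading, then trailing,
# whitespace from each element using two calls to sub().
strip_white <- function(x) {
  sub("[[:space:]]+$", "", sub("^[[:space:]]+", "", x))
}

x <- c(" 6", " -")

# With the blanks gone, "-" matches na.strings exactly and "6" parses as a
# number, so the result should be a numeric vector with one NA.
converted <- type.convert(strip_white(x), na.strings = "-", as.is = TRUE)
print(converted)
```

Doing the stripping before the call keeps type.convert itself unchanged, which is in line with the position taken in the reply.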
On Fri, 16 Apr 2004 hosking@watson.ibm.com wrote:

> Full_Name: J. R. M. Hosking
> Version: 1.9.0
> OS: Windows 2000
> Submission from: (NULL) (129.34.20.23)
>
>
> Two problems, perhaps related:
>
> (1) na.strings is not honored when x is non-numeric and as.is=T
>
> > type.convert( c("abc","-"), as.is=T, na.strings="-" )
> [1] "abc" "-"
>
> ... unless x consists only of NAs
>
> > type.convert( c("abc","-"), as.is=T, na.strings=c("-","abc") )
> [1] NA NA

That is documented to be different (a logical vector).

> But with x numeric or as.is FALSE (or omitted), it works as advertised:
>
> > type.convert( c("abc","-"), na.strings="-" )
> [1] abc <NA>
> Levels: abc
> > type.convert( c("6","-"), na.strings="-" )
> [1] 6 NA

The point is that if no conversion takes place, no checking for na.strings
took place. That _was_ intentional, as it is not needed by read.table.
I've added it.

> (2) When na.strings is omitted, blank strings in nonnumeric vectors are not
> converted into NAs (regardless of the value of as.is).

This is wholly intentional, and is now documented. It is nothing to do
with whether `na.strings is omitted', though.

> > type.convert(c("6",""," "))  # OK: gives 6 NA NA
> [1] 6 NA NA
>
> > type.convert(c("A",""," "))  # gives a factor with 3 levels and no NAs
> [1] A
> Levels: A
>
> > type.convert(c("A",""," "),as.is=T)  # gives a char vector with no NAs
> [1] "A" "" " "

--
Brian D. Ripley,                  ripley@stats.ox.ac.uk
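For code that must not depend on which R version handles na.strings in the unconverted (character) case, or that does want blank strings treated as missing, the NAs can be marked explicitly before conversion. The following is a minimal sketch only; mark_na and its blank.is.na argument are hypothetical names introduced for illustration and are not part of R.

```r
# Hypothetical helper (not part of R): set entries matching na.strings to NA,
# and optionally also entries that contain nothing but whitespace.
mark_na <- function(x, na.strings = "NA", blank.is.na = FALSE) {
  x[x %in% na.strings] <- NA
  if (blank.is.na) {
    # An entry counts as blank if nothing is left after removing whitespace.
    blank <- !is.na(x) & gsub("[[:space:]]", "", x) == ""
    x[blank] <- NA
  }
  x
}

# Case (1) from the report: a character vector that stays character.
y <- mark_na(c("abc", "-"), na.strings = "-")

# Case (2) from the report: blank strings in a non-numeric vector.
z <- mark_na(c("A", "", " "), blank.is.na = TRUE)

print(y)
print(z)
```

The marked vectors can then be passed on to type.convert as usual.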