Bos, Roger
2014-Apr-30 20:20 UTC
[R] R 3.1 changes to type.convert causing strings where I used to get numeric
Dear R-help,

I recently upgraded to R 3.1 patched, and code that previously ran fine is now giving a lot of errors because the data is coming in as strings instead of numeric. I can fix my code by wrapping each item I want to use in as.numeric(), but that seems very inefficient.

I looked at the change list for R 3.1 and I see the first item is a change in type.convert() that seems to be causing me grief. The suggestion is to use colClasses, but when I try to do so I get an error regarding the quotes again...

> ann1 <- read.table(driveletter %+% "/snap/ann/snap_fyr_ann1_" %+% i %+% ".txt", header=TRUE, quote="", as.is=TRUE, colClasses='numeric')
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
  scan() expected 'a real', got '"1033.7"'

Does anyone have any suggestions?

CHANGES IN R 3.1.0:
NEW FEATURES

type.convert() (and hence by default read.table()) returns a character vector or factor when representing a numeric input as a double would lose accuracy. Similarly for complex inputs.

If a file contains numeric data with unrepresentable numbers of decimal places that are intended to be read as numeric, specify colClasses in read.table() to be "numeric".
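For readers on a later release, here is a small sketch of the behaviour the NEWS entry describes. It assumes R 3.1.1 or newer, where type.convert() has a numerals argument; that argument is not mentioned in this thread, and the input string is made up for illustration.

```r
# A made-up input with more significant digits than a double can hold.
x <- "0.1234567890123456789"

# "allow.loss" converts to double and accepts the rounding.
lossy <- type.convert(x, numerals = "allow.loss", as.is = TRUE)

# "no.loss" asks type.convert() to leave such input unconverted instead.
strict <- type.convert(x, numerals = "no.loss", as.is = TRUE)

class(lossy)
class(strict)
```

read.table() accepts the same numerals argument in those releases, so the choice can be made once at read time rather than per column.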
Duncan Murdoch
2014-Apr-30 20:40 UTC
[R] R 3.1 changes to type.convert causing strings where I used to get numeric
On 30/04/2014, 4:20 PM, Bos, Roger wrote:
> Dear R-help,
>
> I recently upgraded to R 3.1 patched and code that ran fine previously and now giving a lot of errors because the data is coming in as strings instead of numeric. I can fix my code to wrapping each item I want to use with as.numeric(), but that seems very inefficient.
>
> I looked at the change list for R 3.1 and I see the first item is a change in type.convert() that seems to be causing me grief. The suggestion is to use colClasses, but when I try to do so I get an error regarding the quotes again...
>
>> ann1 <- read.table(driveletter %+% "/snap/ann/snap_fyr_ann1_" %+% i %+% ".txt", header=TRUE, quote="", as.is=TRUE, colClasses='numeric')
> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
>   scan() expected 'a real', got '"1033.7"'
>
> Does anyone have any suggestions?

This sounds like two separate problems.

The first is the change to type.convert(). That's going to be dealt with (it's already in R-devel, and will eventually be handled in R-patched and 3.1.1). You deserve part of the blame for this for never testing the pre-release version. It would have been easier to fix before release, but nobody who tested then bothered to report it.

The second problem is one I don't recall hearing reported before. If you have a .csv file containing the lines

X
"1"

then it should be readable as a .csv, because the quotes should be stripped. In fact it is readable now if you *don't* specify that the column is numeric, and it will be converted to a numeric value. However, if you do use colClasses="numeric" you'll get an error. That looks wrong, though it is consistent with ?read.table.

> x <- c("X", '"1"')
> read.csv(textConnection(x))
  X
1 1
> read.csv(textConnection(x), colClasses="numeric")
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
  scan() expected 'a real', got '"1"'

Duncan Murdoch
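One possible way around the second problem, sketched under the assumption that default quoting is in effect (unlike the original call, which passed quote="" and so leaves the quote characters in the field): read every column as character, which lets the quotes be stripped, and then convert all columns in one pass instead of wrapping each use in as.numeric(). The sample value is borrowed from the error message above; the object names raw and dat are made up.

```r
# Two-line input mimicking a file with a header and one quoted numeric value.
x <- c("X", '"1033.7"')

# Quotes are only processed for columns read as character, so read
# everything as character first ...
raw <- read.csv(text = x, colClasses = "character")

# ... then convert every column to numeric in a single step.
dat <- raw
dat[] <- lapply(raw, as.numeric)

str(dat)
```

If only some columns are numeric, the lapply() call can be restricted to those columns by name or position.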
David Winsemius
2014-Apr-30 21:20 UTC
[R] R 3.1 changes to type.convert causing strings where I used to get numeric
I thought that change in NEWS re: type.convert referenced the fact that numbers which were longer than could be represented accurately within the constraints of class numeric were now being read in as characters, which would give you the option to later convert with one of the bignum packages. Your error seems to relate to decimal values enclosed within quotes not being read in with coercion. That's a different problem, I believe.

> read.table(text="'a' '1234'", colClasses=c('character', 'numeric'))
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
  scan() expected 'a real', got ''1234''

So I was able to reproduce your problem fairly easily, and solve it by defining a class and a coercion method to numeric and then using that class as an argument to colClasses.

> setClass('mycharnum', representation=list('character') )
> ?setClass
> ?as
> setAs('character', 'mycharnum' , function(from) as.numeric( from) )
>
> read.table(text="'a' '1234'", colClasses=c('character', 'mycharnum'))
  V1   V2
1  a 1234
> str( read.table(text="'a' '1234'", colClasses=c('character', 'mycharnum')) )
'data.frame': 1 obs. of 2 variables:
 $ V1: chr "a"
 $ V2: num 1234

I'm not really sure that the representation argument is meaningful in the setClass call. I tried with both character and numeric and got the same output on this limited testing method.

--
David.
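A compact variant of the same idea, offered as a sketch rather than a tested recommendation: since the class only serves as a label for colClasses, it can be declared without any representation (which makes it a virtual class). The class name quotedNumeric is made up for this example.

```r
library(methods)

# Hypothetical class name; it is never instantiated and exists only so
# that read.table() has a coercion target to dispatch on.
setClass("quotedNumeric")

# read.table() reads the column as character (stripping the quotes) and
# then calls as(column, "quotedNumeric"), which runs this function.
setAs("character", "quotedNumeric", function(from) as.numeric(from))

dat <- read.table(text = "'a' '1234'",
                  colClasses = c("character", "quotedNumeric"))
str(dat)
```

The coercion function is an ordinary R function, so extra clean-up (for example removing thousands separators with gsub()) could be added there if the data needs it.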