Jason Rupert
2009-Oct-29 19:26 UTC
[R] Recommendation for dealing with mixed input types in CSV
Currently I have a CSV with mixed input types that I am trying to read in and reformat without having to list all the column names. Below is an example of the data:

    HouseColor, HouseSize, HouseCost
    Blue, 1600, 160e3
    Blue, 1600, 160e3

Actually I have about 60 columns like this, so imagine the above repeated about 30 times column-wise.

Luckily the ones in scientific notation are grouped together, i.e. columns 11-56.

Using read.csv or as.numeric, is there a way to convert all those in scientific format over to general numeric syntax?

Right now I have something like the following:

    input_df <- read.csv(InputFile, skip=0, header=TRUE, strip.white = TRUE)

I tried:

    as.numeric(input_df[, 11:56])

but this returns an error:

    Error: (list) object cannot be coerced to type 'double'

Oddly, it does appear to work row-wise:

    as.numeric(input_df[1, 11:56])
    as.numeric(input_df[2, 11:56])

etc.

However, trying it on multiple rows produces the same error as above:

    as.numeric(input_df[1:2, 11:56])

After a while I became frustrated that this was not working, so I tried just deleting the columns:

    input_df[1, 11:56] <- NULL

This also failed, so are there any suggestions about how to convert the values in scientific notation over to standard numeric syntax?

Thank you again for all your insights and feedback.
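The behavior described in this post can be reproduced on a small scale. The sketch below uses a hypothetical three-column data frame (`toy_df`, with columns 2:3 standing in for columns 11:56); it is an illustration under those assumptions, not the poster's actual data. A data frame is a list of columns, so a multi-row selection cannot be coerced in a single `as.numeric()` call, whereas a single row is a list of length-one elements and can.

```r
# Hypothetical stand-in for input_df; values are made up for illustration.
toy_df <- data.frame(
  HouseColor = c("Blue", "Blue"),
  HouseSize  = c(1600, 1600),
  HouseCost  = c(160e3, 160e3),
  stringsAsFactors = FALSE
)

# Several rows: each list element (column) has length 2, so coercion fails.
# try() captures the error so the script keeps running.
whole  <- try(as.numeric(toy_df[, 2:3]), silent = TRUE)
failed <- inherits(whole, "try-error")

# One row: each list element has length 1, so coercion succeeds.
one_row <- as.numeric(toy_df[1, 2:3])

# Columns are dropped by assigning NULL to the columns themselves,
# not to a single row of them.
trimmed <- toy_df
trimmed[2:3] <- NULL
```

This suggests that the conversion has to be applied column by column rather than to the whole block at once.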
David Winsemius
2009-Oct-29 20:02 UTC
[R] Recommendation for dealing with mixed input types in CSV
On Oct 29, 2009, at 3:26 PM, Jason Rupert wrote:

> Currently I have a CSV with mixed input types that I am trying to
> read in and reformat without having to list off all the column
> names. Below is an example of the data:
>
> HouseColor, HouseSize, HouseCost
> Blue, 1600, 160e3
> Blue, 1600, 160e3
>
> Actually I have about 60 columns like this, so imagine the above
> repeated about 30 times column-wise.
>
> Luckily the ones in scientific notation are grouped together, i.e.
> columns 11-56.
>
> Using read.csv or as.numeric, is there a way to convert all those in
> scientific format over to general numeric syntax?

Option 1: Do it in the read step. (In my experience this is the more difficult and error-prone method when you are starting out.) See ?read.table, in particular the section on colClasses, and define your columns as "character" or "numeric" as appropriate.

Option 2: Read them in with as.is=TRUE and stringsAsFactors=FALSE, then convert them in a loop:

    for (i in 11:56) DFhouses[, i] <- as.numeric( DFhouses[, i] )

> Right now I have something like the following
> input_df<-read.csv(InputFile, skip=0, header=TRUE, strip.white = TRUE)
>
> I tried:
> as.numeric(input_df[, 11:56])
> but this returns an error
> Error: (list) object cannot be coerced to type 'double'
>
> Oddly it does appear to work successfully row-wise
> as.numeric(input_df[1, 11:56])
> as.numeric(input_df[2, 11:56])
> etc.
>
> However, trying it on multiple rows produces the same error as above:
> as.numeric(input_df[1:2, 11:56])
>
> After a bit, I became a bit frustrated that this was not working so
> I tried just deleting the columns:
> input_df[1, 11:56]<-NULL
>
> This also failed, so are there any suggestions about how to convert
> the values in scientific notation over to standard numeric syntax?
>
> Thank you again for all your insights and feedback.
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
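The two options in this reply can be sketched on the three-column example from the original post. This is a minimal illustration, not the poster's actual setup: the inline `csv_text` string replaces the real file, columns 2:3 stand in for columns 11:56, and the names `classes`, `opt1` and `opt2` are made up for the example.

```r
# Inline copy of the example data, used instead of a file on disk.
csv_text <- "HouseColor, HouseSize, HouseCost
Blue, 1600, 160e3
Blue, 1600, 160e3"

# Option 1: declare column classes in the read step.
# NA lets read.csv pick the class itself; "numeric" forces conversion.
classes <- rep(NA_character_, 3)
classes[2:3] <- "numeric"
opt1 <- read.csv(text = csv_text, header = TRUE, strip.white = TRUE,
                 colClasses = classes)

# Option 2: read everything as character, then convert the block of
# columns one at a time in a loop.
opt2 <- read.csv(text = csv_text, header = TRUE, strip.white = TRUE,
                 colClasses = "character")
for (i in 2:3) opt2[, i] <- as.numeric(opt2[, i])

# Both routes should end with numeric columns holding the same values.
str(opt1)
str(opt2)
```

Reading the columns as character first matters for Option 2: if they had been read as factors, `as.numeric()` would return the internal integer codes of the factor levels rather than the values themselves.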
Ben Bolker
2009-Oct-29 20:37 UTC
[R] Recommendation for dealing with mixed input types in CSV
Jason Rupert wrote:

> Currently I have a CSV with mixed input types that I am trying to read in
> and reformat without having to list off all the column names. Below is an
> example of the data:
>
> HouseColor, HouseSize, HouseCost
> Blue, 1600, 160e3
> Blue, 1600, 160e3
>
> [snip]

I'm a little surprised that read.csv is *not* automatically converting your scientific notation to numeric. When I save the three lines above to the file "house.tmp", I get numeric values:

    > read.csv("house.tmp",header=TRUE)
      HouseColor HouseSize HouseCost
    1       Blue      1600    160000
    2       Blue      1600    160000

Are you sure there isn't something else funny about those columns?
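One way to look for "something else funny" is to read the suspect columns as text and list the entries that do not convert to numbers. The sketch below assumes a hypothetical stray value (`n/a`) in the data and a made-up helper, `find_bad`; neither comes from the thread, and the real file may contain a different kind of problem.

```r
# Example data with one deliberately non-numeric entry (hypothetical).
csv_text <- "HouseColor, HouseSize, HouseCost
Blue, 1600, 160e3
Blue, 1600, n/a"

# Read every column as character so nothing is converted silently.
raw_df <- read.csv(text = csv_text, header = TRUE, strip.white = TRUE,
                   colClasses = "character")

# Hypothetical helper: return the entries of x that are present and
# non-empty but become NA when converted with as.numeric().
find_bad <- function(x) {
  converted <- suppressWarnings(as.numeric(x))
  x[is.na(converted) & !is.na(x) & nzchar(x)]
}

# Apply the check to the columns expected to be numeric (2:3 here).
bad_values <- lapply(raw_df[2:3], find_bad)
print(bad_values)
```

A column that comes back with any entries here would be read as text (or, with older defaults, as a factor) instead of numeric, which would explain why the automatic conversion did not happen.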