Chris Conner
2011-Oct-06 00:24 UTC
[R] Issue with read.csv treatment of numerics enclosed in quotes (and a confession)
Dear Help-Rs,

I've been dealing with this problem for some time, using a work-around to deal with it. It's time for me to come clean with my ineptitude and seek what has got to be a more streamlined solution from the Help-Rverse.

I regularly import delimited text data that contains numerics enclosed in quotes (e.g., "00765288071"). Thing is, for some of these data, I need to keep the values as "character" class within the data frame (that is to say, the leading zeros are important and I would like them to stay). Here is an example of the code I would use to read an example dataset in question:

mydata <- read.csv("~/mydata.csv", quote = "\"'")

The problem is, when R reads the data and converts them into a data frame, R inevitably ignores the quotes around values like the above and reads them in as "numeric". So R strips the valuable leading zeros and converts my "00765288071" to 765288071.

I've developed a work-around for this involving the use of the following:

> whatIneed <- "00000000000"
> whatIgot <- 765288071
> whatIgot <- as.character(whatIgot)
> substr(whatIneed, 1+nchar(whatIneed)-nchar(whatIgot), nchar(whatIneed)) <- whatIgot
> whatIneed
[1] "00765288071"

My question is: am I missing something in how I'm writing my read.csv statement that would indicate to R that numerics enclosed in quotes should be read and imported as characters and not converted to numerics?
Sarah Goslee
2011-Oct-06 00:41 UTC
[R] Issue with read.csv treatment of numerics enclosed in quotes (and a confession)
Hi Chris,

Yes, you're missing something: the colClasses argument to read.csv. Given a tiny little csv file that looks like this:

1,2,3,"01234"
4,5,6,"00011"
7,8,0,"00000"

> testdata <- read.csv("testdata.csv", header=FALSE, colClasses=c(NA, NA, NA, "character"))
> testdata
  V1 V2 V3    V4
1  1  2  3 01234
2  4  5  6 00011
3  7  8  0 00000
> str(testdata)
'data.frame': 3 obs. of 4 variables:
 $ V1: int 1 4 7
 $ V2: int 2 5 8
 $ V3: int 3 6 0
 $ V4: chr "01234" "00011" "00000"

That should do what you want. Not that you should need it, but sprintf() is a neater way to pad out numeric to character values:

> sprintf("%05d", 12)
[1] "00012"
> sprintf("%05d", 1223)
[1] "01223"
> sprintf("%07d", 12)
[1] "0000012"

Hope that solves your problem,
Sarah

--
Sarah Goslee
http://www.functionaldiversity.org
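A follow-up sketch of the colClasses approach, for files that have a header row. It assumes a made-up file with a column called ndc (the file contents and column names are hypothetical, not from the original data). As far as I understand the read.table documentation, a named colClasses vector is matched against column names, and any column left unnamed is auto-detected as usual, so you do not have to count positions with NA:

```r
# Write a small CSV to a temporary file (contents invented for illustration).
tmp <- tempfile(fileext = ".csv")
writeLines(c('id,count,ndc',
             '1,10,"00765288071"',
             '2,20,"00000012345"'), tmp)

# Default behaviour: the quoted digits are still type-converted,
# so the leading zeros are lost.
default_read <- read.csv(tmp)
is.numeric(default_read$ndc)

# Name only the column that must stay text; the other columns
# are still auto-detected.
named_read <- read.csv(tmp, colClasses = c(ndc = "character"))
named_read$ndc

# Alternative: read every column as character and convert
# selected columns afterwards.
all_char <- read.csv(tmp, colClasses = "character")
all_char$count <- as.integer(all_char$count)
str(all_char)

unlink(tmp)
```

Reading everything as character is the more defensive choice when you do not know in advance which columns carry meaningful leading zeros; naming individual columns keeps the automatic type detection for the rest.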