Bryan Hanson
2007-Oct-30 19:40 UTC
[R] Reading a file with read.csv: two character rows not interpreted as I hope
Hi Folks... I've been playing with this for a while with no luck, so I'm hoping someone knows it off the top of their head. This nuance is difficult to find in the archives, as so many messages deal with read.csv!

I'm trying to read a data file with the following structure (a small piece of the actual data; the file really is comma-separated, the commas just didn't survive pasting):

wavelength SampleA   SampleB   SampleC   SampleD
color      "green"   "black"   "black"   "green"
class      "Class 1" "Class 2" "Class 2" "Class 1"
403        1.94E-01  2.14E-01  2.11E-01  1.83E-01
409        1.92E-01  1.89E-01  2.00E-01  1.82E-01
415        1.70E-01  1.99E-01  1.94E-01  1.86E-01
420        1.59E-01  1.91E-01  2.16E-01  1.74E-01
426        1.50E-01  1.66E-01  1.72E-01  1.58E-01
432        1.42E-01  1.50E-01  1.62E-01  1.48E-01

The column headers after the first one are sample names. The 2nd row is the list of colors to use in later plotting, and the 3rd row is the class for a later MANOVA. The rest is x data in the first column, followed by y1, y2, ... for plotting.

Without the color and class rows, read.csv reads the file just fine and makes a nice data frame with proper data types. The problem comes when parsing the 2nd and 3rd rows. Here's the code:

data = read.csv("filename", header=TRUE)   # read in data
color = data[1,]; color = color[-1]        # capture color info & throw out 1st value
class = data[2,]; class = class[-1]        # capture category info & throw out 1st value
cleaned.data = data[-(1:2),]               # remove color & category rows for matrix operations
freq = data[,1]                            # capture frequency info

What happens is that freq is parsed as a factor, and color and class are parsed as data frames of factors. I need color and class to be character vectors that I can pass to functions in the usual way one passes colors and levels, and I need freq and cleaned.data to be numeric for plotting.

I don't feel I'm far off from things working, but that's where you all come in! It seems like some as.something conversion is needed, but the ones I've tried don't work.
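A minimal sketch of what is going on, assuming the pre-4.0 default stringsAsFactors = TRUE that R 2.6.0 used; freq and color.row below are made-up stand-ins, not read from the real file. Calling as.numeric() directly on a factor returns its internal level codes, so the usual route is factor to character to numeric, and a one-row data frame of factors can be flattened column by column with as.character():

```r
# Stand-in for data[,1] from the read.csv() call above: the first column mixes
# the labels "color" and "class" with wavelengths, so it cannot be numeric and,
# with stringsAsFactors = TRUE (the default before R 4.0.0), it becomes a factor.
freq <- factor(c("color", "class", "403", "409", "415"))

# Pitfall: as.numeric() on a factor gives the integer level codes, not the values.
codes <- as.numeric(freq)

# Fix: factor -> character -> numeric. The two label entries cannot be parsed,
# so they become NA (suppressWarnings hides the "NAs introduced by coercion" note).
freq.num <- suppressWarnings(as.numeric(as.character(freq)))
freq.num <- freq.num[-(1:2)]   # drop the color and class entries

# Stand-in for a one-row data frame of factors such as data[1, -1]:
# convert each column to character, giving a named character vector.
color.row <- data.frame(SampleA = "green", SampleB = "black",
                        stringsAsFactors = TRUE)
color <- sapply(color.row, as.character)

print(freq.num)
print(color)
```

Converting after the fact works, but passing as.is = TRUE to read.table/read.csv avoids creating the factors in the first place.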
Would it help to put color and class above the x,y data in the file, then clean it off?

Btw, I'm on a Mac using R 2.6.0.

Thanks in advance,
Bryan

Bryan Hanson
Professor of Chemistry & Biochemistry
jim holtman
2007-Oct-31 00:40 UTC
Here is one way. You will probably use 'file' instead of textConnection:

> x.in <- textConnection('wavelength SampleA SampleB SampleC SampleD
+ color "green" "black" "black" "green"
+ class "Class 1" "Class 2" "Class 2" "Class 1"
+ 403 1.94E-01 2.14E-01 2.11E-01 1.83E-01
+ 409 1.92E-01 1.89E-01 2.00E-01 1.82E-01
+ 415 1.70E-01 1.99E-01 1.94E-01 1.86E-01
+ 420 1.59E-01 1.91E-01 2.16E-01 1.74E-01
+ 426 1.50E-01 1.66E-01 1.72E-01 1.58E-01
+ 432 1.42E-01 1.50E-01 1.62E-01 1.48E-01')
>
> c.names <- scan(x.in, what='', nlines=1)   # read column names
Read 5 items
> c.options <- read.table(x.in, as.is=TRUE, nrows=2)   # get lines 2-3
> c.data <- read.table(x.in)   # rest of the data
> colnames(c.data) <- c.names
> close(x.in)
> c.options   # here are lines 2-3
     V1      V2      V3      V4      V5
1 color   green   black   black   green
2 class Class 1 Class 2 Class 2 Class 1
> c.data   # your data
  wavelength SampleA SampleB SampleC SampleD
1        403   0.194   0.214   0.211   0.183
2        409   0.192   0.189   0.200   0.182
3        415   0.170   0.199   0.194   0.186
4        420   0.159   0.191   0.216   0.174
5        426   0.150   0.166   0.172   0.158
6        432   0.142   0.150   0.162   0.148
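For the real comma-separated file, the same three-step read might look like the sketch below. It assumes the file uses commas (hence sep = ","), writes a small temporary file in place of Bryan's "filename", and the names path, con, colors and classes are illustrative only. Because as.is = TRUE keeps lines 2-3 as character, unlist() yields a plain character vector for colors and the input for a grouping factor:

```r
# Write a small comma-separated copy of the example data to a temporary file;
# in practice 'path' would point at the real data file instead.
path <- tempfile(fileext = ".csv")
writeLines(c('wavelength,SampleA,SampleB,SampleC,SampleD',
             'color,"green","black","black","green"',
             'class,"Class 1","Class 2","Class 2","Class 1"',
             '403,1.94E-01,2.14E-01,2.11E-01,1.83E-01',
             '409,1.92E-01,1.89E-01,2.00E-01,1.82E-01',
             '415,1.70E-01,1.99E-01,1.94E-01,1.86E-01'),
           path)

# Same idea as the textConnection() version, but reading from an open file().
con <- file(path, open = "r")
c.names   <- scan(con, what = "", sep = ",", nlines = 1, quiet = TRUE)  # line 1
c.options <- read.table(con, sep = ",", as.is = TRUE, nrows = 2)       # lines 2-3
c.data    <- read.table(con, sep = ",")                                # the rest
close(con)
colnames(c.data) <- c.names

# Drop the "color"/"class" label in column 1 and keep the per-sample values.
colors  <- unlist(c.options[1, -1])          # character vector, e.g. for col =
classes <- factor(unlist(c.options[2, -1]))  # grouping factor, e.g. for manova()
names(colors)  <- c.names[-1]
names(classes) <- c.names[-1]

str(c.data)
print(colors)
print(classes)
```

The colors vector can then be passed as the col argument of matplot(), and c.data stays numeric for plotting.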
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?