Martin Ballaschk
2008-Aug-22 14:19 UTC
[R] How to read malformed csv files with read.table?
Hi, how do I read files that have two header fields less than they have columns? The easiest solution would be to insert one or two additional header fields, but I have a lot of files and that would be quite a lot of awful work. Any ideas on how to solve that problem? ####### R stuff: > read.table("myfile.CSV", sep = "\t", header = T) Error in read.table("myfile.CSV", sep = "\t", : more columns than column names > count.fields("myfile.CSV", sep = "\t") [1] 10 12 12 12 12 12 12 12 12 12 12 [...] ####### ugly sample ("Exported by SDL DataTable component"): time/ms C550.KMS Cyt_b559.KMS Cyt_b563.KMS Cyt_f_.KMS P515FR.KMS Scatt.KMS Zea2.KMS PC P700 0 Point1 -599.500 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0 Point2 -598.000 -0.012 -0.013 0.040 0.013 0.027 0.010 0.022 0.000 0.000 0 Point3 -596.500 -0.015 -0.015 0.044 0.020 0.025 0.010 0.033 0.000 0.000 [...] Cheers, Martin
Try this. It will read the file and see if there is a difference and add in the extra headers: x <- " time/ms C550.KMS Cyt_b559.KMS Cyt_b563.KMS Cyt_f_.KMS P515FR.KMS Scatt.KMS Zea2.KMS PC P700 0 Point1 -599.500 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0 Point2 -598.000 -0.012 -0.013 0.040 0.013 0.027 0.010 0.022 0.000 0.000 0 Point3 -596.500 -0.015 -0.015 0.044 0.020 0.025 0.010 0.033 0.000 0.000" # find out how many dummy headers you have to add x.c <- count.fields(textConnection(x)) x.diff <- x.c[2] - x.c[1] # assume first line is short x.connection <- textConnection(x) # setup connection if (x.diff > 0){ # read first line x.first <- readLines(x.connection, n=1) # add dummy headers x.first <- paste(x.first, paste(LETTERS[1:x.diff], collapse=" ")) pushBack(x.first, x.connection) # push back the line so it is ready for read.table } input <- read.table(x.connection, header=TRUE) closeAllConnections() On Fri, Aug 22, 2008 at 10:19 AM, Martin Ballaschk <tmp082008 at ballaschk.com> wrote:> Hi, > > how do I read files that have two header fields less than they have columns? > The easiest solution would be to insert one or two additional header fields, > but I have a lot of files and that would be quite a lot of awful work. > > Any ideas on how to solve that problem? > > ####### > R stuff: > >> read.table("myfile.CSV", sep = "\t", header = T) > Error in read.table("myfile.CSV", sep = "\t", : > more columns than column names > >> count.fields("myfile.CSV", sep = "\t") > [1] 10 12 12 12 12 12 12 12 12 12 12 [...] > > ####### > ugly sample ("Exported by SDL DataTable component"): > > time/ms C550.KMS Cyt_b559.KMS Cyt_b563.KMS Cyt_f_.KMS > P515FR.KMS Scatt.KMS Zea2.KMS PC P700 > 0 Point1 -599.500 0.000 0.000 0.000 > 0.000 0.000 0.000 0.000 > 0.000 0.000 > 0 Point2 -598.000 -0.012 -0.013 0.040 > 0.013 0.027 0.010 0.022 > 0.000 0.000 > 0 Point3 -596.500 -0.015 -0.015 0.044 > 0.020 0.025 0.010 0.033 > 0.000 0.000 > [...] > > > Cheers, > Martin > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?
Daniel Folkinshteyn
2008-Aug-22 15:04 UTC
[R] How to read malformed csv files with read.table?
on 08/22/2008 10:19 AM Martin Ballaschk said the following:> how do I read files that have two header fields less than they have > columns? The easiest solution would be to insert one or two additional > header fields, but I have a lot of files and that would be quite a lot > of awful work. > > Any ideas on how to solve that problem? >you could use read.table with "header = F", that way it will read the table without worrying about column names (they will end up in the first row of the data). Then, you can just delete the first row, or assign it to names(), or whatever. if all the columns in all your files have the same names, you can read them all with header=F and col.names=vectorofcolumnnames, and then delete first row (which will contain the incomplete col names from the file). hope this helps :)
Prof Brian Ripley
2008-Aug-22 15:14 UTC
[R] How to read malformed csv files with read.table?
On Fri, 22 Aug 2008, Daniel Folkinshteyn wrote:> on 08/22/2008 10:19 AM Martin Ballaschk said the following: >> how do I read files that have two header fields less than they have >> columns? The easiest solution would be to insert one or two additional >> header fields, but I have a lot of files and that would be quite a lot of >> awful work. >> >> Any ideas on how to solve that problem? >> > > you could use read.table with "header = F", that way it will read the table > without worrying about column names (they will end up in the first row of the > data).Or, better, use header=FALSE, skip=1 and the col.names arg of read.table().> > Then, you can just delete the first row, or assign it to names(), or > whatever. > > if all the columns in all your files have the same names, you can read them > all with header=F and col.names=vectorofcolumnnames, and then delete first > row (which will contain the incomplete col names from the file).The trouble with that approach is that your will get all factor columns.> hope this helps :) > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
Martin Ballaschk
2008-Aug-22 17:18 UTC
[R] How to read malformed csv files with read.table?
Hi folks, thank you for your friendly and immediate help! Am 22.08.2008 um 17:14 schrieb Prof Brian Ripley:> Or, better, use header=FALSE, skip=1 and the col.names arg of > read.table().My solution is reading the files without the headers (skip = 1) and seperately reading the headers with scan (scan("myfile.CSV", what = "character", sep = "\t", nlines = 1). After throwing out the first two columns it should be possible to assign the scanned colnames to the data.frame colnames. Cheers Martin
Prof Brian Ripley
2008-Aug-22 17:22 UTC
[R] How to read malformed csv files with read.table?
On Fri, 22 Aug 2008, Martin Ballaschk wrote:> Hi folks, > > thank you for your friendly and immediate help! > > Am 22.08.2008 um 17:14 schrieb Prof Brian Ripley: >> Or, better, use header=FALSE, skip=1 and the col.names arg of read.table(). > > My solution is reading the files without the headers (skip = 1) and > seperately reading the headers with scan (scan("myfile.CSV", what = > "character", sep = "\t", nlines = 1). After throwing out the first two > columns it should be possible to assign the scanned colnames to the > data.frame colnames.Yes, but if you read the header first you can set the col.names via the arg to read.table(). -- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
Martin Ballaschk
2008-Aug-22 17:59 UTC
[R] How to read malformed csv files with read.table?
Hi Brian, Am 22.08.2008 um 19:22 schrieb Prof Brian Ripley:> On Fri, 22 Aug 2008, Martin Ballaschk wrote: >> My solution is reading the files without the headers (skip = 1) and >> seperately reading the headers with scan (scan("myfile.CSV", what = >> "character", sep = "\t", nlines = 1). After throwing out the first >> two columns it should be possible to assign the scanned colnames to >> the data.frame colnames. > > Yes, but if you read the header first you can set the col.names via > the arg to read.table().Thanks! I plan to do it like that (actually it will be stuffed into a loop to read a bunch of files), seems to work: > headernames <- scan("test.CSV", what = "character", sep = "\t", nlines = 1, skip = 4) > my.table <- read.table("test.CSV", header=F, skip = 5, col.names = c("crap.1", "crap.2", headernames)) > head(my.table) crap.1 crap.2 time.ms C550.KMS Cyt_b559.KMS [...] etc. 1 0 Point1 -599.5 0.000 0.000 2 0 Point2 -598.0 0.019 -0.014 3 0 Point3 -596.5 0.025 -0.023 4 0 Point4 -595.0 0.034 -0.029 5 0 Point5 -593.5 0.049 -0.033 6 0 Point6 -592.0 0.068 -0.033 Cheers Martin