thr3ads.net - R help - [R] How to read malformed csv files with read.table? [Aug 2008]

If this information is useful, please help other people find it:
Share via:

Martin Ballaschk

2008-Aug-22 14:19 UTC

[R] How to read malformed csv files with read.table?

Hi,

how do I read files that have two header fields less than they have  
columns? The easiest solution would be to insert one or two additional  
header fields, but I have a lot of files and that would be quite a lot  
of awful work.

Any ideas on how to solve that problem?

#######
R stuff:

 > read.table("myfile.CSV", sep = "\t", header = T)
   Error in read.table("myfile.CSV", sep = "\t",  :
   more columns than column names

 > count.fields("myfile.CSV", sep = "\t")
    [1] 10 12 12 12 12 12 12 12 12 12 12 [...]

#######
ugly sample ("Exported by SDL DataTable component"):

	time/ms	C550.KMS	Cyt_b559.KMS	Cyt_b563.KMS	Cyt_f_.KMS	P515FR.KMS	 
Scatt.KMS	Zea2.KMS	PC	P700
0	Point1	    -599.500	       0.000	       0.000	       0.000	        
0.000	       0.000	       0.000	       0.000	       0.000	       0.000
0	Point2	    -598.000	      -0.012	      -0.013	       0.040	        
0.013	       0.027	       0.010	       0.022	       0.000	       0.000
0	Point3	    -596.500	      -0.015	      -0.015	       0.044	        
0.020	       0.025	       0.010	       0.033	       0.000	       0.000
[...]


Cheers,
Martin

jim holtman

2008-Aug-22 15:02 UTC

head link

[R] How to read malformed csv files with read.table?

Try this.  It will read the file and see if there is a difference and
add in the extra headers:

x <- "       time/ms C550.KMS        Cyt_b559.KMS    Cyt_b563.KMS
Cyt_f_.KMS      P515FR.KMS      Scatt.KMS       Zea2.KMS        PC
 P700
0       Point1      -599.500           0.000           0.000
0.000           0.000          0.000           0.000           0.000
        0.000           0.000
0       Point2      -598.000          -0.012          -0.013
0.040           0.013          0.027           0.010           0.022
        0.000           0.000
0       Point3      -596.500          -0.015          -0.015
0.044           0.020          0.025           0.010           0.033
        0.000           0.000"
# find out how many dummy headers you have to add
x.c <- count.fields(textConnection(x))
x.diff <- x.c[2] - x.c[1]  # assume first line is short
x.connection <- textConnection(x)  # setup connection
if (x.diff > 0){
    # read first line
    x.first <- readLines(x.connection, n=1)
    # add dummy headers
    x.first <- paste(x.first, paste(LETTERS[1:x.diff], collapse="
"))
    pushBack(x.first, x.connection)   # push back the line so it is
ready for read.table
}

input <- read.table(x.connection, header=TRUE)
closeAllConnections()



On Fri, Aug 22, 2008 at 10:19 AM, Martin Ballaschk
<tmp082008 at ballaschk.com> wrote:> Hi,
>
> how do I read files that have two header fields less than they have
columns?
> The easiest solution would be to insert one or two additional header
fields,
> but I have a lot of files and that would be quite a lot of awful work.
>
> Any ideas on how to solve that problem?
>
> #######
> R stuff:
>
>> read.table("myfile.CSV", sep = "\t", header = T)
>  Error in read.table("myfile.CSV", sep = "\t",  :
>  more columns than column names
>
>> count.fields("myfile.CSV", sep = "\t")
>   [1] 10 12 12 12 12 12 12 12 12 12 12 [...]
>
> #######
> ugly sample ("Exported by SDL DataTable component"):
>
>        time/ms C550.KMS        Cyt_b559.KMS    Cyt_b563.KMS    Cyt_f_.KMS
>    P515FR.KMS      Scatt.KMS       Zea2.KMS        PC      P700
> 0       Point1      -599.500           0.000           0.000          
0.000
>           0.000          0.000           0.000           0.000
> 0.000           0.000
> 0       Point2      -598.000          -0.012          -0.013          
0.040
>           0.013          0.027           0.010           0.022
> 0.000           0.000
> 0       Point3      -596.500          -0.015          -0.015          
0.044
>           0.020          0.025           0.010           0.033
> 0.000           0.000
> [...]
>
>
> Cheers,
> Martin
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

Daniel Folkinshteyn

2008-Aug-22 15:04 UTC

head link

[R] How to read malformed csv files with read.table?

on 08/22/2008 10:19 AM Martin Ballaschk said the
following:> how do I read files that have two header fields less than they have 
> columns? The easiest solution would be to insert one or two additional 
> header fields, but I have a lot of files and that would be quite a lot 
> of awful work.
> 
> Any ideas on how to solve that problem?
> 
you could use read.table with "header = F", that way it will read the 
table without worrying about column names (they will end up in the first 
row of the data).

Then, you can just delete the first row, or assign it to names(), or 
whatever.

if all the columns in all your files have the same names, you can read 
them all with header=F and col.names=vectorofcolumnnames, and then 
delete first row (which will contain the incomplete col names from the 
file).

hope this helps :)

Prof Brian Ripley

2008-Aug-22 15:14 UTC

head link

[R] How to read malformed csv files with read.table?

On Fri, 22 Aug 2008, Daniel Folkinshteyn wrote:
> on 08/22/2008 10:19 AM Martin Ballaschk said the following:
>> how do I read files that have two header fields less than they have 
>> columns? The easiest solution would be to insert one or two additional 
>> header fields, but I have a lot of files and that would be quite a lot
of
>> awful work.
>> 
>> Any ideas on how to solve that problem?
>> 
>
> you could use read.table with "header = F", that way it will read
the table
> without worrying about column names (they will end up in the first row of
the
> data).
Or, better, use header=FALSE, skip=1 and the col.names arg of 
read.table().
>
> Then, you can just delete the first row, or assign it to names(), or 
> whatever.
>
> if all the columns in all your files have the same names, you can read them
> all with header=F and col.names=vectorofcolumnnames, and then delete first 
> row (which will contain the incomplete col names from the file).
The trouble with that approach is that your will get all factor columns.
> hope this helps :)
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Martin Ballaschk

2008-Aug-22 17:18 UTC

head link

[R] How to read malformed csv files with read.table?

Hi folks,

thank you for your friendly and immediate help!

Am 22.08.2008 um 17:14 schrieb Prof Brian Ripley:> Or, better, use header=FALSE, skip=1 and the col.names arg of  
> read.table().
My solution is reading the files without the headers (skip = 1) and  
seperately reading the headers with scan (scan("myfile.CSV", what =  
"character", sep = "\t", nlines = 1). After throwing out the
first two
columns it should be possible to assign the scanned colnames to the  
data.frame colnames.

Cheers
Martin

Prof Brian Ripley

2008-Aug-22 17:22 UTC

head link

[R] How to read malformed csv files with read.table?

On Fri, 22 Aug 2008, Martin Ballaschk wrote:
> Hi folks,
>
> thank you for your friendly and immediate help!
>
> Am 22.08.2008 um 17:14 schrieb Prof Brian Ripley:
>> Or, better, use header=FALSE, skip=1 and the col.names arg of
read.table().
>
> My solution is reading the files without the headers (skip = 1) and 
> seperately reading the headers with scan (scan("myfile.CSV", what
=
> "character", sep = "\t", nlines = 1). After throwing
out the first two
> columns it should be possible to assign the scanned colnames to the 
> data.frame colnames.
Yes, but if you read the header first you can set the col.names via the 
arg to read.table().

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Martin Ballaschk

2008-Aug-22 17:59 UTC

head link

[R] How to read malformed csv files with read.table?

Hi Brian,


Am 22.08.2008 um 19:22 schrieb Prof Brian Ripley:> On Fri, 22 Aug 2008, Martin Ballaschk wrote:
>> My solution is reading the files without the headers (skip = 1) and  
>> seperately reading the headers with scan (scan("myfile.CSV",
what =
>> "character", sep = "\t", nlines = 1). After
throwing out the first
>> two columns it should be possible to assign the scanned colnames to  
>> the data.frame colnames.
>
> Yes, but if you read the header first you can set the col.names via  
> the arg to read.table().
Thanks! I plan to do it like that (actually it will be stuffed into a  
loop to read a bunch of files), seems to work:

 > headernames <- scan("test.CSV", what = "character",
sep = "\t",
nlines = 1, skip = 4)
 > my.table <- read.table("test.CSV", header=F, skip = 5,
col.names =
c("crap.1", "crap.2", headernames))

 > head(my.table)
   crap.1 crap.2 time.ms C550.KMS Cyt_b559.KMS [...] etc.
1      0 Point1  -599.5    0.000        0.000
2      0 Point2  -598.0    0.019       -0.014
3      0 Point3  -596.5    0.025       -0.023
4      0 Point4  -595.0    0.034       -0.029
5      0 Point5  -593.5    0.049       -0.033
6      0 Point6  -592.0    0.068       -0.033

Cheers
Martin

R help - Aug 2008 - How to read malformed csv files with read.table?

[R] How to read malformed csv files with read.table?

[R] How to read malformed csv files with read.table?

[R] How to read malformed csv files with read.table?

[R] How to read malformed csv files with read.table?

[R] How to read malformed csv files with read.table?

[R] How to read malformed csv files with read.table?

[R] How to read malformed csv files with read.table?