thr3ads.net - R help - [R] read.csv, header=TRUE but skip the first line [Jun 2009]

Mark Knecht

2009-Jun-28 21:05 UTC

[R] read.csv, header=TRUE but skip the first line

Hi,
   Complete newbie to R here. Just getting started reading manuals and
playing with data.

   I've managed to successfully get my *.csv files into R, however I
have to use header=FALSE because the real header starts in line #2.
The file format looks like:

PORTFOLIO EQUITY TABLE

TRADE,MARK-SYS,DATE/TIME,PL/SIZE,PS METHOD,POS SIZE,POS PL,DRAWDOWN,DRAWDOWN(%),EQUITY

1,1,1/8/2004 12:57:00 PM,124.00,As Given,1,124.00,0.00,0,"10,124.00"
2,1,1/14/2004 9:03:00 AM,-86.00,As Given,1,-86.00,86.00,0.849,"10,038.00"
3,1,1/14/2004 11:51:00 AM,-226.00,As Given,1,-226.00,312.00,3.082,"9,812.00"
4,1,1/15/2004 12:57:00 PM,134.00,As Given,1,134.00,178.00,1.758,"9,946.00"

where the words "PORTFOLIO EQUITY TABLE" make up line 1, the rest of
the text is on line 2, and then the lines starting with numbers are
the real data. (Spaces added by me for email clarity only.)

If I remove the first line by hand then I can use header=TRUE and
things work correctly, but it's not practical for me to remove the
first line by hand on all these files every day.

I'd like to understand how I can do the read.csv but skip the first
line. Possibly read the file, delete the first line and then send it
to read.csv, or some other way?

Thanks in advance,
Mark
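Mark's own idea (read the file, drop the first line, hand the rest to read.csv) can be sketched as follows. This is a minimal sketch, assuming the real file has exactly one title line and no blank lines; the sample rows are copied from the post and written to a temporary file purely for illustration. The `text` argument of read.csv() exists in current R releases but may not be available in the R version that was current when this thread was written.

```r
# Recreate the layout described above: title on line 1, real header on
# line 2, data from line 3 on (sample rows copied from the post).
path <- tempfile(fileext = ".csv")
writeLines(c(
  "PORTFOLIO EQUITY TABLE",
  "TRADE,MARK-SYS,DATE/TIME,PL/SIZE,PS METHOD,POS SIZE,POS PL,DRAWDOWN,DRAWDOWN(%),EQUITY",
  '1,1,1/8/2004 12:57:00 PM,124.00,As Given,1,124.00,0.00,0,"10,124.00"',
  '2,1,1/14/2004 9:03:00 AM,-86.00,As Given,1,-86.00,86.00,0.849,"10,038.00"',
  '3,1,1/14/2004 11:51:00 AM,-226.00,As Given,1,-226.00,312.00,3.082,"9,812.00"',
  '4,1,1/15/2004 12:57:00 PM,134.00,As Given,1,134.00,178.00,1.758,"9,946.00"'
), path)

# Read every line, drop the title line, and parse the rest as CSV.
all_lines <- readLines(path)
trades <- read.csv(text = all_lines[-1], header = TRUE)
str(trades)
```

Passing skip = 1 straight to read.csv() should give the same data frame without building the intermediate character vector.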

Peter Dalgaard

2009-Jun-28 21:18 UTC

Mark Knecht wrote:
> <SNIP>
> I'd like to understand how I can do the read.csv but skip the first
> line. Possibly read the file, delete the first line and then send it
> to read.csv, or some other way?
Does it not work to set skip=1 in read.csv??

(With names like that, you might want to use check.names=FALSE too)
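What check.names changes here: by default read.csv() passes the header through make.names(), which turns characters such as "/", " ", "-", "(", "%" and ")" into dots. A small sketch using the header and first row from Mark's post, written to a temporary file for illustration:

```r
# Title line, real header, one data row (copied from the post).
path <- tempfile(fileext = ".csv")
writeLines(c(
  "PORTFOLIO EQUITY TABLE",
  "TRADE,MARK-SYS,DATE/TIME,PL/SIZE,PS METHOD,POS SIZE,POS PL,DRAWDOWN,DRAWDOWN(%),EQUITY",
  '1,1,1/8/2004 12:57:00 PM,124.00,As Given,1,124.00,0.00,0,"10,124.00"'
), path)

# Default check.names = TRUE: names are made syntactically valid.
mangled <- read.csv(path, skip = 1)
# check.names = FALSE keeps the header text exactly as in the file.
verbatim <- read.csv(path, skip = 1, check.names = FALSE)

print(names(mangled))
print(names(verbatim))
```

With the verbatim names, columns have to be referred to with backticks, e.g. verbatim$`PL/SIZE`.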

-- 
    O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
   c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
  (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907

Joshua Wiley

2009-Jun-28 21:23 UTC

Hey Mark,

What about something like

read.csv(file="yourfile", skip=1, header=TRUE)
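To try Joshua's call without the real data, the sketch below writes the sample rows from Mark's post to a temporary file that stands in for "yourfile" (the temporary file is an assumption for illustration only) and reads it back:

```r
# Stand-in for "yourfile": title on line 1, real header on line 2.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "PORTFOLIO EQUITY TABLE",
  "TRADE,MARK-SYS,DATE/TIME,PL/SIZE,PS METHOD,POS SIZE,POS PL,DRAWDOWN,DRAWDOWN(%),EQUITY",
  '1,1,1/8/2004 12:57:00 PM,124.00,As Given,1,124.00,0.00,0,"10,124.00"',
  '2,1,1/14/2004 9:03:00 AM,-86.00,As Given,1,-86.00,86.00,0.849,"10,038.00"',
  '3,1,1/14/2004 11:51:00 AM,-226.00,As Given,1,-226.00,312.00,3.082,"9,812.00"',
  '4,1,1/15/2004 12:57:00 PM,134.00,As Given,1,134.00,178.00,1.758,"9,946.00"'
), path)

# skip = 1 discards the title line; header = TRUE then uses line 2 as names.
trades <- read.csv(file = path, skip = 1, header = TRUE)
nrow(trades)
names(trades)[1:3]
```

header = TRUE is already read.csv()'s default, so it could be left out; spelling it out just makes the intent obvious.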

Hope that helps,

Joshua

On Sun, Jun 28, 2009 at 2:05 PM, Mark Knecht <markknecht at gmail.com> wrote:
> <SNIP>
> I'd like to understand how I can do the read.csv but skip the first
> line. Possibly read the file, delete the first line and then send it
> to read.csv, or some other way?


-- 
Joshua Wiley
Junior in Psychology
University of California, Riverside
http://www.joshuawiley.com/

(Ted Harding)

2009-Jun-28 21:38 UTC

On 28-Jun-09 21:05:59, Mark Knecht wrote:
> <SNIP>
> I'd like to understand how I can do the read.csv but skip the first
> line. Possibly read the file, delete the first line and then send it
> to read.csv, or some other way?
Simply use the option "skip=1", as opposed to the default "skip=0".
This then skips the first line of the file and only starts reading
at line 2. With "header=TRUE" (which is the default for read.csv()
anyway), the first line read in (i.e. line 2 of the file) will be
taken as the header, and the remainder as data.

You should read what is output by

  ?read.csv

One thing that may be tricky for a beginner to get their head round
is that this is *really* the help page for read.table(), and
that read.csv() is in fact a "front end" for read.table() with
different defaults.

In particular, whereas read.table() has default "header=FALSE",
read.csv() has default "header=TRUE". Also, of course, where
read.table() has sep="" (i.e. white space), read.csv() has
sep=",".
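The differing defaults described above can be checked directly in an R session by looking at the formal arguments of the two functions; a small sketch:

```r
# Default values of the arguments where the two functions differ.
formals(read.table)$header   # FALSE
formals(read.csv)$header     # TRUE
formals(read.table)$sep      # "" (any white space)
formals(read.csv)$sep        # ","
```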

Other options for read.csv() which are not mentioned specifically
in the "usage" line for read.csv() (i.e. are subsumed in "...")
are the same as options mentioned in the "usage" line for read.table()
and have the same defaults.

So, implicitly, "skip" is an option for read.csv() just as it is
for read.table(), and it has the same default, namely "skip=0".
So you can set it to "skip=1" just as you can for read.table()
and it will work in the same way.
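A minimal sketch of the "front end" relationship, using the sample rows from Mark's post in a temporary file: skip is not a named argument of read.csv() itself, it is forwarded through "..." to read.table(), so read.csv() and a read.table() call with read.csv()'s defaults spelled out should return the same data frame.

```r
# "skip" is absent from read.csv()'s own arguments, but "..." is there ...
"skip" %in% names(formals(read.csv))   # FALSE
"..."  %in% names(formals(read.csv))   # TRUE
# ... and read.table() has skip, with default 0.
formals(read.table)$skip               # 0

path <- tempfile(fileext = ".csv")
writeLines(c(
  "PORTFOLIO EQUITY TABLE",
  "TRADE,MARK-SYS,DATE/TIME,PL/SIZE,PS METHOD,POS SIZE,POS PL,DRAWDOWN,DRAWDOWN(%),EQUITY",
  '1,1,1/8/2004 12:57:00 PM,124.00,As Given,1,124.00,0.00,0,"10,124.00"',
  '2,1,1/14/2004 9:03:00 AM,-86.00,As Given,1,-86.00,86.00,0.849,"10,038.00"'
), path)

# read.csv() passes skip on to read.table() together with its own defaults.
via_csv   <- read.csv(path, skip = 1)
via_table <- read.table(path, header = TRUE, sep = ",", quote = "\"",
                        dec = ".", fill = TRUE, comment.char = "", skip = 1)
identical(via_csv, via_table)          # TRUE
```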

This is stated in "?read.csv" to be:
    skip: integer: the number of lines of the data file to skip before
          beginning to read data.

This is potentially misleading because of the final word "data",
since a beginner might think this referred to the real data part
of the file (i.e. what follows the header), when "header=TRUE" as
in read.csv().

More explicitly, it could be written:

    skip: integer: the number of lines of the data file to skip before
          beginning to read the lines in the file.
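That skip counts physical lines of the file, before any header is looked for, is easy to check. A minimal sketch using the sample rows from Mark's post in a temporary file: with skip = 2 the real header is skipped as well, so the first trade is taken as the header and one data row is lost.

```r
path <- tempfile(fileext = ".csv")
writeLines(c(
  "PORTFOLIO EQUITY TABLE",
  "TRADE,MARK-SYS,DATE/TIME,PL/SIZE,PS METHOD,POS SIZE,POS PL,DRAWDOWN,DRAWDOWN(%),EQUITY",
  '1,1,1/8/2004 12:57:00 PM,124.00,As Given,1,124.00,0.00,0,"10,124.00"',
  '2,1,1/14/2004 9:03:00 AM,-86.00,As Given,1,-86.00,86.00,0.849,"10,038.00"',
  '3,1,1/14/2004 11:51:00 AM,-226.00,As Given,1,-226.00,312.00,3.082,"9,812.00"',
  '4,1,1/15/2004 12:57:00 PM,134.00,As Given,1,134.00,178.00,1.758,"9,946.00"'
), path)

one <- read.csv(path, skip = 1)  # line 2 (the real header) supplies the names
two <- read.csv(path, skip = 2)  # line 3 (trade 1) is consumed as the header
nrow(one)                        # 4
nrow(two)                        # 3
names(two)[1]                    # "X1", made from the trade number
```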

Hoping this helps!
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 28-Jun-09                                       Time: 22:37:46
------------------------------ XFMail ------------------------------

Mark Knecht

2009-Jun-28 22:25 UTC

On Sun, Jun 28, 2009 at 2:38 PM, Ted Harding <Ted.Harding at manchester.ac.uk> wrote:
<SNIP>
> More explicitly, it could be written:
>
>     skip: integer: the number of lines of the data file to skip before
>           beginning to read the lines in the file.
>
> Hoping this helps!
> Ted.
It does! More questions coming once I do some more reading.

thanks,
Mark
