Hi,
   Complete newbie to R here. Just getting started reading manuals and
playing with data.

   I've managed to successfully get my *.csv files into R, however I
have to use header=FALSE because the real header starts in line #2.
The file format looks like:

PORTFOLIO EQUITY TABLE

TRADE,MARK-SYS,DATE/TIME,PL/SIZE,PS METHOD,POS SIZE,POS PL,DRAWDOWN,DRAWDOWN(%),EQUITY

1,1,1/8/2004 12:57:00 PM,124.00,As Given,1,124.00,0.00,0,"10,124.00"
2,1,1/14/2004 9:03:00 AM,-86.00,As Given,1,-86.00,86.00,0.849,"10,038.00"
3,1,1/14/2004 11:51:00 AM,-226.00,As Given,1,-226.00,312.00,3.082,"9,812.00"
4,1,1/15/2004 12:57:00 PM,134.00,As Given,1,134.00,178.00,1.758,"9,946.00"

where the words "PORTFOLIO EQUITY TABLE" make up line 1, the rest of
the text is on line 2, and then the lines starting with numbers are
the real data. (Spaces added by me for email clarity only.)

If I remove the first line by hand then I can use header=TRUE and
things work correctly, but it's not practical for me to remove the
first line by hand on all these files every day.

I'd like to understand how I can do the read.csv but skip the first
line. Possibly read the file, delete the first line and then send it
to read.csv, or some other way?

Thanks in advance,
Mark
Mark Knecht wrote:
> I'd like to understand how I can do the read.csv but skip the first
> line. Possibly read the file, delete the first line and then send it
> to read.csv, or some other way?

Does it not work to set skip=1 in read.csv??

(With names like that, you might want check.names=FALSE too)

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907
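For anyone following along, Peter's two suggestions can be tried out with a small sketch like the one below. It assumes the file layout from the original post; the temporary file and the object names (csv_lines, path, equity, equity_checked) are made up for illustration and are not from the thread.

```r
# Sample contents copied from the original post: title line, header line, data.
csv_lines <- c(
  "PORTFOLIO EQUITY TABLE",
  "TRADE,MARK-SYS,DATE/TIME,PL/SIZE,PS METHOD,POS SIZE,POS PL,DRAWDOWN,DRAWDOWN(%),EQUITY",
  "1,1,1/8/2004 12:57:00 PM,124.00,As Given,1,124.00,0.00,0,\"10,124.00\"",
  "2,1,1/14/2004 9:03:00 AM,-86.00,As Given,1,-86.00,86.00,0.849,\"10,038.00\""
)

# Write the sample to a temporary file so the example is self-contained.
path <- tempfile(fileext = ".csv")
writeLines(csv_lines, path)

# skip = 1 drops the title line, so the next line is read as the header.
# check.names = FALSE keeps names such as "MARK-SYS" exactly as written.
equity <- read.csv(path, skip = 1, check.names = FALSE)
names(equity)

# With the default check.names = TRUE, names are made syntactically valid,
# e.g. "MARK-SYS" becomes "MARK.SYS".
equity_checked <- read.csv(path, skip = 1)
names(equity_checked)
```

With check.names=FALSE the unusual names need backticks or [[ ]] to access, for example equity[["MARK-SYS"]], so which setting is more convenient is a matter of taste.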
Hey Mark,

What about something like

read.csv(file="yourfile", skip=1, header=TRUE)

Hope that helps,

Joshua

On Sun, Jun 28, 2009 at 2:05 PM, Mark Knecht <markknecht at gmail.com> wrote:
> I'd like to understand how I can do the read.csv but skip the first
> line. Possibly read the file, delete the first line and then send it
> to read.csv, or some other way?
>
> Thanks in advance,
> Mark

-- 
Joshua Wiley
Junior in Psychology
University of California, Riverside
http://www.joshuawiley.com/
On 28-Jun-09 21:05:59, Mark Knecht wrote:
> I'd like to understand how I can do the read.csv but skip the first
> line. Possibly read the file, delete the first line and then send it
> to read.csv, or some other way?
>
> Thanks in advance,
> Mark

Simply use the option "skip=1", as opposed to the default "skip=0".
This then skips the first line of the file and only starts reading
at line 2. With "header=TRUE" (which is the default for read.csv()
anyway), the first line read in (i.e. line 2 of the file) will be
taken as the header, and the remainder as data.

You should read what is output by

  ?read.csv

One thing that may be tricky for a beginner to get their head round
is that this is *really* the help page for read.table(), and that
read.csv() is in fact a "front end" for read.table() with different
defaults.
In particular, whereas read.table() has default "header=FALSE",
read.csv() has default "header=TRUE". Also, of course, where
read.table() has sep="" (i.e. white space), read.csv() has sep=",".

Other options for read.csv() which are not mentioned specifically
in the "usage" line for read.csv() (i.e. are subsumed in "...")
are the same as options mentioned in the "usage" line for
read.table() and have the same defaults.

So, implicitly, "skip" is an option for read.csv() just as it is
for read.table(), and it has the same default, namely "skip=0".
So you can set it to "skip=1" just as you can for read.table()
and it will work in the same way. This is stated in "?read.csv"
to be:

    skip: integer: the number of lines of the data file to skip before
          beginning to read data.

This is potentially misleading because of the final word "data",
since a beginner might think this referred to the real data part
of the file (i.e. what follows the header), when "header=TRUE"
as in read.csv(). More explicitly, it could be written:

    skip: integer: the number of lines of the data file to skip before
          beginning to read the lines in the file.

Hoping this helps!
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 28-Jun-09                                       Time: 22:37:46
------------------------------ XFMail ------------------------------
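Ted's two points (read.csv() as a front end for read.table(), and "skip" counting lines from the top of the file before the header is read) can be checked with a rough sketch along these lines. The shortened three-column sample and the names csv_lines, path, via_csv, via_table and wrong are invented for illustration; the read.table() arguments are the defaults that read.csv() is documented to pass along.

```r
# A cut-down version of the file from the original post (hypothetical columns).
csv_lines <- c(
  "PORTFOLIO EQUITY TABLE",
  "TRADE,PL/SIZE,EQUITY",
  "1,124.00,\"10,124.00\"",
  "2,-86.00,\"10,038.00\""
)
path <- tempfile(fileext = ".csv")
writeLines(csv_lines, path)

# read.csv() with skip = 1: line 2 of the file becomes the header.
via_csv <- read.csv(path, skip = 1)

# The same call spelled out via read.table(), supplying read.csv()'s defaults.
via_table <- read.table(path, header = TRUE, sep = ",", quote = "\"",
                        dec = ".", fill = TRUE, comment.char = "",
                        skip = 1)

# Both routes should give the same data frame.
identical(via_csv, via_table)

# skip counts physical lines from the top of the file, not data rows:
# skip = 2 would discard the real header and use the first data row as names.
wrong <- read.csv(path, skip = 2)
nrow(wrong)
```

So with header=TRUE the number given to "skip" should be the count of junk lines before the header, not before the first data row.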
On Sun, Jun 28, 2009 at 2:38 PM, Ted Harding <Ted.Harding at manchester.ac.uk> wrote:
<SNIP>
> More explicitly, it could be written:
>
>    skip: integer: the number of lines of the data file to skip before
>          beginning to read the lines in the file.
>
> Hoping this helps!
> Ted.

It does! More questions coming once I do some more reading.

thanks,
Mark