Hi,

I have a CSV file that is formatted well, except that the last line is a "summary" not in CSV format.

Toy example:

label_1, label_2, label_3
1,2,3
3,2,4
2,3,4
Total Rows: 3

When I try to import this into R with:

d <- read.table("foo.csv", header=T, sep=",")

it fails to import properly because of the last line.

Currently, I have a shell script that strips the last line from the file, and then it imports into R cleanly. I don't like this extra layer of processing.

Is there a way to import something like this cleanly in R?

Thanks!

--
Noah
Hi,

On Sun, Feb 12, 2012 at 7:05 PM, Noah Silverman <noahsilverman at ucla.edu> wrote:
> Is there a way to import something like this cleanly in R?

This is arguably the file's problem, so I'm not sure how many "clean" solutions you will find, but one thing you can do is count the number of lines in the file, then set the `nrows` argument in your call to read.table to be 1 less than that.

How to count the lines, though? Assuming you're on *nix (or have Cygwin), you can do something like:

N <- system("wc -l /path/to/file.csv", intern = TRUE)

(you'll have to do some parsing on N, since `wc` prints the file name after the count; without `intern = TRUE`, system() returns only the exit status).

You could also first call `readLines` and find the length of the result, but this would require you to read the file twice, so ... pick your poison.

Too bad the person authoring the file doesn't prefix those lines with some comment character ...

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
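If the concern is reading the file twice, one option is to read it only once with readLines(), drop the footer in R, and hand the remaining lines to read.csv() through its `text` argument. This is a minimal sketch, assuming the footer is exactly one line; the temporary file stands in for the "foo.csv" from the question.

```r
# Recreate the toy file from the question in a temporary location.
path <- tempfile(fileext = ".csv")
writeLines(c("label_1, label_2, label_3",
             "1,2,3",
             "3,2,4",
             "2,3,4",
             "Total Rows: 3"), path)

all_lines <- readLines(path)        # single pass over the file
data_lines <- head(all_lines, -1)   # drop the one-line footer (assumed length 1)

# read.csv() parses a character vector of lines passed via `text`.
d <- read.csv(text = data_lines)
print(d)
```

The cost is holding the whole file in memory as text once, which is usually fine for files that read.csv() would load anyway.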
This works for me:

Lines <- "label_1, label_2, label_3
1,2,3
3,2,4
2,3,4
Total Rows: 3"
d <- head(read.csv(textConnection(Lines)), -1)
closeAllConnections()

On Sun, Feb 12, 2012 at 10:05 PM, Noah Silverman <noahsilverman@ucla.edu> wrote:
> Is there a way to import something like this cleanly in R?

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
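One side effect of the head(..., -1) approach: read.csv() (with its default fill = TRUE) reads the footer as a fourth row, and the text "Total Rows: 3" lands in the first column, so that column comes back as character (or factor in older R) even after the row is dropped. A small sketch that re-guesses the column types afterwards with type.convert(). It uses read.csv(text = ...) instead of textConnection(); as far as I know, read.table() closes the connection it creates from `text` itself, so closeAllConnections() should not be needed.

```r
Lines <- "label_1, label_2, label_3
1,2,3
3,2,4
2,3,4
Total Rows: 3"

# Read everything (the footer becomes a padded row), then drop the last row.
d <- head(read.csv(text = Lines), -1)
str(d)   # first column is not numeric because of the footer text

# Re-guess each column's type now that the footer row is gone.
# as.character() guards against factor columns in older R versions.
d[] <- lapply(d, function(col) type.convert(as.character(col), as.is = TRUE))
str(d)
```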
On 13/02/12 13:05, Noah Silverman wrote:
> Is there a way to import something like this cleanly in R?

How clean is clean?

You need to count the number of lines in the file, and then set the nrows argument of read.csv() to be two less. (*Two* rather than one, because of the header.)

Counting the lines: there are three possibilities that I can see:

(1) nlines() from the "parser" package
(2) countLines() from the "R.utils" package
(3) brute force:

x <- readLines(<filename>)
n <- length(x)

Having determined n, do:

y <- read.csv(<filename>, nrows=n-2)

cheers,

Rolf Turner
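The brute-force variant (3) can be wrapped in a small function so the arithmetic lives in one place. This is only a sketch: `read_csv_without_footer()` and its `footer_lines` argument are hypothetical names for illustration, not part of any package. Because `nrows` counts data rows only, the header and the footer are both subtracted from the total line count.

```r
# Hypothetical helper: count lines, then let read.csv() stop before the footer.
read_csv_without_footer <- function(path, footer_lines = 1L, ...) {
  n <- length(readLines(path))   # total lines, header included
  # nrows excludes the header, so subtract it as well as the footer.
  read.csv(path, nrows = n - 1L - footer_lines, ...)
}

# Recreate the toy file from the question in a temporary location.
path <- tempfile(fileext = ".csv")
writeLines(c("label_1, label_2, label_3",
             "1,2,3", "3,2,4", "2,3,4",
             "Total Rows: 3"), path)

y <- read_csv_without_footer(path)
print(y)
```

Since read.csv() never reaches the footer, all three columns keep their numeric (integer) type without any cleanup.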
I believe this should work:

d <- read.table("foo.csv", header=T, sep=",", comment.char="T")

although it's spitting back a warning... this used to work for me.

Noah Silverman wrote
> Is there a way to import something like this cleanly in R?
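The comment-character trick works because read.table() treats everything from `comment.char` to the end of a line as a comment: the footer "Total Rows: 3" starts with "T", so it becomes an empty line, which is skipped by default (blank.lines.skip = TRUE). The catch is that any "T" elsewhere in the file (a header, a value like "TRUE", a name like "Tokyo") would also cut that line short, so this only suits files whose real content never contains that character. A sketch on the toy data:

```r
# Recreate the toy file from the question in a temporary location.
path <- tempfile(fileext = ".csv")
writeLines(c("label_1, label_2, label_3",
             "1,2,3", "3,2,4", "2,3,4",
             "Total Rows: 3"), path)

# Everything from a "T" to the end of a line is ignored, so the footer
# becomes a blank line and is dropped. Any "T" in the data would also
# truncate that line, so check the file before relying on this.
d <- read.table(path, header = TRUE, sep = ",", comment.char = "T")
print(d)
```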