An unmatched quote can make read.table run very slowly
when there are lots of lines in the file. E.g.,
> z <- rep("A B C", 10^6)
> z[2] <- "A \"B C" # unmatched quote on line 2
> tf <- tempfile()
> cat(file=tf, sep="\n", z)
> system.time(z2 <- read.table(tf, skip=2)) # skip bad line
   user  system elapsed
  0.860   0.028   0.887
> str(z2)
'data.frame': 999998 obs. of 3 variables:
$ V1: Factor w/ 1 level "A": 1 1 1 1 1 1 1 1 1 1 ...
$ V2: Factor w/ 1 level "B": 1 1 1 1 1 1 1 1 1 1 ...
$ V3: Factor w/ 1 level "C": 1 1 1 1 1 1 1 1 1 1 ...
> system.time(z1 <- read.table(tf, skip=1))
[ no return for several minutes on a 64-bit Linux machine ]
On smaller files it quickly gives the error "line 1 did not have 4
elements", along with a warning "incomplete final line found by
readTableHeader ...".
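
One way to check for this and work around it, as a rough sketch rather than a definitive fix: count the double quotes on each line to find the unbalanced ones, then read the file with quote="" so that quote characters are treated as ordinary text. This assumes no field in the file really needs quoting (for example, none contains embedded whitespace). The names tf2, lines, n_quotes, bad and z3 are made up for this illustration, and the file is kept small so that it runs quickly.

```r
# Build a small file with an unmatched double quote on line 2.
z <- rep("A B C", 1000)
z[2] <- "A \"B C"
tf2 <- tempfile()
cat(file = tf2, sep = "\n", z)

# Flag lines holding an odd number of double quotes.
lines <- readLines(tf2)
n_quotes <- lengths(regmatches(lines, gregexpr("\"", lines, fixed = TRUE)))
bad <- which(n_quotes %% 2 == 1)

# Assumption: no field needs quoting, so quote handling can be turned off.
z3 <- read.table(tf2, quote = "")

unlink(tf2)
```

Here bad holds the line numbers to inspect, and z3 keeps the stray quote as part of the field value instead of treating it as the start of a quoted string.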
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of Rich Shepard
> Sent: Wednesday, August 01, 2012 10:52 AM
> To: r-help at r-project.org
> Subject: [R] read.table() Issue
>
> Yesterday I changed the headers for a couple of columns in data text files
> and removed hyphens from within character strings, too. When I tried to
> re-read these data sources using read.table() I encountered an issue I've
> not before seen. Both files were read almost instantly until yesterday's
> wording changes.
>
> Now both files seem to cause R to hang. Rather than having the prompt
> immediately returned nothing happens. In emacs the 'working' symbol appears
> but the read.table() function does not complete.
>
> What might cause this?
>
> Rich