Hello,

I have a seemingly simple problem that a tab-delimited file can't be read in.

> annoTranscripts <- read.table("matched.txt", sep = '\t', stringsAsFactors = FALSE)
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
  line 5933 did not have 12 elements

However, all lines do have 12 columns.

> lines <- readLines("matched.txt")
> tabsPosns <- gregexpr("\t", lines)
> table(sapply(tabsPosns, length))

    11
367274

> system("wc -l matched.txt")
367274 matched.txt

You can obtain the file from https://dl.dropboxusercontent.com/u/37992150/matched.txt

The line does not contain comment or quote characters. What can you suggest?

> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8
 [5] LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_AU.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods
[7] base

loaded via a namespace (and not attached):
[1] tools_3.0.1

--------------------------------------
Dario Strbenac
PhD Student
University of Sydney
Camperdown NSW 2050
Australia

> > annoTranscripts <- read.table("matched.txt", sep = '\t', stringsAsFactors = FALSE)
> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
>   line 5933 did not have 12 elements
>
> However, all lines do have 12 columns.
>
> > lines <- readLines("matched.txt")
> ...[many omitted lines]...
> The line does not contain comment or quote characters. What can you suggest ?

I suggest looking at the lines preceding the one where the error
was found, with both print and cat:
    print(lines[5933 - (10:0)])
    cat(lines[5933 - (10:0)], sep="\n")
If things are not obvious after looking at them, see if read.table
can read just those lines
    read.table(text=lines[5933 - (10:0)], sep="\t", stringsAsFactors=FALSE)
If it can, try backing up more than 10 lines.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> Of Dario Strbenac
> Sent: Friday, October 04, 2013 5:01 AM
> To: r-help at r-project.org
> Subject: [R] Tab Separated File Reading Error
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

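One way to follow up on the advice to inspect the lines before the reported one is to ask R how many fields its own tokenizer sees per line, since a raw tab count ignores quoting. The sketch below uses count.fields() from utils on a small synthetic file. The data and the names txt, tmp, fields_default and fields_noquote are made up for illustration and are not taken from matched.txt. It assumes the trouble comes from an unbalanced apostrophe earlier in the file, which is a guess rather than something established here.

```r
# Synthetic stand-in for the real file (hypothetical data): 4 lines,
# 3 tab-separated fields each; lines 2 and 4 contain a lone apostrophe.
txt <- c("id1\tfirst gene\t10",
         "id2\t5' UTR region\t20",
         "id3\tthird gene\t30",
         "id4\t3' end\t40")
tmp <- tempfile(fileext = ".txt")
writeLines(txt, tmp)

# Raw tab count, as in the original post: every line has 2 tabs.
tabs_per_line <- sapply(gregexpr("\t", txt, fixed = TRUE), length)

# Field counts with the same default quote set read.table uses ("\"'"),
# so an apostrophe opens a quoted string.
fields_default <- count.fields(tmp, sep = "\t")

# Field counts with quoting disabled: apostrophes are ordinary characters.
fields_noquote <- count.fields(tmp, sep = "\t", quote = "")

print(tabs_per_line)
print(fields_default)
print(fields_noquote)

# Lines that end inside an open quoted string should show up as NA here.
print(which(is.na(fields_default)))

unlink(tmp)
```

On the real file the same two count.fields() calls could be pointed at "matched.txt" with sep = "\t"; if the two results differ, quoting is changing the parse even though every line has 11 tabs.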
Hi,

Try:

annoTranscripts <- read.csv("matched.txt", sep = '\t', stringsAsFactors = FALSE, quote = "", header = FALSE)
str(annoTranscripts)
'data.frame':   367274 obs. of  12 variables:
 $ V1 : chr  "comp103529_c0_seq1" "comp129123_c0_seq1" "comp129123_c0_seq1" "comp129124_c0_seq1" ...
 $ V2 : chr  "XM_003723822" "XM_778057" "EU116908" "XM_786928" ...
 $ V3 : chr  "PREDICTED: Strongylocentrotus purpuratus neuromedin-U receptor 2-like (LOC100888633), mRNA" "PREDICTED: Strongylocentrotus purpuratus 60S ribosomal protein L30-like (LOC577852), mRNA" "Barentsia elongata putative ribosomal protein L30 mRNA, complete cds" "PREDICTED: Strongylocentrotus purpuratus 60S ribosomal protein L29-1-like (LOC587182), mRNA" ...
 $ V4 : int  91 392 69 149 149 451 399 203 193 185 ...
 $ V5 : int  136 479 203 209 209 541 463 451 456 472 ...
 $ V6 : int  15 16 40 20 20 24 20 71 83 85 ...
 $ V7 : int  0 11 4 0 0 5 1 10 4 9 ...
 $ V8 : num  2e-38 0e+00 6e-26 2e-70 2e-70 ...
 $ V9 : int  1 22 210 135 135 131 189 205 196 185 ...
 $ V10: int  136 499 410 343 343 669 650 650 649 653 ...
 $ V11: int  576 159 27 1 1 1 21 23 140 22 ...
 $ V12: int  441 627 227 209 209 538 483 468 593 487 ...
dim(annoTranscripts)
[1] 367274     12

A.K.

----- Original Message -----
From: Dario Strbenac <dstr7320 at uni.sydney.edu.au>
To: "r-help at r-project.org" <r-help at r-project.org>
Cc:
Sent: Friday, October 4, 2013 8:00 AM
Subject: [R] Tab Separated File Reading Error

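A note on why quote = "" can make the difference: read.table() treats both " and ' as quoting characters by default, while read.csv() defaults to " only, and the read.csv call in the reply turns quoting off altogether. A lone apostrophe inside a description field (say in 3' or 5') would open a quoted string that swallows tabs and newlines up to the next apostrophe, so each line can still contain 11 tabs while the parser sees a different number of fields. Whether matched.txt actually contains such apostrophes is an assumption; the thread does not show the offending line. A minimal sketch on made-up data (txt, broken, fixed_table and fixed_csv are hypothetical names):

```r
# Made-up 3-column tab-separated data; rows 2 and 4 hold a lone apostrophe.
txt <- c("t1\tribosomal protein L30 mRNA\t91",
         "t2\t5' flanking region\t392",
         "t3\tneuromedin-U receptor mRNA\t69",
         "t4\t3' UTR\t149")

# Default quoting: keep either the error message or the mis-parsed result.
broken <- tryCatch(
  read.table(text = txt, sep = "\t", stringsAsFactors = FALSE),
  error = function(e) conditionMessage(e)
)

# Quoting disabled in read.table itself. comment.char = "" also stops '#'
# from starting a comment, which matches read.csv's default.
fixed_table <- read.table(text = txt, sep = "\t", quote = "",
                          comment.char = "", stringsAsFactors = FALSE)

# The read.csv form suggested in the reply.
fixed_csv <- read.csv(text = txt, sep = "\t", quote = "",
                      header = FALSE, stringsAsFactors = FALSE)

str(fixed_table)
print(isTRUE(all.equal(fixed_table, fixed_csv)))
print(identical(broken, fixed_table))
```

With quoting off, the apostrophes stay in the data as ordinary characters. If some fields in a file really are quoted on purpose, disabling quoting would leave the quote marks in the values, so this is a trade-off rather than a universal setting.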