manikandan_narayanan at merck.com
2009-Mar-27 02:18 UTC
[Rd] read.table on long lines buggy (PR#13626)
Full_Name: Manikandan Narayanan
Version: 2.8.1
OS: linux-gnu
Submission from: (NULL) (155.91.28.231)

Hi R-folks,

I have two three-line text files, tst1 and tst2. They are the same except that the second line is longer in tst1 (see the cat() commands below). read.table reads only the third line of tst1, but it reads tst2 correctly, as shown below. This happens in both R 2.5.1 (Windows) and R 2.8.1 (linux-gnu), so it seems to be an issue with read.table operating on long lines. It caused me quite some trouble before I tracked it down while reading a bigger file!

Please take care of this one, or suggest safer ways of working with long lines.

Thanks,
Mani

> cat(file="tst1", "a:15S_RRNA, 21S_RRNA, AAC1, AAC3\nb:AAP1, ACN9, ALG1, ALG11,ALG12, ALG13, ALG14, ALG2, ALG3, ALG5, ALG6, ALG7, ALG8, ALG9, AMS1, ANP1, ARA1, ATH1, BCH1, BCH2, BMH1, BMH2, BNI4, BUD7, CAX4, CDC19, CHS3, CHS5, CHS6, CHS7, CIT2, CTS1, CWH41, DDP1, DIE2, DIP5, DLD1, DOG1, DOG2, DPM1, ELM1, ENO1, ENO2, EOS1, ERD1, EXG1, FBA1, FBP1, FBP26, FDH1, FKS1, GAC1, GAL1, GAL10, GAL2, GAL3, GAL4, GAL7, GAL80, GCY1, GDA1, GDB1, GFA1, GIP2, GLC3, GLC7, GLC8, GLG1, GLG2, GLK1, GLO2, GLO4, GNA1, GND1, GND2, GNT1, GPH1, GPM1, GRE3, GSC2, GSY1, GSY2, GTB1, GUT2, HAP4, HKR1, HOC1, HOR2, HPF1, HXK1, HXK2, HXT4, ICL1, IMP2', INM1, INM2, ITR1, KAR2, KEG1, KNH1, KRE2, KRE5\nc:ABC1")
> read.table("tst1", sep=":", stringsAsFactors=F)[,1]
[1] "c"
Warning message:
In read.table("tmp1", sep = ":", stringsAsFactors = F) :
  incomplete final line found by readTableHeader on 'tmp1'
> cat(file="tst2", "a:15S_RRNA, 21S_RRNA, AAC1, AAC3\nb:AAP1, ACN9, ALG1, ALG11,ALG12, ALG13, ALG14, ALG2, ALG3, ALG5, ALG6, ALG7, ALG8, ALG9, AMS1, ANP1, ARA1, ATH1, BCH1, BCH2, BMH1, BMH2, BNI4, BUD7, CAX4, CDC19, CHS3, CHS5, CHS6, CHS7, CIT2, CTS1, CWH41, DDP1, DIE2, DIP5, DLD1, DOG1, DOG2, DPM1, ELM1, ENO1, ENO2, EOS1, ERD1, EXG1, FBA1, FBP1, FBP26, FDH1, FKS1, GAC1, GAL1, GAL10, GAL2, GAL3, GAL4, GAL7, GAL80, GCY1, GDA1, GDB1, GFA1, GIP2, GLC3, GLC7, GLC8, GLG1, GLG2, GLK1, GLO2, GLO4, GNA1, GND1, GND2, GNT1, GPH1\nc:ABC1\n")
> read.table("tst2", sep=":", stringsAsFactors=F)[,1]
[1] "a" "b" "c"
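
On the request for safer ways of working with such lines: besides length, the two files differ in that tst1 contains an apostrophe (in IMP2') and tst2 does not, and read.table treats both " and ' as quote characters by default (quote = "\"'"). The following is only a minimal sketch, assuming the quote character rather than the line length is what matters here; the temporary file and its shortened contents are hypothetical stand-ins for tst1, not the original data.

```r
# Hypothetical shortened stand-in for tst1; it keeps the stray apostrophe after IMP2.
f <- tempfile()
cat(file = f, "a:AAC1, AAC3\nb:ICL1, IMP2', INM1, INM2\nc:ABC1\n")

# quote = "" disables quote handling, so the apostrophe is read as ordinary text.
x <- read.table(f, sep = ":", quote = "", stringsAsFactors = FALSE)
x[, 1]
# [1] "a" "b" "c"

# Alternative that bypasses read.table: read the raw lines and split each one
# at its first colon.
lines <- readLines(f)
keys <- sub(":.*$", "", lines)     # text before the first colon
vals <- sub("^[^:]*:", "", lines)  # text after the first colon
```

If free-text fields may also contain #, passing comment.char = "" to read.table is worth considering for the same reason.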