don@delphioutpost.com
2003-May-17 17:03 UTC
[Rd] read.table fails with \246 separator (PR#3035)
Full_Name: Don Allen
Version: 1.6.2
OS: Solaris
Submission from: (NULL) (140.186.148.11)

If you use '\246' to separate fields in a csv-like file, read.table fails if you have more than 5 lines in the file (in the following, the separators in junk.csv are really '\246's, despite the way they printed):

Fails:

> read.table("/tmp/junk.csv",as.is=T,header=T,sep="\246")
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  :
        line 5 did not have 5 elements

junk.csv
----------------
x¦a¦b¦c¦d
1¦7¦13¦19¦25
2¦8¦14¦20¦26
3¦9¦15¦21¦27
4¦10¦16¦22¦28
5¦11¦17¦23¦29
6¦12¦18¦24¦30
----------------

That works if you delete the last two lines:

> read.table("/tmp/junk.csv",as.is=T,header=T,sep="\246")
  x  a  b  c  d
1 1  7 13 19 25
2 2  8 14 20 26
3 3  9 15 21 27
4 4 10 16 22 28

When using tabs or vertical bars as separators, you do not encounter this problem. The suspicion, of course, is that this has something to do with using a separator that has the high-order bit set (Insightful introduced just such a bug in S-PLUS 6.1 that completely breaks their read.table for such separators).
Prof Brian Ripley
2003-May-17 18:12 UTC
It transpires this is not to do with read.table: scan fails on your example, and it is in scan that a character is being compared with an unsigned char after each has been coerced to int. It's of long standing (but 1.6.2 is not current, and please do check on the current version). It will be fixed for 1.7.1.

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595