cberry@tajo.ucsd.edu
2003-Jul-24 00:08 UTC
[R] unz( "x.zip", "y.csv" ) != pipe( "unzip -p x.zip y.csv" )
I am not sure this is a bug in R. Maybe it's a bug in my understanding of unz().

The character 'b2' (hexadecimal) is in position 535 of line 1 of 'naughty.csv'. This character appears as a superscript '2' and came to me in an EXCEL file that I converted to text in comma-separated ( *.csv ) format.

The first line gets truncated by readLines() after 534 characters when using unz():

> nchar( readLines( unz( "bad.zip", "naughty.csv" )))
[1] 534  11   9  22
> nchar(readLines( pipe(" unzip -p bad.zip naughty.csv" ) ))
[1] 809  11   9  22

Attempting to read the same file using scan( unz( ... ) ) concatenates the rest of the file (including comma separators) onto the word that included 'b2', while scan( pipe( "unzip ..." ) ) reads all elements.

> options(width = 50 ) # prevent my mailer from line wrapping
>
> nchar(scan(unz( "bad.zip", "naughty.csv") , what="a", sep=",",nlines=1))
Read 45 items
 [1]   5   9  12   8  11   4   2   1   1   8   8
[12]   8   9   5  10   8   6  12  10   8  16  16
[23]  12  14  12  20  10   8   6  12  10   8  16
[34]  16  12  14  12  20  20  18  20  18  13  13
[45] 329
> nchar( scan( pipe(" unzip -p bad.zip naughty.csv" ) , what="a",sep=",",nlines=1) )
Read 62 items
 [1]  5  9 12  8 11  4  2  1  1  8  8  8  9  5 10
[16]  8  6 12 10  8 16 16 12 14 12 20 10  8  6 12
[31] 10  8 16 16 12 14 12 20 20 18 20 18 13 13 10
[46] 13 14 12 12 10 16 14 12 10 16 14 22 20 22 20
[61] 15 15
>
> version   ## LINUX R-1.7.1 gave similar results
         _
platform sparc-sun-solaris2.8
arch     sparc
os       solaris2.8
system   sparc, solaris2.8
status
major    1
minor    7.0
year     2003
month    04
day      16
language R
>

Chuck

Charles C. Berry                        (858) 534-2098
                                         Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu         UC San Diego
http://hacuna.ucsd.edu/members/ccb.html  La Jolla, San Diego 92093-0717