thr3ads.net - R help - [R] unz( "x.zip", "y.csv" ) != pipe( "unzip -p x.zip y.csv" ) [Jul 2003]


cberry@tajo.ucsd.edu

2003-Jul-24 00:08 UTC

[R] unz( "x.zip", "y.csv" ) != pipe( "unzip -p x.zip y.csv" )

Not sure this is a bug in R. 

Maybe it's a bug in my understanding of unz().

The character 'b2' (hexadecimal) is in position 535 of line 1
of 'naughty.csv'. This character appears as a superscript '2' and
came to me in an EXCEL file that I converted to text in a
comma-separated ( *.csv ) format.
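
One way to confirm where such a byte sits is to look at the raw bytes of the line rather than at its printed form. The following is only a minimal sketch: `non_ascii_positions` and `demo_line` are hypothetical names, and `demo_line` is a made-up stand-in for line 1 of 'naughty.csv' (534 ASCII bytes followed by 0xb2), not the real data.

```r
# Return the 1-based positions of all non-ASCII bytes (values above 0x7f)
# in a raw vector.
non_ascii_positions <- function(bytes) {
  which(as.integer(bytes) > 127L)
}

# Hypothetical stand-in for line 1 of naughty.csv: 534 ASCII bytes,
# then the byte 0xb2, then a little more ASCII text.
demo_line <- c(rep(charToRaw("a"), 534), as.raw(0xb2), charToRaw(",tail"))

pos <- non_ascii_positions(demo_line)
print(pos)                           # 535
print(as.character(demo_line[pos]))  # "b2"
```

Working on a raw vector avoids any dependence on how the connection or the locale treats bytes above 0x7f.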

The first line gets truncated by readLines after 534 characters using
unz():
> nchar( readLines( unz( "bad.zip", "naughty.csv" )))
[1] 534  11   9  22
> nchar(readLines( pipe(" unzip -p bad.zip naughty.csv" ) ))
[1] 809  11   9  22
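
As a cross-check that does not rely on text-mode reading at all, the line lengths can be counted directly from bytes. This is a sketch under assumptions, not a fix for unz(): `line_byte_lengths` and `demo` are hypothetical names, and the demo bytes are invented for illustration.

```r
# Count the bytes on each line of a raw vector, splitting on the
# newline byte 0x0a. The newline itself is not counted.
line_byte_lengths <- function(bytes) {
  nl <- which(bytes == as.raw(0x0a))
  # If the last line has no trailing newline, pretend one follows it.
  if (length(bytes) > 0 &&
      (length(nl) == 0 || nl[length(nl)] != length(bytes))) {
    nl <- c(nl, length(bytes) + 1L)
  }
  diff(c(0L, nl)) - 1L
}

# Invented two-line example with a 0xb2 byte inside the first line.
demo <- c(charToRaw("ab"), as.raw(0xb2), charToRaw("cd\nxyz\n"))
print(line_byte_lengths(demo))  # 5 3
```

With the archive from this post, the bytes might be obtained with readBin() on an unz() connection opened with open = "rb", assuming the installed R supports that; the result could then be compared with the nchar() output above. If the file uses CRLF line endings, each count would also include the carriage return.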


Attempting to read the same file using scan( unz( ... ) ) concatenates the
rest of the file (including comma separators) onto the word that included
'b2', while scan( pipe( "unzip ..." ) ) reads all elements.
>
> options(width = 50 ) # prevent my mailer from line wrapping
>
> nchar(scan(unz( "bad.zip", "naughty.csv") ,
what="a", sep=",",nlines=1))
Read 45 items
 [1]   5   9  12   8  11   4   2   1   1   8   8
[12]   8   9   5  10   8   6  12  10   8  16  16
[23]  12  14  12  20  10   8   6  12  10   8  16
[34]  16  12  14  12  20  20  18  20  18  13  13
[45] 329
> nchar( scan( pipe(" unzip -p bad.zip naughty.csv" ) ,
what="a",sep=",",nlines=1) )
Read 62 items
 [1]  5  9 12  8 11  4  2  1  1  8  8  8  9  5 10
[16]  8  6 12 10  8 16 16 12 14 12 20 10  8  6 12
[31] 10  8 16 16 12 14 12 20 20 18 20 18 13 13 10
[46] 13 14 12 12 10 16 14 12 10 16 14 22 20 22 20
[61] 15 15
>
> version    ## LINUX R-1.7.1 gave similar results
         _
platform sparc-sun-solaris2.8
arch     sparc
os       solaris2.8
system   sparc, solaris2.8
status
major    1
minor    7.0
year     2003
month    04
day      16
language R
>
Chuck


Charles C. Berry                         (858) 534-2098
                                         Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu          UC San Diego
http://hacuna.ucsd.edu/members/ccb.html  La Jolla, San Diego 92093-0717
