ruskin@iba.com.hk
2004-Nov-23  11:27 UTC
[Rd] Problem with read.xport() from foreigh package (PR#7389)
Full_Name: Ruskin Chow
Version: R 2.0.1
OS: Windows 2000
Submission from: (NULL) (203.169.154.66)

Data imported from SAS using read.xport() in package foreign are converted to <NA> when the SAS data field consists of character strings that are only one character long.

This is apparently a previously reported bug, perhaps fixed on some platform other than Windows (rw2001). Some discussion of the bug can be found at the following website:

https://stat.ethz.ch/pipermail/r-help/2002-April/019349.html

I've downloaded the latest foreign package (foreign_0.8-1.zip) from CRAN, but it doesn't seem to work. Thanks in advance for throwing light on this if I have missed something here.
ripley@stats.ox.ac.uk
2004-Nov-23  12:02 UTC
[Rd] Problem with read.xport() from foreigh package (PR#7389)
On Tue, 23 Nov 2004 ruskin@iba.com.hk wrote:

> Full_Name: Ruskin Chow
> Version: R 2.0.1
> OS: Windows 2000
> Submission from: (NULL) (203.169.154.66)
>
>
> Data imported from SAS using read.xport() in package foreign are converted to
> <NA> when the SAS data field consists of character strings that are only one
> character long.
>
> This is apparently a previously reported bug and perhaps fixed in some platform
> other than Windows (rw2001).Some discussion of the bug can be found from the
> following website:
>
> https://stat.ethz.ch/pipermail/r-help/2002-April/019349.html

There is no Windows-specific version of read.xport(), so that change
happened everywhere: but that message does not say it was fixed nor which
change might have fixed it.  I am guessing the change is this one

2002-03-27  Douglas Bates  <bates@stat.wisc.edu>

        * src/SASxport.c (IS_SASNA_CHAR): Silly typo (0x4l, not 0x41)
        caught by Peter.

and that does not sound like the same thing.

> I've downloaded the latest foreign package (foreign_0.8-1.zip) from CRAN but it
> doesn't seem to work.
  ^^^^^^^^^^^^^^^^^^^^

What does that mean?  It `works' on its test suite: none of the changes
since 0.8 are related to this question.  I don't believe there is an
example in the test suite with one-char strings, and we need an example to
reproduce what you are seeing.

So, please read the posting guide and provide us with a reproducible
example.

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
ripley@stats.ox.ac.uk
2004-Nov-24  16:40 UTC
[Rd] Problem with read.xport() from foreigh package (PR#7389)
The relevant part of the code is

    if(strlen(tmpchar) == 1 && IS_SASNA_CHAR(tmpchar[0]))

#define IS_SASNA_CHAR(c) ((c) == 0x5f || (c) == 0x2e || \
                          (0x41 <= (c) && (c) <= 0x5a))

which says that single-character fields containing ., _, A-Z are to be 
taken as missing values.  That is true of all single-character fields in 
this file.
Looking at the reference, it says
   Missing values are written out with the first byte (the exponent)
   indicating the proper missing values. All subsequent bytes are 0x00. The
   first byte is:
      type      byte
       ._       0x5f
       .        0x2e
       .A       0x41
       .B       0x42
          ....
       .Z       0x5a
which suggests this is intended to apply only to numeric records 
('exponent'), whereas R applies it only to character records.  Elsewhere
I found
   SAS stores 'missing' data depending on the variables data type:
   character or numeric. A 'missing' character value is represented as a
   blank (' ') or null (''). A 'missing' numeric value is represented as a
   single dot (.). SAS allows you to differentiate between values that are
   missing for different reasons. For example, in survey research a
   question may not be answered because the respondent refuses to answer or
   because they don't know the answer. SAS has a range of missing values to
   cover this case. These special missing values are for numerics only and
   are the letters of the alphabet preceded by the single dot .A thru .Z
which seems to confirm this.
Does anyone know for sure?
On Wed, 24 Nov 2004, Ruskin Chow wrote:
> Dear Prof. Ripley,
>
> Thanks for your prompt reply. In fact when I converted one-char data fields
> in SAS to two-char fields before generating the SAS transport format file,
> the problem (that all single char fields become <NA>) disappears. The
> problem is readily reproducible and I attach two sample data files in SAS
> transport format, one with one-char data fields ("char1.xport") and the
> other data file ("char2.xport") with the same data fields except that
> one-char fields are converted to two-char fields (for example, see the
> fields named "SEX", "GB" etc.). The R output text file ("lastsave.txt") is
> also attached so that you can verify it easily at your end.
>
> Even with the above, there is no conclusive evidence whether this problem is
> actually (a) with SAS in generating the transport file, or (b) with R in the
> read.xport() function. However, with the indication in the following
> discussion thread:
>
> http://maths.newcastle.edu.au/~rking/R/help/02a/3258.html
>
> I tend to believe that R may have the answer.
>
> Hope this helps and thanks again for your quick response.
>
> Best regards,
>
> Ruskin
>
Werner Engl
2004-Dec-09  14:33 UTC
[Rd] Problem with read.xport() from foreigh package (PR#7389)
Dear R-devel list,
This is to confirm Prof. Ripley's analysis of the
read.xport issue.
The section on missing data in TS140 is pertinent
to numeric variables only. In SAS, character 
variables are of fixed length (between 1 and 200 
for the xport format). Shorter strings are padded 
with trailing blanks when assigned to a variable.
An uninitialized character variable is stored as 
all blanks in the xport format file. This is the 
only representation of 'missing' data for SAS 
character variables. 'Special missing' codes 
(.A to .Z and ._) are available for numeric 
variables only.
Please find enclosed a patch to the 
R-2.0.1/src/library/Recommended/foreign/SASxport.c
file and an xport file that I used for testing. The
xport file was created by SAS V8.2 on Linux, but 
should be platform and version independent (except
for the header information). I have simply commented
out the code lines that try to detect missing character
values.
The code in SASxport.c already does a good job of 
removing trailing blanks from character values. 
For missing character data (all blanks) the result 
is the empty string (""), which is fine for me. 
There is no equivalent to the R missing character 
representation in SAS (as far as I know). 
The enclosed gzipped tar file contains:
  diff_SASxport_c.txt     diff for SASxport.c
  xptchar1.xpt            test file in xport format
  xptchar.sas             trivial SAS program used to
                          generate xptchar1.xpt
  xptchar_SAS_System_Viewer9_1.csv
                          xptchar1.xpt converted to comma separated
                          file using SAS System Viewer 9.1 (on Win XP)
With the patch applied, read.xport produces the same 
data frame from xptchar1.xpt as read.csv does from 
xptchar_SAS_System_Viewer9_1.csv (tested on i386 Linux 
with R Version 2.0.1) except that read.csv converts empty 
strings to NAs. As explained above, the empty string is
closer to the meaning of an all-blanks value in SAS.
There is renewed interest in this old data format in 
the pharmaceutical industry, because the US Food and 
Drug Administration requests clinical and 
pre-clinical data to be submitted in this format. I 
spent some time analyzing the xport file format to 
be sure of what is actually submitted to FDA with 
these files.
Thank you for considering this patch (and for the 
great R system, of course)!
Best regards,
Werner Engl
_____________________________________
Werner Engl, PhD, CStat            
Senior Manager, Biostatistics                             
Baxter AG, Vienna, Austria
e-mail: werner_engl@baxter.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: PR7389_we20041209.tar.gz
Type: application/x-gzip
Size: 1727 bytes
Desc: not available
Url : https://stat.ethz.ch/pipermail/r-devel/attachments/20041209/0f297125/PR7389_we20041209.tar.gz