fharrell@virginia.edu
2002-Dec-20 18:48 UTC
[Rd] read.xport and lookup.xport in foreign (PR#2385)
Under

    platform i686-pc-linux-gnu
    arch     i686
    os       linux-gnu
    system   i686, linux-gnu
    status
    major    1
    minor    6.1
    year     2002
    month    11
    day      01
    language R

and using foreign 0.5-8 I am encountering errors when using read.xport.

Here's code for producing SAS transport files for testing:

    libname x SASV5XPT "test.xpt";
    libname y SASV5XPT "test2.xpt";

    PROC FORMAT; VALUE race 1=green 2=blue 3=purple; RUN;
    PROC FORMAT CNTLOUT=format;RUN;
    data test;
    LENGTH race 3 age 4;
    age=30; label age="Age at Beginning of Study";
    race=2;
    d1='3mar2002'd ;
    dt1='3mar2002 9:31:02'dt;
    t1='11:13:45't;
    output;

    age=31;
    race=4;
    d1='3jun2002'd ;
    dt1='3jun2002 9:42:07'dt;
    t1='11:14:13't;
    output;
    format d1 mmddyy10. dt1 datetime. t1 time. race race.;
    run;
    PROC COPY IN=work OUT=x;SELECT test;RUN;
    PROC COPY IN=work OUT=y;SELECT test format;RUN;

SAS output:

    NOTE: Copying WORK.TEST to X.TEST (memtype=DATA).
    NOTE: There were 2 observations read from the data set WORK.TEST.
    NOTE: The data set X.TEST has 2 observations and 5 variables.
    NOTE: PROCEDURE COPY used:
          real time 1.52 seconds
          cpu time 0.04 seconds

    NOTE: Copying WORK.TEST to Y.TEST (memtype=DATA).
    NOTE: There were 2 observations read from the data set WORK.TEST.
    NOTE: The data set Y.TEST has 2 observations and 5 variables.
    NOTE: Copying WORK.FORMAT to Y.FORMAT (memtype=DATA).
    NOTE: There were 3 observations read from the data set WORK.FORMAT.
    NOTE: The data set Y.FORMAT has 3 observations and 21 variables.
    NOTE: PROCEDURE COPY used:

R results:

    > library(foreign)
    > read.xport('test.xpt')
          RACE      AGE    D1        DT1    T1
    1 2.000063 30.00000 15402 1330767062 40425
    2 4.000063 31.00000 15494 1338716527 40453

Note the corruption of RACE (a variable having a SAS length of 3 bytes).

    > read.xport('test2.xpt')
                RACE          AGE           D1          DT1           T1
    1   2.000063e+00 3.000000e+01 1.540200e+04 1.330767e+09 4.042500e+04
    2   4.000063e+00 3.100000e+01 1.549400e+04 1.338717e+09 4.045300e+04
    3   3.687825e-40 3.687825e-40 3.687825e-40 3.687896e-40 5.962240e+20
    ...
    124 3.835229e-93 6.434447e-86           NA 3.687825e-40 3.687825e-40

Note corrupted data when trying to read a SAS transport file containing more than one SAS dataset. According to the documentation, read.xport is supposed to work in this case and is supposed to return a list of data frames.

    > names(lookup.xport('test2.xpt'))
    [1] "TEST"

Note the inclusion of only one of the 2 datasets.

Also I would greatly benefit from having lookup.xport return all of the SAS variable attributes, especially variable label and format name. I could then write a little function for the community that makes read.xport as comprehensive as read.spss in terms of creating factor variables and variable labels, if the user exports the PROC CONTENTS CNTLOUT= dataset.

Thanks.
-- 
Frank E Harrell Jr              Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
U. Virginia School of Medicine  http://hesweb1.med.virginia.edu/biostat
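An aside on the RACE corruption reported above. SAS transport files store numerics in IBM hexadecimal floating point, and a variable with a LENGTH below 8 keeps only the leading bytes, so a reader has to pad the field with zero bytes before decoding it. The Python sketch below is only an illustration and is not the code in foreign; the function name ibm_to_float and the record layout (RACE in 3 bytes immediately followed by AGE in 4) are assumptions. Under those assumptions, letting AGE's bytes run into the RACE field reproduces the 2.000063 and 4.000063 printed above, which suggests, but does not prove, that the reader is not zero-padding short numerics.

```python
def ibm_to_float(field: bytes) -> float:
    """Decode one transport-file numeric of 2..8 bytes (hypothetical helper).

    Layout: 1 sign bit, 7-bit excess-64 exponent (base 16), then a
    hexadecimal fraction of up to 56 bits. Missing-value codes are
    not handled in this sketch.
    """
    if not 2 <= len(field) <= 8:
        raise ValueError("expected a field of 2 to 8 bytes")
    padded = field.ljust(8, b"\x00")        # restore truncated low-order bytes as zeros
    sign = -1.0 if padded[0] & 0x80 else 1.0
    exponent = (padded[0] & 0x7F) - 64      # power of 16
    fraction = int.from_bytes(padded[1:], "big") / 2**56
    return sign * fraction * 16.0 ** exponent


# Assumed bytes for the first observation: RACE=2 (3 bytes), AGE=30 (4 bytes).
race = bytes.fromhex("412000")
age = bytes.fromhex("421E0000")

print(ibm_to_float(race))                       # 2.0 when zero-padded
print(ibm_to_float(age))                        # 30.0
# Decoding RACE with AGE's bytes in place of the zero padding:
print(format(ibm_to_float(race + age), ".7g"))  # 2.000063
```

The same experiment with the second observation (RACE=4 as 41 40 00, AGE=31 as 42 1F 00 00) gives 4.000063.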
Peter Dalgaard BSA
2002-Dec-22 01:09 UTC
[Rd] read.xport and lookup.xport in foreign (PR#2385)
fharrell@virginia.edu writes:

[How to create an xpt file that R cannot read...]

> Also I would greatly benefit from having lookup.xport return all of
> the SAS variable attributes, especially variable label and format
> name. I could then write a little function for the community that
> makes read.xport as comprehensive as read.spss in terms of creating
> factor variables and variable labels, if the user exports the PROC
> CONTENTS CNTLOUT= dataset.

If anyone is interested in trying to figure this stuff out, it would be most welcome (information on the file format can be obtained via the link http://www.wotsit.org/download.asp?f=sas).

To save you the trouble, here's the inverse of Frank's code, i.e., how to read the stuff back into SAS:

    libname x SASV5XPT "test.xpt";
    libname y SASV5XPT "test2.xpt";
    proc format cntlin=y.format;
    proc contents data=x.test;
    proc contents data=y.test;
    proc contents data=y.format;
    proc print data=x.test;
    proc print data=y.test;
    proc print data=y.format;

Notice in particular that nothing works without the proc format line, SAS can't read test.xpt without somehow being told what the RACE format is.

One possibly relevant oddity is that SAS seems to claim that RACE has length 4, not 3, in the contents listing.

(Of course, if you haven't already realized: Once we know how to extract SAS format names and interpret user-supplied formats, people are going to want us to be able to interpret standard formats like DATETIME. as well....)

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)             FAX: (+45) 35327907
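On the point about standard formats: the D1, DT1 and T1 columns in Frank's R output are raw SAS values, counted from 1 January 1960 (days for dates, seconds for datetimes) or from midnight (seconds for times). A minimal Python sketch of the conversion follows. The helper names are made up for illustration, nothing here is part of foreign, and the sketch ignores time zones and missing values.

```python
from datetime import date, datetime, timedelta

SAS_EPOCH = datetime(1960, 1, 1)  # origin of SAS date and datetime values


def sas_date(days: float) -> date:
    """SAS date value (days since 1960-01-01) to a calendar date."""
    return (SAS_EPOCH + timedelta(days=days)).date()


def sas_datetime(seconds: float) -> datetime:
    """SAS datetime value (seconds since 1960-01-01 00:00:00)."""
    return SAS_EPOCH + timedelta(seconds=seconds)


def sas_time(seconds: float) -> timedelta:
    """SAS time value (seconds since midnight) as a duration."""
    return timedelta(seconds=seconds)


# Values taken from the first row of the read.xport output in this thread.
print(sas_date(15402))           # 2002-03-03
print(sas_datetime(1330767062))  # 2002-03-03 09:31:02
print(sas_time(40425))           # 11:13:45
```

These agree with the literals in the SAS test program ('3mar2002'd, '3mar2002 9:31:02'dt, '11:13:45't), so those three columns came through intact; only the interpretation of the format is missing.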