thr3ads.net - R devel - Are blank fields allowed in SAS data sets? [Jan 2000]

Douglas Bates

2000-Jan-06 23:25 UTC

Are blank fields allowed in SAS data sets?

Saikat DebRoy and I have been working on an R package that, among
other things, will read SAS data libraries in the XPORT format.  Even
though a SAS data set is of a fairly simple structure, it is a
challenge to write code to read the libraries "properly".  In fact, I
am beginning to think that the file format is not well-defined.

Here is the situation:  

  - a SAS data set is a table.  The columns can be numeric (in the
    XPORT format the only numeric format allowed is the IBM mainframe
    double precision format) or character strings.  In a character
    column, all the strings are blank-padded to the same length and
    that length is included in the header information.  There is no
    terminator character for strings.  There is no record terminator
    character.
  - a SAS library file can contain more than one table.
  - the header information for each table includes the number of
    columns and the format of each column but does _not_ include the
    number of rows.  (Why not? Remember that SAS was developed at a
    time when any large amount of data was stored on punched cards or
    magnetic tape.  SAS functions as a data filter, for the most
    part.  You don't know how many rows you have until you get to the
    end and then you can't go back and change the beginning because of
    the way magnetic tape drives work.  Of course, the relevance of
    these considerations to computing resources in the year 2000 is
    questionable.) 
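To make the layout above concrete, here is a minimal sketch in Python (not
the code from our package) of cutting a buffer into rows when the only
information available is the list of column widths from the header.  The
function name split_records and the sample data are made up for
illustration.

```python
from typing import List, Tuple


def split_records(data: bytes, widths: List[int]) -> List[Tuple[bytes, ...]]:
    """Cut a byte string into fixed-width rows; there are no terminators."""
    record_len = sum(widths)
    rows = []
    # Step through the buffer one whole record at a time.
    for start in range(0, len(data) - record_len + 1, record_len):
        row, pos = [], start
        for width in widths:
            row.append(data[pos:pos + width])
            pos += width
        rows.append(tuple(row))
    return rows


# Two character columns of widths 4 and 6, blank-padded, no separators.
raw = b"ab  " + b"xyz   " + b"cd  " + b"q     "
print(split_records(raw, [4, 6]))
```

The reader has to be told the widths and has to decide for itself where
the rows stop, because nothing in the bytes says so.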

So how do you know when you have reached the end of one data set and
started another?  SAS always works in blocks of 80 bytes.  If the last
record in a table does not completely fill an 80 byte block, the
remainder of the block is blank-padded.  The next table will begin
with 80 bytes that must be exactly
"HEADER RECORD*******MEMBER  HEADER RECORD!!!!!!!000000000000000001600000000140  "
except on certain VAX/VMS computers where the 140 at the end is 136.
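A reader can therefore look for the start of the next table by comparing
each 80 byte block against that string.  The following Python sketch does
just that; the names is_member_header and member_offsets are hypothetical,
and the test buffer is synthetic (a real library file has other header
records as well).

```python
BLOCK = 80

# The 80-byte member header quoted above, split in two for readability.
MEMBER_HEADER = (
    b"HEADER RECORD*******MEMBER  HEADER RECORD!!!!!!!"
    b"000000000000000001600000000140  "
)
# Variant written on certain VAX/VMS machines: 136 instead of 140.
MEMBER_HEADER_VAX = MEMBER_HEADER[:-5] + b"136  "


def is_member_header(block: bytes) -> bool:
    """True if this block is exactly one of the two known member headers."""
    return block in (MEMBER_HEADER, MEMBER_HEADER_VAX)


def member_offsets(data: bytes):
    """Offsets of every 80-byte block that is a member header."""
    return [
        off
        for off in range(0, len(data), BLOCK)
        if is_member_header(data[off:off + BLOCK])
    ]


# Synthetic buffer: a header, one block of blanks, then a VAX-style header.
sample = MEMBER_HEADER + b" " * BLOCK + MEMBER_HEADER_VAX
print(member_offsets(sample))  # [0, 160]
```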

I ran into a situation where the data were 484 rows of 8 numeric
columns.  The total length of each record is 64 bytes so the data
proper occupies 30976 bytes, not counting the headers.  This is a
total of 387 complete 80 byte blocks with 16 bytes left over.  That
last block is padded with 64 blanks.
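The arithmetic can be checked with a few lines of Python.  This is only a
sketch; the function name block_layout is invented here.

```python
BLOCK = 80


def block_layout(n_rows: int, record_len: int):
    """Return (data bytes, full blocks, leftover bytes, padding bytes)."""
    total = n_rows * record_len
    full_blocks, leftover = divmod(total, BLOCK)
    padding = (BLOCK - leftover) % BLOCK  # no padding if the data ends on a block
    return total, full_blocks, leftover, padding


total, full_blocks, leftover, padding = block_layout(484, 64)
print(total, full_blocks, leftover, padding)  # 30976 387 16 64
print(padding // 64)  # 1: the padding is exactly as long as one more record
```

The awkward part is the last line: the padding has the length of a whole
record.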

So how do we know that these 64 blanks are not another data record?
In the case of numeric data, the particular number corresponding to 8
blanks (using the IBM mainframe floating point format, not the IEEE
format) is

> Pheno[745,]
           INDIV         TIME         DOSE       WEIGHT         CONC
745 3.687825e-40 3.687825e-40 3.687825e-40 3.687825e-40 3.687825e-40
          NEWSUB     APGARLOW         TLAG
745 3.687825e-40 3.687825e-40 3.687825e-40

We can probably employ some heuristics and say that this is not a
common value so we guess that the 64 blanks are padding and not
another data record.  
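For anyone who wants to check where that value comes from, here is a small
Python sketch of the conversion.  It assumes the usual description of the
IBM double precision format (1 sign bit, a 7 bit exponent of 16 with a bias
of 64, and a 56 bit fraction); ibm_to_float and looks_like_padding are
names invented for this example, not functions from our package.

```python
def ibm_to_float(raw: bytes) -> float:
    """Convert 8 bytes of IBM mainframe double precision to a Python float."""
    if len(raw) != 8:
        raise ValueError("expected exactly 8 bytes")
    sign = -1.0 if raw[0] & 0x80 else 1.0
    exponent = (raw[0] & 0x7F) - 64                # power of 16, bias 64
    fraction = int.from_bytes(raw[1:], "big") / float(1 << 56)
    return sign * fraction * 16.0 ** exponent


def looks_like_padding(record: bytes) -> bool:
    """Heuristic only: treat a record made entirely of blanks as padding."""
    return record == b" " * len(record)


blank = ibm_to_float(b" " * 8)   # eight ASCII blanks, 0x20 each
print(f"{blank:.6e}")            # 3.687825e-40
print(looks_like_padding(b" " * 64))
```

A blank byte is 0x20, so the exponent byte gives 16^(32 - 64) and the
fraction is a little over 1/8, which is how the tiny value arises.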

But what if they had been 8 character fields, each of width 8 bytes?
Is it possible to distinguish between a blank record and blank
padding?  I can't see that it is possible.  Is there some restriction
in SAS that says you can't have a record in which all the fields are
blank?
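The ambiguity is easy to reproduce.  The Python sketch below uses a toy
writer, serialize, which is my own simplification (data records only, no
headers), and shows two different tables producing identical bytes.

```python
BLOCK = 80


def serialize(rows, widths):
    """Blank-pad each field to its width, then pad the data to 80-byte blocks."""
    body = b"".join(
        value.encode("ascii").ljust(width)   # bytes.ljust pads with blanks
        for row in rows
        for value, width in zip(row, widths)
    )
    return body + b" " * (-len(body) % BLOCK)


widths = [8] * 8                           # 8 character fields, 64-byte records
four_rows = [("a",) * 8] * 4               # 256 bytes, 16 left over in block 4
five_rows = four_rows + [("",) * 8]        # same, plus one all-blank record

print(len(serialize(four_rows, widths)), len(serialize(five_rows, widths)))
print(serialize(four_rows, widths) == serialize(five_rows, widths))  # True
```

Since the two byte streams are equal, no reader can tell which table was
written, whatever heuristics it uses.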

One could pass this off as of little interest to the R community
except that this format is now a "standard" that has been adopted by
the United States Food and Drug Administration (FDA).  See
       sas.com/software/industry/pht/fda/index.html

-- 
Douglas Bates                            bates@stat.wisc.edu
Statistics Department                    608/262-2598
University of Wisconsin - Madison        stat.wisc.edu/~bates
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To:
r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Douglas Bates

2000-Jan-06 23:51 UTC

Are blank fields allowed in SAS data sets?

Sorry to respond to my own post but I did a little more searching in
the Q&A document at SAS Institute regarding the FDA's adoption of the
SAS XPORT format and found they do address the issue I mentioned
and they have a solution.  The solution is "Don't do that!".  :-)

Here is the direct quote from
	sas.com/software/industry/pht/fda/faq.html

   Q.  Is it difficult to detect the end of file when reading a data set
       stored in XPORT transport format?

   A.  Under certain circumstances it is difficult to detect the end of file
       when reading a data set in XPORT transport format. Below is an
       explanation and a suggested fix that will alleviate the problem. 

       The problem occurs when the record length of a file is less than 80
       characters. The transport format writes 80 byte records. The last
       observation may not fill up an 80 byte record. When this happens the
       SAS System will pad the 80 byte record with blanks. When reading a
       transport file, the SAS System treats trailing blanks as insignificant.
       If the records are shorter than 80 bytes and if the last record written
       contains nothing but blank characters then the file comes up one
       record short. 

       The key to avoiding this potential problem is to undertake preventative
       measures described below. 

       One preventative measure could be to make sure that the
       circumstances do not exist. Avoid creating a data set where: 

         1. The observation length (the sum of the variable sizes) is less than
            80 and 
         2. The variables are all character and 
         3. The last record has all blank values. 

       Another preventative measure could be to save original data in
       transport format, extract new data from the transport file, then run
       PROC COMPARE to compare the original data to the new data.
       Output from PROC COMPARE can validate that the new data
       extracted from the transport file is identical to the original
       data. 
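The three conditions in the first preventative measure can be checked
before a data set is written.  The Python sketch below is a direct
transcription of those conditions as quoted, not anything supplied by SAS;
the function name at_risk and its arguments are made up for illustration.

```python
def at_risk(widths, is_character, last_row) -> bool:
    """True if all three conditions from the FAQ hold for this data set.

    widths        variable sizes in bytes
    is_character  one boolean per variable
    last_row      values of the last observation, as strings
    """
    short_record = sum(widths) < 80                        # condition 1
    all_character = all(is_character)                      # condition 2
    last_blank = all(v.strip(" ") == "" for v in last_row)  # condition 3
    return short_record and all_character and last_blank


# 8 character variables of width 8 whose last observation is entirely blank.
print(at_risk([8] * 8, [True] * 8, [""] * 8))            # True
# Same layout, but the last observation has one non-blank value.
print(at_risk([8] * 8, [True] * 8, ["x"] + [""] * 7))    # False
```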