Li, Xiaochun
2013-Nov-19 21:40 UTC
[R] Does function read.sas7bdat() have some memory limitations?
Dear R-ers,

I was trying to read in a large sas7bdat file (148094976 bytes) using 'read.sas7bdat()', but it did not read the data correctly. For example, the first 5 rows come out like this (I'm omitting the other columns to keep it readable):

      PERSON_ID           age
1 5.399114e-315 5.329436e-315
2 5.399114e-315 5.328302e-315
3 5.399114e-315 5.332026e-315
4 5.399114e-315 5.329112e-315
5 5.399114e-315 5.331055e-315

If I reduce the original SAS dataset to its first 5 rows, 'read.sas7bdat' reads them correctly:

  PERSON_ID age
1    612569  55
2    612571  48
3    612580  78
4    612606  53
5    612617  66

So for now I first save the SAS dataset as .csv and then read it with 'read.csv', and everything is fine.

Any suggestions as to why 'read.sas7bdat' didn't work, and whether a fix to its code could make it work?

Thank you.
_____________________________
Xiaochun Li, Ph.D.
Department of Biostatistics
Indiana University
School of Medicine and
Richard M. Fairbanks School of Public Health
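The tiny values in the first table are not random noise. Each one is exactly what you get by taking the 8-byte little-endian IEEE double for the correct value, moving its four most significant bytes into the four least significant positions, and zeroing the rest: 612569 becomes 5.399114e-315, 55 becomes 5.329436e-315, and the other ages match the same way. The sketch below checks this in Python using only the standard struct module; the helper name misplaced_high_word is hypothetical. It shows where the bytes ended up, not where in read.sas7bdat they went astray. In R, writeBin() and readBin() can be used for the same check.

```python
import struct

def misplaced_high_word(x: float) -> float:
    """Hypothetical helper: move the 4 most significant bytes of x's
    little-endian IEEE double into the 4 least significant positions,
    zero the upper half, and reinterpret the 8 bytes as a double."""
    raw = struct.pack("<d", x)      # raw[0:4] = low word, raw[4:8] = high word
    moved = raw[4:8] + bytes(4)     # high word now sits in the low half
    return struct.unpack("<d", moved)[0]

# Correct values (from the 5-row read) mapped to the garbage seen in the full read.
observed = {
    612569.0: "5.399114e-315",
    55.0: "5.329436e-315",
    48.0: "5.328302e-315",
    78.0: "5.332026e-315",
    53.0: "5.329112e-315",
    66.0: "5.331055e-315",
}
for correct, garbage in observed.items():
    print(f"{correct:>8g} -> {misplaced_high_word(correct):.6e} (observed {garbage})")
```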
peter dalgaard
2013-Nov-21 14:18 UTC
This certainly looks like a bug, and there are many ways of inducing bugs that only show up with large datasets: buffer overruns, fields that are too small to hold the number of rows, etc. Remember that there is NO official documentation of the .sas7bdat format; everything has been reverse-engineered, and if something in the format is different for very large datasets, it may well have gone unnoticed.

However, read.sas7bdat is from the sas7bdat package, which has a maintainer. It is not unlikely that he would be interested in tracking down the root cause if you show him how to generate SAS datasets that reproduce the issue.

Best,
Peter D.

--
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
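One hypothesis that might be worth putting to the maintainer, and only a hypothesis, since nothing in this thread shows which variable lengths the SAS dataset used: SAS can store a numeric variable in fewer than 8 bytes (via a LENGTH statement), keeping only the most significant bytes of the double. In a little-endian file, a reader then has to place those bytes at the high-order end of the 8-byte value and zero-fill the low-order end; padding on the wrong side turns 612569 into 5.399114e-315, the PERSON_ID value in the first table of the original post. The sketch below (Python, struct module only; pad_truncated_numeric is a hypothetical name, not part of the sas7bdat package) contrasts the two paddings for a simulated 4-byte field.

```python
import struct

def pad_truncated_numeric(stored: bytes, correct: bool = True) -> float:
    """Hypothetical helper: rebuild an 8-byte little-endian double from a
    numeric field that kept only its most significant bytes."""
    missing = bytes(8 - len(stored))
    # Little-endian: the kept high-order bytes belong at the END of the
    # 8-byte buffer, so the dropped low-order bytes come back as zeros at
    # the front. correct=False pads on the other side (the suspected mistake).
    buf = missing + stored if correct else stored + missing
    return struct.unpack("<d", buf)[0]

# Simulate a 4-byte field: keep only the 4 most significant bytes of 612569.
stored = struct.pack("<d", 612569.0)[4:]
print(pad_truncated_numeric(stored, correct=True))
print(f"{pad_truncated_numeric(stored, correct=False):.6e}")
```

Checking the variable lengths in the original dataset (for example with PROC CONTENTS in SAS) would confirm or rule this out quickly.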