andrewH
2012-Nov-13 04:23 UTC
[R] Getting information encoded in a SAS, SPSS or Stata command file into R.
Dear folks –

I have a large (26 gig) ASCII flat file in fixed-width format with about 10 million observations of roughly 400 variables. (It is 51 years of Current Population Survey micro data from IPUMS, roughly half the fields for each record.) The file was produced by an automatic process in response to a data request of mine.

The file is not accompanied by a human-readable file giving the field names and starting positions for each field. Instead it comes with three command files that describe the file, one each for SAS, SPSS, and Stata. I do not have ready access to any of these programs. I understand that these files also include the equivalent of the levels attribute for the coded data. I might be able to hand-extract the information I need from the command files, but this would involve days of tedious work that I am hoping to avoid.

I have read through the R Data Import/Export manual and the foreign package documentation and I do not see anything that would allow me to extract the necessary information from these command files. Does anyone know of any R package or other non-proprietary tools that would allow me to get this data set from its current form into any of the following formats:

  - SAS, SPSS, or Stata binary files readable by R
  - A MySQL database
  - An ffdf object readable using the ff package

My ultimate goal is to get the data into an ffdf object so that I can manipulate it in R, perhaps by way of a database. In practice I will probably be using no more than 20 variables at a time, probably a bit under a gig. I am working on a machine with three gigs of RAM.

(I have seen some suggestions that data.table also provides a memory-efficient way of providing database-like functions, but I am unsure whether it would let me cope with an object of this size.)

Any help or suggestions anyone could offer would be very much appreciated.

Warmest regards, andrewH
Jan
2012-Nov-13 11:38 UTC
[R] Getting information encoded in a SAS, SPSS or Stata command file into R.
Hi,

If your objective is to get your data into an ffdf, I suggest you look at the SAS/SPSS/Stata code to see where each column starts. Next, try out the LaF <http://cran.r-project.org/web/packages/LaF/index.html> package, which allows you to read in large fixed-width format files. Once you have this up and running, you can use the laf_to_ffdf function from the ffbase <http://cran.r-project.org/web/packages/ffbase/index.html> package, which works well with the LaF package and allows you to import the flat file directly into an ffdf for further processing.

Hope that helps,
Jan
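A minimal sketch of that route, assuming the syntax files reveal that the first three fields are YEAR (4 columns), SERIAL (8 columns), and AGE (2 columns) -- the file name, column names, and widths below are hypothetical placeholders for whatever the real extract contains:

library(LaF)
library(ffbase)

## Open a connection to the fixed-width file; nothing is read into RAM yet.
cps_laf <- laf_open_fwf("cps_00001.dat",
                        column_types  = c("integer", "integer", "integer"),
                        column_widths = c(4, 8, 2),
                        column_names  = c("year", "serial", "age"))

## Stream the file into a disk-backed ffdf, chunk by chunk, so the 26 gig
## file never has to fit in memory.
cps_ffdf <- laf_to_ffdf(cps_laf)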
Ista Zahn
2012-Nov-13 15:08 UTC
[R] Getting information encoded in a SAS, SPSS or Stata command file into R.
Hi Andrew,

You may be able to run the SPSS syntax file using pspp (http://www.gnu.org/software/pspp/).

Best,
Ista

On Mon, Nov 12, 2012 at 11:23 PM, andrewH <ahoerner at rprogress.org> wrote:
> [...]
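One hedged sketch of the pspp route, assuming GNU PSPP is installed and on the PATH and the IPUMS files are named cps_00001.sps and cps_00001.dat (those names are placeholders): append a SAVE command to the end of the syntax file so PSPP writes a .sav file, run it, then read the result with foreign::read.spss(). Note that the full 26 gig extract will not fit in 3 gigs of RAM as a data frame, so this is only practical for subsets.

## Appended by hand to the end of cps_00001.sps before running PSPP:
##   SAVE OUTFILE='cps_00001.sav'.

system("pspp cps_00001.sps")  # run the edited syntax file from R

library(foreign)
## use.value.labels = TRUE turns the SPSS value labels into factor levels,
## which answers part of the original question about the coded data.
cps <- read.spss("cps_00001.sav", use.value.labels = TRUE,
                 to.data.frame = TRUE)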
Anthony Damico
2012-Nov-13 15:20 UTC
[R] Getting information encoded in a SAS, SPSS or Stata command file into R.
Hi Andrew, to work with the Current Population Survey in R, your best bet is to use a variant of my SAScii package that works with a SQLite database (and therefore doesn't overload RAM).

I have written obsessively-documented code about how to work with the CPS in R here..

http://usgsd.blogspot.com/search/label/current%20population%20survey%20%28cps%29

..but the example only loads one year of data at a time. The function read.SAScii.sqlite() used in that code can be run on a 51 year data set just the same.

If you need to generate standard errors, confidence intervals, or variances, I don't recommend using ffdf for complex sample surveys -- in my experience it doesn't work well with R's survey package.

These scripts use the Census Bureau version of the CPS, but you can make some slight changes and get it working on IPUMS files too.. Let me know if you run into any trouble. :)

Anthony

On Mon, Nov 12, 2012 at 11:23 PM, andrewH <ahoerner@rprogress.org> wrote:
> [...]
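For readers who just want the field names and widths out of the SAS command file, a minimal sketch using the CRAN SAScii package (the file name is a hypothetical placeholder; read.SAScii.sqlite() itself lives in Anthony's scripts, not on CRAN):

library(SAScii)

## parse.SAScii() reads the INPUT block of a SAS load script and returns
## a data frame with one row per field: its name, width, and whether it
## is a character field.
sas_dict <- parse.SAScii("cps_00001.sas")
head(sas_dict)

## Those names and widths are exactly what LaF::laf_open_fwf() needs, so
## this can be combined with the LaF/ffbase approach suggested above.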
David Winsemius
2012-Nov-13 19:41 UTC
[R] Getting information encoded in a SAS, SPSS or Stata command file into R.
On Nov 13, 2012, at 7:20 AM, Anthony Damico wrote:
> [...]

I'd like to take this opportunity to thank Anthony for his work on this dataset as well as on several others. The ones I am most interested in are the NHANES-III and Continuous NHANES datasets, and he has the 2009-2010 set from the Continuous NHANES series represented in his examples. Scraping the list of datasets from his website:

available data
- area resource file (arf) (1)
- consumer expenditure survey (ce) (1)
- current population survey (cps) (1)
- general social survey (gss) (1)
- national health and nutrition examination survey (nhanes) (1)
- national health interview survey (nhis) (1)
- national study of drug use and health (nsduh) (1)

And thanks to you for this question, andrewH; it prompted a response from Jan pointing to a package by Jan van der Laan, which had subsequent links (via a reverse-Depends citation) to the SEERabomb package by Tomas Radivoyevitch, which provides examples of handling the SEER datasets, at least the hematologic tumors dataset. My experience with SEER data in the past has been entirely mediated through SEER*Stat, which is a (somewhat) user-friendly Windows package for working with the SEER fixed-field formats, but it should be exciting to see another accessible avenue through R.

Thanks, Anthony, Jan, and andrewH, and further thanks to Thomas Lumley, on whose work I believe Anthony's package Depends because of the need for proper handling of the sampling weights.

--
David Winsemius, MD
Alameda, CA, USA
andrewH
2012-Nov-14 06:33 UTC
[R] Getting information encoded in a SAS, SPSS or Stata command file into R.
Wow! After reading Jan's post, I said "Great, I'll do that," because it was the closest to what I originally had in mind. Then I read Ista's post, and said "I think I'll try that first," because it got me back on the track of following directions in the R Data Import/Export manual. Then I read Anthony's post. Now, I am not so thrilled to go the database route, because frankly I have hardly ever used databases before, and this would make an already complex project take longer. But I know that I will need to use the survey package for what I am trying to do. So I think I am going to try to get the data into SQLite format, and just hope the effort builds character. Anthony, I have not used your packages yet, but they look great!

It will probably be more than a week before I get all this worked out and implemented. Given how much work this will be, I do not want to do it twice, so I think I will go back to IPUMS and get the rest of the variables, and break the file up into smaller chunks at the same time -- both so I really have the whole thing, and also so that it is easier to work with. The IPUMS version of the file is rectangular (it duplicates the household data in each individual), and IPUMS has done a lot of valuable work in cleaning the data and harmonizing variable names and definitions that have changed over the history of the CPS. (Annoyingly, however, they have not connected the cross-sections between years. All the CPS samples consist of two sets of four consecutive months, eight months apart, so the March Supplement always consists of half people who were interviewed in the last year and half people who will be interviewed in the next year (barring turnover).)

Anyway, when I have figured out my route to import I will report back here. In the meantime, I have three more questions that one of you may be able to answer:

1. Anthony, does the read.SAScii.sqlite function preserve the label names for factors in a data frame it imports into SQLite, when those labels are coded in the command file?

2. If I want to make the resulting SQLite database available to the R community, is there a good place for me to put it? Assume it is 10-20 gigs in size. Ideally, it would be set up so that it could be queried remotely and extracts downloaded. Setting this up is beyond my competence today, but maybe not in a couple of months. (I'd like to do the same thing with the 30 years of Consumer Expenditure Survey data I have. I don't have access to SAS any more, but I converted it all to flat files while I still did. Currently the BLS only makes 2011 microdata available free. Earlier years on CD are $200/year. But they have told me that they have no objection to my making them available.)

3. I have not yet been able to determine whether CPS micro data from the period 1940-1961 exists. Does anyone know? It is not on http://thedataweb.rm.census.gov/ftp/cps_ftp.html, and IPUMS and NBER (http://www.nber.org/data/current-population-survey-data.html) both only give data back to 1962. I wrote to Census a week ago, but I have not heard back from them, and in the past they have not been very helpful about historical micro data.

Thanks to all! Andrew
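For anyone following this thread later, a minimal sketch of the database-backed analysis Andrew is heading toward once the rows are in SQLite, using the survey package's database-backed designs. The table, file, and weight-variable names below (cps, cps.db, wtsupp) are hypothetical placeholders, and the design is deliberately simplified -- the real CPS has stratification and clustering that a serious analysis would need to specify:

library(survey)
library(RSQLite)

## A database-backed survey design: variables are pulled from SQLite on
## demand, so only the columns each analysis uses ever enter RAM.
cps_design <- svydesign(
  id      = ~1,         # simplified; not the actual CPS design
  weights = ~wtsupp,    # assumed name for the March supplement weight
  data    = "cps",      # table name inside the database
  dbtype  = "SQLite",
  dbname  = "cps.db"
)

svymean(~age, cps_design)  # estimated mean age, with design-based SE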