Hi folks,
I've got a funny problem with R's foreign package when reading Stata
files. One file consistently produces "vector memory exhausted" errors after
gobbling up 2.7G of memory. I stepped through the read.dta function and
found where the error occurs; the description is below. I am running
R 1.8.1 on a Debian stable system (glibc 2.2, kernel 2.4.24). R is
compiled from source as a shared library. The file that I am reading is
only 172M in size. The system I am using has 4G of free memory and 8G of
swap, so this doesn't seem to be a problem of too little free memory. See
below.
Thanks.
-----------------------------------------------------------------------
I stepped through the function and found that everything runs fine, but I
get a bunch of warnings during the convert.factors section of the code,
like:
> warnings()
Warning messages:
1: Value labels (fafdstmp) for afdstmp are missing
2: Value labels (fafsmon) for afsmon are missing
3: Value labels (fafsnum) for afsnum are missing
4: Value labels (fafsval) for afsval are missing
5: Value labels (fahcmcar) for ahcmcare are missing
6: Value labels (fahengyv) for ahengyv are missing
7: Value labels (fahenrgy) for ahenrgy are missing
8: Value labels (fahflnch) for ahflnch are missing
9: Value labels (fahflnno) for ahflnno are missing
10: Value labels (fahhcvhi) for ahhcvhi are missing
11: Value labels (fahhhino) for ahhhino are missing
12: Value labels (fahhnum) for ahhnum are missing
13: Value labels (fahmcnum) for ahmcnum are missing
14: Value labels (fahncvhi) for ahncvhi are missing
etc.
Then, when I step to the last line of the function, which returns rval, R
starts gobbling up a ton of memory and eventually dies with a "vector
memory exhausted" error.
Do you have a sense of where this could be coming from? It must be
something funny in the communication between the foreign package and the
main R library.
I'll email the R folks.
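One way to narrow down whether the convert.factors step is the culprit is to read the file with read.dta's convert.factors argument switched off and compare memory use. The sketch below is a hedged illustration, not a confirmed diagnosis: the small data frame `df` and the temporary file `tmp` are hypothetical stand-ins for the real 172M file, and it assumes a foreign version that provides write.dta. On the real data one would call read.dta(path, convert.factors = FALSE) directly.

```r
# Sketch: check whether read.dta's factor-conversion step drives memory use.
library(foreign)

# Hypothetical stand-in for the real .dta file: a small data frame with one
# labelled (factor) variable, written out to a temporary Stata file.
set.seed(1)
df <- data.frame(id = 1:1000,
                 grp = factor(sample(c("a", "b", "c"), 1000, replace = TRUE)))
tmp <- tempfile(fileext = ".dta")
write.dta(df, tmp)

# Read once without and once with conversion of value labels to factors.
raw  <- read.dta(tmp, convert.factors = FALSE)  # labelled columns stay as codes
conv <- read.dta(tmp, convert.factors = TRUE)   # labelled columns become factors

# Compare the in-memory size of each result and report R's current memory use.
print(object.size(raw))
print(object.size(conv))
print(gc())
```

If, on the real file, the convert.factors = FALSE read completes comfortably while the default read does not, that would point at the conversion step rather than the C-level reader.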
On Wed, 4 Feb 2004, Mark S. Handcock wrote:
> Date: Wed, 4 Feb 2004 14:38:12 -0800
> From: Mark S. Handcock <handcock at stat.washington.edu>
> To: 'Cere M. Davis' <cere at u.washington.edu>,
> 'R. Anderson' <anders10 at u.washington.edu>
> Cc: morrism at u.washington.edu, 'Matthew B Weatherford' <mbw at u.washington.edu>,
> Msh <handcock at stat.washington.edu>
> Subject: RE: error
>
> Cere,
>
> This is useful information. How large is the original data file? If it is
> small (<1Gb) then the 2.7Gb is excessive. Have you searched the R users
> group on www.r-project.org?
>
> Also, can you try:
>
> rval <- .External("do_readStata", "file", PACKAGE = "foreign")
>
> where "file" is the Stata file name on both machines. This is the internal
> read, done in C, so if that works the problem is elsewhere in the
> "read.dta" function, which is easy to fix.
>
> Mark
>
> > -----Original Message-----
> > From: Cere M. Davis [mailto:cere at u.washington.edu]
> > Sent: Monday, February 02, 2004 10:45 PM
> > To: R. Anderson
> > Cc: morrism at u.washington.edu; handcock at stat.washington.edu;
> > Matthew B Weatherford
> > Subject: Re: error
> >
> >
> > More info on the R memory problem. Just reading one .dta file in via the
> > foreign package requires upwards of 2.7G of memory on any machine. 2.7G
> > is the point at which the process runs out of memory, so I can't know the
> > upper limit for this process. I am running the R read process on Libra
> > now, but it's been 5 hours since I started the read request and the disk
> > is swapping so heavily that I cannot tell when the process will finish.
> > There does appear to be a problem with this R job using system swap space
> > on Mosix, so a quick test and fix is to co-opt another machine and
> > aggregate some RAM from it (if there is physical space in the machine),
> > hopefully sometime tomorrow.
> >
> > Stay tuned.
> >
> > >
> > >
> > > Thanks Robin for this email. I am able to reproduce what you reported
> > > using the file that you gave me below, so thank you very much for that.
> > > From what I can see this appears to be a memory allocation issue that
> > > affects all systems, but because the main node has such fast ethernet
> > > speeds one can see the results of the problem quickly. I am testing
> > > this problem on a system with more memory and may have a better sense
> > > of what is needed once I see the results.
> > >
> > > I'll let you know as I learn more perhaps later today.
> > >
> > > Thanks,
> > > Cere
> > >
> > > On Wed, 28 Jan 2004, R. Anderson wrote:
> > >
> > > > Date: Wed, 28 Jan 2004 22:25:11 -0800 (PST)
> > > > From: R. Anderson <anders10 at u.washington.edu>
> > > > To: Cere M. Davis <cere at u.washington.edu>
> > > > Cc: morrism at u.washington.edu
> > > > Subject: Re: error
> > > >
> > > > Cere-
> > > > In the March files (which use the same .dta files as the match files
> > > > we were looking at on Friday), I was able to get 1979-1988 and
> > > > 1996-2001 to run with marchdatameta.R and create RData files.
> > > >
> > > > However, when the meta file ran 1989, for example, the vector error
> > > > occurred again.
> > > >
> > > > So I tried running some of the files (marchdatacopy1989.R,
> > > > marchdatacopy1990.R,...) individually. I was able to produce an RData
> > > > set from the 1989 file.
> > > >
> > > > However, when I ran the 1990.R file, I got the following error:
> > > >
> > ______________________________________________________________________
> > > >
> > > >
> > > > > ##################################################
> > > > > # marchdatacopy1990.R #
> > > > > # 10 Jan 2004 -ra #
> > > > > # #
> > > > > # This is a template file that is used to read #
> > > > > # Stata data into R and should prepare the basic #
> > > > > # variables needed for the analysis of income #
> > > > > # for any year 1990 that is specified. It is #
> > > > > # sourced by the shell script "marchmetacode" #
> > > > > # for years that are specified in #
> > > > > # "marchdatameta.R". #
#
> > > > > # -RA, 10 Jan 2004 #
> > > > > ##################################################
> > > > >
> > > > > library(foreign)
> > > > > options(object.size = 10000000)
> > > > > mar1990 <- read.dta("/net/home/morrism/Data/CPS/March/Extracts.all/mar1990.dta")
> > > > Error: vector memory exhausted (limit reached?)
> > > >
> > > > Process R segmentation fault at Wed Jan 28 21:14:41 2004
> > > >
> > > > ______________________________________________________________________
> > > > This was run on mos2, interactively in Emacs, and the error differs
> > > > from the other vector errors.
> > > >
> > > > Then I ran marchdatacopy1990.R on klee and got the following
> > > > warning:
> > > >
> > > > ______________________________________________________________________
> > > > run marchdatacopy1990.R
> > > > /usr/local/R-1.8.1/lib/R/bin/BATCH: line 55: 31545 Done ( echo "invisible(options(echo = TRUE))"; cat ${in}; echo "proc.time()" )
> > > > 31546 Killed | ${R_HOME}/bin/R ${opts} >${out} 2>&1
> > > >
> > > > ______________________________________________________________________
> > > >
> > > > When I opened the output file, marchdatacopy1990.Rout, there was
> > > > nothing but the R prompt. (This is the output file from running the
> > > > script on klee.)
> > > >
> > > > I can stop by Friday morning or Thursday afternoon (I meet with
> > > > Prof. Morris at 3 and can stop by afterwards).
> > > >
> > > > I think it is very odd that the marchdatameta file ran without error
> > > > for some years and produced an error for others. Also note that
> > > > running the matchdatameta file continued to produce the same errors as
> > > > before for all years.
> > > >
> > > >
> > > > The directories for the match and march are:
> > > >
> > > > /net/home/morrism/Data/CPS/Comp/R/Code/MarchData ---For march
> > > > /net/home/morrism/Data/CPS/Comp/R/Code/MatchData ---For match
> > > >
> > > > In each directory I am creating datasets from the same .dta files,
> > > > which are in:
> > > >
> > > > /net/home/morrism/Data/CPS/March/Extracts.all
> > > >
> > > > So I do not understand why the marchdatameta file works for some years
> > > > while matchdatameta produces the vector error for all years.
> > > >
> > > >
> > > > Thanks,
> > > > Robin Anderson
> > > >
> > > >
> > > >
> > > > On Fri, 23 Jan 2004, Cere M. Davis wrote:
> > > >
> > > > >
> > > > > If you are going to be around today, please come by and we'll work
> > > > > on this some more if you have time.
> > > > >
> > > > > >
> > > > > >
> > > > > > Cere-
> > > > > > By running ..1987.R through matchdatameta.R I do get the "vector"
> > > > > > error.
> > > > > > I am running that file interactively through an Emacs/R split
> > > > > > window.
> > > > > > Here is the file path for the .Rout file:
> > > > > >
> > > > > > /net/home/morrism/Data/CPS/Comp/R/Code/MatchData/matchdatacopy1987.Rout
> > > > > >
> > > > > > This is the file path for the file that creates an R file for each
> > > > > > year and runs it with R BATCH --no-save to get the .Rout file:
> > > > > >
> > > > > > /net/home/morrism/Data/CPS/Comp/R/Code/MatchData/matchdatameta.R
> > > > > >
> > > > > > Thanks Again
> > > > > > Robin
> > > > > >
> > > > >
> > > > > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> > > > > Cere Davis
> > > > > Unix Systems Administrator - CSDE
> > > > > cere at u.washington.edu ph: 206.685.5346
> > > > > https://staff.washington.edu/cere
> > > > >
> > > > > GnuPG Key http://staff.washington.edu/cere/gpgkey.txt
> > > > > Key fingerprint = B63C 2361 3B9B 8599 ECC9 D061 3E48 A832 F455 9E7FA
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >