I am having a problem with the foreign library correctly reading some integer
data. Specifically,

> d _ read.dta('aptaa.dta')
> d[1:5,]
  scenario metcode   yr  ginv cons   gocc   abs   dvac   gmre   gmer
1        1      AA 2002 0.007 1377 -0.071 51710  0.071 -0.011 -0.127
2        1      AA 2003 0.000    0 -0.016 62568  0.014 -0.043 -0.538
3        1      AA 2004 0.000    0 -0.002 65122  0.002 -0.090 -0.338
4        1      AA 2005 0.000    0  0.000 65528  0.000 -0.036 -0.272
5        1      AA 2006 0.000    0  0.002   309 -0.001 -0.050  0.468

> dd _ read.csv('aptaa.csv',header=T)
> dd[1:5,]
  scenario metcode   yr  ginv cons   gocc    abs   dvac   gmre   gmer
1        1      AA 2002 0.007 1377 -0.071 -13826  0.071 -0.011 -0.127
2        1      AA 2003 0.000    0 -0.016  -2968  0.014 -0.043 -0.538
3        1      AA 2004 0.000    0 -0.002   -414  0.002 -0.090 -0.338
4        1      AA 2005 0.000    0  0.000     -8  0.000 -0.036 -0.272
5        1      AA 2006 0.000    0  0.002    309 -0.001 -0.050  0.468

In theory, dd == d, but notice the differences in abs.

The problem is that aptaa.csv is simply an "outsheet using" (in Stata) version
of aptaa.dta, and thus identical. I have checked these two files numerous
times (loading/reloading/rewriting in Stata).

It appears that the negative "abs" is a problem, but other variables are not
suffering the same problem (no other integer variables take negatives).

(This occurs on two different machines, R==1.4 and R==1.5.)

Also, I have 300 files of similar format, and the problem occurs consistently,
seemingly always on negative integers.

> version
         _
platform i686-pc-linux-gnu
arch     i686
os       linux-gnu
system   i686, linux-gnu
status
major    1
minor    5.0
year     2002
month    04
day      29
language R

foreign library == foreign_0.5-4.tar.gz

Has anyone seen this before, or have a good idea for fixing it?

Michaell

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
On Thu, 16 May 2002, Michaell Taylor wrote:

>
> I am having a problem with the foreign library correctly reading some integer
> data. Specifically,
>
> > d _ read.dta('aptaa.dta')
> > d[1:5,]
>   scenario metcode   yr  ginv cons   gocc   abs   dvac   gmre   gmer
> 1        1      AA 2002 0.007 1377 -0.071 51710  0.071 -0.011 -0.127
> 2        1      AA 2003 0.000    0 -0.016 62568  0.014 -0.043 -0.538
> 3        1      AA 2004 0.000    0 -0.002 65122  0.002 -0.090 -0.338
> 4        1      AA 2005 0.000    0  0.000 65528  0.000 -0.036 -0.272
> 5        1      AA 2006 0.000    0  0.002   309 -0.001 -0.050  0.468
>
> > dd _ read.csv('aptaa.csv',header=T)
> > dd[1:5,]
>   scenario metcode   yr  ginv cons   gocc    abs   dvac   gmre   gmer
> 1        1      AA 2002 0.007 1377 -0.071 -13826  0.071 -0.011 -0.127
> 2        1      AA 2003 0.000    0 -0.016  -2968  0.014 -0.043 -0.538
> 3        1      AA 2004 0.000    0 -0.002   -414  0.002 -0.090 -0.338
> 4        1      AA 2005 0.000    0  0.000     -8  0.000 -0.036 -0.272
> 5        1      AA 2006 0.000    0  0.002    309 -0.001 -0.050  0.468
>
> In theory, dd == d, but notice the differences in abs.
>
> The problem is that aptaa.csv is simply an "outsheet using" (in Stata) version
> of aptaa.dta, and thus identical. I have checked these two files numerous
> times (loading/reloading/rewriting in Stata).
>
> It appears that the negative "abs" is a problem, but other variables are not
> suffering the same problem (no other integer variables take negatives).
>

This looks like a signed vs unsigned integer problem. If so, a simple
work-around is to subtract 65536 from values greater than 32767.

	abs <- ifelse(abs > 32767, abs - 2^16, abs)

This also suggests that the Stata command

	recast long abs

might well be a more elegant work-around.

	-thomas
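As a rough check of the suggested work-around, the sketch below applies the same subtraction to the five "abs" values shown in the read.dta() output and compares them with the values from read.csv(). The helper name to_signed16 is made up for illustration and is not part of the foreign package. The sketch also assumes that the affected column really is a 16-bit Stata int that was read as unsigned, which the thread suggests but does not confirm.

```r
# Hypothetical helper (not part of the foreign package): reinterpret values
# that were read as unsigned 16-bit integers as signed 16-bit integers.
# Assumption: the column is a 16-bit integer that was decoded as unsigned.
to_signed16 <- function(x) {
  # Values above 32767 cannot occur in a signed 16-bit integer, so they are
  # taken to be negative numbers that wrapped around by 2^16 = 65536.
  # Missing values are passed through unchanged.
  ifelse(!is.na(x) & x > 32767, x - 65536, x)
}

# "abs" as returned by read.dta() in the original post
abs_dta <- c(51710, 62568, 65122, 65528, 309)

# "abs" as returned by read.csv() in the original post
abs_csv <- c(-13826, -2968, -414, -8, 309)

abs_fixed <- to_signed16(abs_dta)
print(abs_fixed)

# The corrected values should agree with the CSV version of the data
stopifnot(all(abs_fixed == abs_csv))
```

With a data frame d as in the original post, the same helper could be applied as d$abs <- to_signed16(d$abs). Whether other integer columns in the 300 files need the same treatment depends on their Stata storage types, which are not shown here.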