peter.muhlberger at gmail.com
2009-Jul-11 20:05 UTC
[Rd] foreign generates bad Stata data files (PR#13820)
Full_Name: peter muhlberger
Version: 2.7.1
OS: Ubuntu x86_64 dual core
Submission from: (NULL) (70.238.206.13)

I've spent half a day generating .dta files using write.dta only to have them crash my copy of Stata. I eventually discovered that removing a string variable with a maximum observed length of 280 characters allows Stata to read the file without problems. Stata limits the length of a string variable to 244 characters. R gives no warning about this problem, and I assume it does not truncate the strings either.

The following code creates a .dta file that causes my copy of Stata to suddenly disappear when I try to open the file:

library(foreign)  # provides write.dta
x=c("XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX")
x=as.data.frame(x, stringsAsFactors=FALSE)
names(x)[1]="x"
write.dta(x, version = 10, file="/home/peterm/Desktop/x.dta")

That's the main problem I wanted to report. There are several others you might want to look into:

1. In the code above, leave out names(x)[1]="x" and take a look at the name of the variable in the data frame: it's a line of code.

2. write.dta is supposed to turn factors into variable labels in Stata, but I get no variable labels in Stata (starting with data that has factors).

3. When I try to read an SPSS dataset created by Sawtooth into R using read.spss, I get a multitude of variables that aren't in the original dataset and have nothing in them.
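For the string-length problem, a check along the following lines could be run before calling write.dta. This is only a sketch: too_long_for_stata is a hypothetical helper, not part of foreign, and it assumes the 244 limit applies to the length in bytes.

```r
# Hypothetical helper (not part of the foreign package): report character
# columns whose longest value exceeds the assumed Stata limit of 244.
too_long_for_stata <- function(df, limit = 244L) {
  is_chr <- vapply(df, is.character, logical(1))
  max_len <- vapply(df[is_chr], function(col) {
    lens <- nchar(col, type = "bytes")  # assumption: the limit is in bytes
    if (length(lens) == 0L || all(is.na(lens))) 0L else max(lens, na.rm = TRUE)
  }, integer(1))
  # Named integer vector: column name -> longest value, offenders only
  max_len[max_len > limit]
}

df <- data.frame(ok = "short",
                 long = paste(rep("X", 280), collapse = ""),
                 stringsAsFactors = FALSE)
bad <- too_long_for_stata(df)
print(bad)
```

Any column reported here would have to be shortened, for instance with substr, or dropped before the data frame is written.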
Below is my sessionInfo:

> sessionInfo()
R version 2.7.1 (2008-06-23)
x86_64-pc-linux-gnu

locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=en_US.UTF-8;LC_ADDRESS=en_US.UTF-8;LC_TELEPHONE=en_US.UTF-8;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] foreign_0.8-36 JGR_1.6-7 iplots_1.1-3 JavaGD_0.5-2 rJava_0.6-3 boot_1.2-37

Hope this helps!

Cheers,
Peter