Dear all,
A very wise data entry person gave me about an hour of headache while I tried
to find out why a 2000x500 data frame could not be read into R.
After much trial and error, I traced the problem to a double quote
accidentally typed into a string variable (comments from an open-ended
question). This can be replicated by:
> aa <- data.frame(id=1:2, var1=c("some \" quote", "without quote"))
> aa
  id          var1
1  1  some " quote
2  2 without quote
Saving this with R:
write.table(aa, "aa.dat", sep="\t", row.names=F)
creates the following ASCII file (between #s)
### R export
"id" "var1"
1 "some \" quote"
2 "without quote"
###
which does not load back cleanly; read.table gives a warning:
> bb <- read.table("aa.dat", sep="\t", header=T)
Warning message:
In read.table("aa.dat", sep = "\t", header = T) :
incomplete final line found by readTableHeader on 'aa.dat'
The dataframe was initially an SPSS file, which saved it as tab delimited in
this format:
### SPSS export
"id" "var1"
1 "some " quote"
2 "without quote"
###
which, of course, threw the same error.
StatTransfer was the only software that managed to export the SPSS file to
a tab-delimited file that could finally be imported into R, and the saved
file looks like this:
### StatTransfer export
"id" "var1"
1 "some "" quote"
2 "without quote"
###
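For comparison, the doubled-quote convention in the StatTransfer file can apparently be produced from R itself. This is only a minimal sketch, assuming that the qmethod argument of write.table and the CSV-style quote doubling in read.table (with a non-default sep) behave as documented; the temporary file name is made up for the illustration:

```r
# Same toy data as above; stringsAsFactors=FALSE keeps var1 as character
aa <- data.frame(id = 1:2,
                 var1 = c("some \" quote", "without quote"),
                 stringsAsFactors = FALSE)

# Illustrative temporary file, not the real data file
f <- tempfile(fileext = ".dat")

# qmethod="double" writes an embedded quote as "" instead of \"
write.table(aa, f, sep = "\t", row.names = FALSE, qmethod = "double")

# With sep="\t", a doubled quote inside a quoted field is read back as one quote
bb <- read.table(f, sep = "\t", header = TRUE, stringsAsFactors = FALSE)
```

If this holds, bb$var1 should contain the same strings as aa$var1, which would explain why only the StatTransfer file could be imported.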
Given these examples, I have two questions:
1. What is the correct syntax to import the R-exported file?
2. What can I do to prevent these situations from happening?
(besides whipping the data entry person :), I am referring to R procedures to
detect and correct such things)
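
On the second question, the only thing I can think of so far is a crude check on the raw file before calling read.table: flag every line that holds an odd number of double quotes. This is a rough sketch, not a proper solution. It assumes that no field legitimately spans several lines, and odd_quote_lines is a name I made up here:

```r
# Hypothetical helper: line numbers that hold an odd number of double quotes
odd_quote_lines <- function(path) {
  lines <- readLines(path, warn = FALSE)
  # Count the double quotes on each line by deleting every other character
  n_quotes <- nchar(gsub("[^\"]", "", lines))
  which(n_quotes %% 2 == 1)
}

# Recreate the SPSS-style export from above in a temporary file
tmp <- tempfile(fileext = ".dat")
writeLines(c("\"id\"\t\"var1\"",
             "1\t\"some \" quote\"",
             "2\t\"without quote\""), tmp)

# Line numbers (counting the header) that deserve a closer look
bad <- odd_quote_lines(tmp)
```

The flagged lines could then be inspected and fixed by hand, but I would prefer something more robust.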
Thank you,
Adrian
--
Adrian Dusa
Romanian Social Data Archive
1, Schitu Magureanu Bd
050025 Bucharest sector 5
Romania
Tel./Fax: +40 21 3126618 \
+40 21 3120210 / int.101