Hi,

I have an SPSS data file that is used with my statistics textbook (it is available at http://abacon.com/fox/s6720p2.sav, but it is originally from ICPSR).

When I opened it with SPSS 10 and ran Frequencies on it, I got 979 valid cases and 27 missing. However, see below (unfortunately, I used R in preparing my homework, which caused me an error on this):

> data=read.spss("s6720p2.sav")
> levels(data$CP1)
[1] "Rf" "Dk" "Neither" "Oppose" "Favor"
> length(data$CP1[data$CP1=="Favor"])
[1] 727
> length(data$CP1[data$CP1=="Oppose"])
[1] 177
> length(data$CP1[data$CP1=="Neither"])
[1] 79
> length(data$CP1[data$CP1=="Dk"])
[1] 19
> length(data$CP1[data$CP1=="Rf"])
[1] 3
> data$CP1[data$CP1=="Rf" | data$CP1=="Dk"]<-NA
> length(data$CP1[!is.na(data$CP1)])
[1] 983
> length(data$CP1[is.na(data$CP1)])
[1] 22
> 727+177+79
[1] 983

What is even stranger is that when I exported just the variable CP1 from the full file (in SPSS) and ran the same Frequencies on it as on the full file, the results were the same as in R (yes, I checked that the definition of the missing values was the same: 8 and 9, labelled Rf and Dk).

I have uploaded the data and all reports (in PDF) to http://www.volny.cz/cepls/ps-pdf/s6720p2.zip.

Could anybody help me understand what I did wrong, please?

Thanks

Matej

--
Matej Cepl, matej at ceplovi.cz, PGP ID# D96484AC
138 Highland Ave. #10, Somerville, Ma 02143, (617) 623-1488

In those days spirits were brave, the stakes were high, men were
real men, women were real women and small furry creatures from
Alpha Centauri were real small furry creatures from Alpha Centauri.
    -- Douglas Adams

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe" (in the "body", not the subject !)
To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
Matej Cepl <matej at ceplovi.cz> writes:

> Hi,
>
> I have a SPSS datafile which is used for my textbook in the
> statistics (and which is available on
> http://abacon.com/fox/s6720p2.sav, but it is originally from
> ICPSR).
>
> When I opened it with SPSS 10 and run Frequencies on it I
> have got 979 valid data a 27 missing. However, see below
> (unfortunately, I have used R in preparation of my homework,
> which caused me an error on this):
>
> > data=read.spss("s6720p2.sav")
> > levels(data$CP1)
> [1] "Rf" "Dk" "Neither" "Oppose" "Favor"
> > length(data$CP1[data$CP1=="Favor"])
> [1] 727
> > length(data$CP1[data$CP1=="Oppose"])
> [1] 177
> > length(data$CP1[data$CP1=="Neither"])
> [1] 79
> > length(data$CP1[data$CP1=="Dk"])
> [1] 19
> > length(data$CP1[data$CP1=="Rf"])
> [1] 3
> > data$CP1[data$CP1=="Rf" | data$CP1=="Dk"]<-NA
> > length(data$CP1[!is.na(data$CP1)])
> [1] 983
> > length(data$CP1[is.na(data$CP1)])
> [1] 22
> > 727+177+79
> [1] 983
>
> Now, what is even more strange is, that when I have exported just
> the variable CP1 from the full file (in SPSS) and run on it the
> same frequencies as in the full size version, the results were
> same as in R (yes, I have checked that the definition of the
> missing values was the same: 8,9 -- labelled as Rf and Dk).
>
> I have uploaded the data and all reports (in PDF) on
> http://www.volny.cz/cepls/ps-pdf/s6720p2.zip.
>
> Could anybody help me to understand what I did wrong, please?

The length(data$CP1[data$CP1=="Rf"]) construction is unsound (what happens if there are NA in the indexing variable?) and you'd be better off with sum(data$CP1 %in% "Rf") or simply table(data$CP1), but that seems unrelated here.

As you say, your cp1.pdf is perfectly in accordance with the R output, whereas cp1-whole_data.pdf differs. It also includes the rather extraordinary claim that 979+27=1005 !! Is there any chance you may have accidentally modified it?
[If your instructor still insists that SPSS must be right, and this really is what it gives as output, I'd point out the obvious discrepancies with itself and with the data set with just the CP1 variable in it, leaving R out of the discussion...]

What is ICPSR, btw?

--
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
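The remark about NA in the indexing variable can be illustrated with a small sketch. The factor cp1 below is made up for the illustration (it is not the real CP1 variable from s6720p2.sav); it only shows why length(x[x == "..."]) can overcount when NAs are present, and how sum(... %in% ...) and table() avoid that.

```r
# Hypothetical toy factor, NOT the real CP1 data
cp1 <- factor(c("Favor", "Oppose", NA, "Favor", "Dk", NA))

# The comparison yields NA wherever cp1 is NA, and logical indexing
# returns one NA element for every NA in the index vector.
idx <- cp1 == "Favor"            # TRUE FALSE NA TRUE FALSE NA
n_indexing <- length(cp1[idx])   # 4: two real matches plus two NAs

# %in% never returns NA, so summing it counts only real matches.
n_in <- sum(cp1 %in% "Favor")    # 2

# table() gives all counts at once; useNA = "ifany" also shows the NAs.
counts <- table(cp1, useNA = "ifany")
print(counts)
```

In the session quoted above the counts add up to 1005 before any NA is assigned, so this pitfall would not by itself explain the difference from SPSS, which matches the "seems unrelated here" comment.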
On 27 Oct 2002, Peter Dalgaard BSA wrote:

>
> As you say, your cp1.pdf is perfectly in accordance with the R output,
> whereas cp1-whole_data.pdf differs. It also includes the rather
> extraordinary claim that 979+27=1005 !! Is there any chance you may
> have accidentally modified it?
>

I can verify the SPSS 10 results as well (cut and pasted directly)

FAVOR: DEATH PENALTY FOR MURDERERS

                   Frequency  Percent  Valid Percent  Cumulative Percent
Valid    Favor           708     70.4           72.3                72.3
         Oppose          189     18.8           19.3                91.6
         Neither          82      8.2            8.4               100.0
         Total           979     97.4          100.0
Missing  Dk               24      2.4
         Rf                3       .3
         Total            27      2.6
Total                   1005    100.0

Distinctly weird.

	-thomas
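The internal inconsistency in the SPSS table can be checked with a few lines of arithmetic. This is only a sketch: the numbers are copied from the SPSS output and the R session quoted in this thread, and the variable names are made up for the illustration.

```r
# Frequencies copied from the SPSS 10 output quoted in this thread
spss_valid <- c(Favor = 708, Oppose = 189, Neither = 82)
spss_miss  <- c(Dk = 24, Rf = 3)
spss_total <- 1005               # grand total as printed by SPSS

# Counts copied from the R session in the original post
r_valid <- c(Favor = 727, Oppose = 177, Neither = 79)
r_miss  <- c(Dk = 19, Rf = 3)

sum(spss_valid)                   # 979, agrees with the SPSS "Valid Total"
sum(spss_miss)                    # 27, agrees with the SPSS "Missing Total"
sum(spss_valid) + sum(spss_miss)  # 1006, one more than the printed 1005

sum(r_valid) + sum(r_miss)        # 1005, the R counts are self-consistent
```

So the subtotals in the SPSS table match their rows, but the rows add up to one case more than the grand total it reports, whereas the R counts sum to exactly 1005.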