My students are working with several SPSS datasets provided by the
European Social Survey. If you register your name, you can download them
too. This is the 2004 data, for example:

http://ess.nsd.uib.no/ess/round2/

I cannot give you the European Social Survey dataset, but you can download
it for free if you like, and then you could run these commands to
reproduce the weird pattern described below.

library(foreign)
d2 <- read.spss("ESS3e03_2.por")
warnings()

str(d2$HAPPY)
d2 <- as.data.frame(d2)
str(d2$HAPPY)

d2 <- read.spss("ESS3e03_2.por",to.data.frame=T)
warnings()
str(d2$HAPPY)

Here's my info for this example:

> sessionInfo()
R version 2.10.0 (2009-10-26)
x86_64-pc-linux-gnu

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] foreign_0.8-38


The weirdness that follows is the difference between

d2 <- read.spss( ... , to.data.frame=T)

and

d2 <- read.spss ()
d2 <- as.data.frame(d2)

The former causes all data to become <NA> but the latter seems mostly OK.


> library(foreign)
> d2 <- read.spss("ESS3e03_2.por")
There were 12 warnings (use warnings() to see them)
> warnings()
Warning messages:
1: In `levels<-`(`*tmp*`, value = c("CENTRUMP", "", "FIDESZ", ... :
  duplicated levels will not be allowed in factors anymore
2: In `levels<-`(`*tmp*`, value = c("CENTRUMP", "", "FIDESZ", ... :
  duplicated levels will not be allowed in factors anymore
3: In `levels<-`(`*tmp*`, value = c("Refusal", "Don't know", ... :
  duplicated levels will not be allowed in factors anymore
4: In `levels<-`(`*tmp*`, value = c("No second language mentioned", ... :
  duplicated levels will not be allowed in factors anymore
5: In `levels<-`(`*tmp*`, value = c("Sans dipl", "Non dipl", ... :
  duplicated levels will not be allowed in factors anymore
6: In `levels<-`(`*tmp*`, value = c("\"Ej avslutad folkskola/grundskola\"", ... :
  duplicated levels will not be allowed in factors anymore
7: In `levels<-`(`*tmp*`, value = c("Armed forces", "Legislators, senior officials and managers", ... :
  duplicated levels will not be allowed in factors anymore
8: In `levels<-`(`*tmp*`, value = c("Armed forces", "Legislators, senior officials and managers", ... :
  duplicated levels will not be allowed in factors anymore
9: In `levels<-`(`*tmp*`, value = c("K", "K", "Frederiksborg Amt", ... :
  duplicated levels will not be allowed in factors anymore
10: In `levels<-`(`*tmp*`, value = c("P", "L", "Kesk-Eesti", ... :
  duplicated levels will not be allowed in factors anymore
11: In `levels<-`(`*tmp*`, value = c("Galicia", "Principado de Asturias", ... :
  duplicated levels will not be allowed in factors anymore
12: In `levels<-`(`*tmp*`, value = c("Stockholm", "", "Sydsverige", ... :
  duplicated levels will not be allowed in factors anymore

> str(d2$HAPPY)
 Factor w/ 14 levels "Extremely unhappy",..: 9 7 9 11 9 6 9 4 13 8 ...

> d2 <- as.data.frame(d2)
> str(d2$HAPPY)
 Factor w/ 14 levels "Extremely unhappy",..: 9 7 9 11 9 6 9 4 13 8 ...

That appears valid. On my first effort, I had tried to get the data
frame in a single shot with read.spss:

> d2 <- read.spss("ESS3e03_2.por",to.data.frame=T)
There were 15 warnings (use warnings() to see them)
> warnings()
Warning messages:
1: In xi >= z[1L] | xi <= z[2L] | xi[xi == z[3L]] :
  longer object length is not a multiple of shorter object length
2: In xi >= z[1L] | xi <= z[2L] | xi[xi == z[3L]] :
  longer object length is not a multiple of shorter object length
3: In xi >= z[1L] | xi <= z[2L] | xi[xi == z[3L]] :
  longer object length is not a multiple of shorter object length
4: In `levels<-`(`*tmp*`, value = c("CENTRUMP", "", "FIDESZ", ... :
  duplicated levels will not be allowed in factors anymore
5: In `levels<-`(`*tmp*`, value = c("CENTRUMP", "", "FIDESZ", ... :
  duplicated levels will not be allowed in factors anymore
6: In `levels<-`(`*tmp*`, value = c("Refusal", "Don't know", ... :
  duplicated levels will not be allowed in factors anymore
7: In `levels<-`(`*tmp*`, value = c("No second language mentioned", ... :
  duplicated levels will not be allowed in factors anymore
8: In `levels<-`(`*tmp*`, value = c("Sans dipl", "Non dipl", ... :
  duplicated levels will not be allowed in factors anymore
9: In `levels<-`(`*tmp*`, value = c("\"Ej avslutad folkskola/grundskola\"", ... :
  duplicated levels will not be allowed in factors anymore
10: In `levels<-`(`*tmp*`, value = c("Armed forces", "Legislators, senior officials and managers", ... :
  duplicated levels will not be allowed in factors anymore
11: In `levels<-`(`*tmp*`, value = c("Armed forces", "Legislators, senior officials and managers", ... :
  duplicated levels will not be allowed in factors anymore
12: In `levels<-`(`*tmp*`, value = c("K", "K", "Frederiksborg Amt", ... :
  duplicated levels will not be allowed in factors anymore
13: In `levels<-`(`*tmp*`, value = c("P", "L", "Kesk-Eesti", ... :
  duplicated levels will not be allowed in factors anymore
14: In `levels<-`(`*tmp*`, value = c("Galicia", "Principado de Asturias", ... :
  duplicated levels will not be allowed in factors anymore
15: In `levels<-`(`*tmp*`, value = c("Stockholm", "", "Sydsverige", ... :
  duplicated levels will not be allowed in factors anymore

> str(d2$HAPPY)
 Factor w/ 13 levels "Extremely unhappy",..: NA NA NA NA NA NA NA NA NA NA ...

Oh, heck, all the values are missing!! Somehow, putting
"to.data.frame" inside the read.spss call causes a different outcome than
using as.data.frame after reading in the data.

The symptoms of this in R-2.9 are a little different, but the
conclusion is the same. Help?

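A note on the first three warnings in the to.data.frame=T run: they quote the expression "xi >= z[1L] | xi <= z[2L] | xi[xi == z[3L]]". I have not traced where in read.spss this comes from, so what follows is only a toy sketch of what an expression of that shape does. The vector xi, the codes in z, and the names flag_or and flag_and are all invented for illustration; none of them are ESS values. If z holds a lower bound, an upper bound, and one extra code, then "at least the lower bound OR at most the upper bound" is true for every non-missing value, which would be consistent with every value turning into NA.

```r
# Toy data (not ESS values): scores 0-10 plus made-up missing codes 77, 88, 99.
xi <- c(0, 3, 5, 10, 77, 88, 99)
# Assumed layout: a missing range from z[1] to z[2], plus the single code z[3].
z <- c(77, 88, 99)

# First two terms of the expression quoted in the warning, joined with OR.
# Every value is either >= 77 or <= 88, so every element is flagged.
flag_or <- xi >= z[1L] | xi <= z[2L]

# What one would presumably want instead: inside the range, or equal to z[3].
flag_and <- (xi >= z[1L] & xi <= z[2L]) | xi == z[3L]

# The third quoted term, xi[xi == z[3L]], is a subset of xi rather than a
# logical vector of the same length; recycling a shorter vector like this is
# the kind of thing that produces "longer object length is not a multiple of
# shorter object length".
third_term <- xi[xi == z[3L]]

xi_or  <- replace(xi, flag_or,  NA)   # everything becomes NA
xi_and <- replace(xi, flag_and, NA)   # only 77, 88 and 99 become NA

print(xi_or)
print(xi_and)
```

If your version of read.spss has a use.missings argument (see ?read.spss), it may be worth comparing a to.data.frame=T read with use.missings=FALSE against the plain one to see whether the NAs go away. I have not run that on the ESS file.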
In case you are a student who wants to work with this data, I can
share with you the large script that I have been accumulating so that
you might "play along". It turns out to be surprisingly difficult to
"recode" these factor variables that have levels like "none", "1",
"2",..."9", "total".



## Paul Johnson
## November 13, 2009

## A question arose in the lab. A student asks "I want
## to compare the answers from two different editions
## of the European Social Survey."

## I will add this to Stuff Worth Knowing later, but
## I can share this tutorial with you right now.

## From this website:

## http://ess.nsd.uib.no/ess

## Download those European Social Survey Datasets into a directory.

## In a terminal, use the unzip command:
## unzip ESS3e03_2.spss.zip

## unzip ESS2e03_1.spss.zip

## Then run the following in R.


library(foreign)

d2 <- read.spss("ESS3e03_2.por", to.data.frame=TRUE)


d2 <- read.spss("ESS3e03_2.por")
warnings()

### You can try to go into a data frame in one
### step, that's an option in read.spss. But
### we saw warnings, and wanted to be careful.

d2 <- as.data.frame(d2)
d2$whichSurvey <- 2

d3 <- read.spss("ESS2e03_1.por")

d3 <- as.data.frame(d3)
d3$whichSurvey <- 3

namesd2 <- names(d2)
namesd3 <- names(d3)

## Keep only the variables that appear in both surveys
commonNames <- intersect(namesd3, namesd2)

combod23 <- rbind(d2[ , commonNames], d3[ , commonNames])

save(combod23, file="combod23.Rda")


## Error
##Warning messages:
##1: In `[<-.factor`(`*tmp*`, ri, value = c(NA, NA, NA, NA, NA, NA, NA,  :
##  invalid factor level, NAs generated
##2: In `[<-.factor`(`*tmp*`, ri, value = c(NA, NA, NA, NA, NA, NA, NA,  :
##  invalid factor level, NAs generated
##3: In `[<-.factor`(`*tmp*`, ri, value = c(1, 1, 1, 1, 1, 1, 1, 1, 1,  :
##  invalid factor level, NAs generated

## That worries me a little bit. The warnings did too.

## Inspect a few lines in the result.

combod23[1:4, ]

## fix doesn't work for me, did not bother to investigate.

##> fix(combod23)
##Error in edit.data.frame(get(subx, envir = parent), title = subx, ...) :
##  can only handle vector and factor elements
## That means some data from hell came into this thing.

## I suspect that combod23 is OK.

## The memory use on this exercise is huge! Try to help it

rm(d2)
rm(d3)


## But I worry. I have 2 ways that I use to try to figure this
## out. One is to open the dataset in a clone of SPSS called
## "PSPP". Actually, the executable is "psppire".
##
## The other thing I do is open the same data again in
## a numeric format, and compare the 2 combined data frames

## This is also a useful exercise because it helps you
## understand what a "factor" is in R.

dn2 <- read.spss("ESS3e03_2.por", use.value.labels = FALSE)


dn2 <- as.data.frame(dn2)
dn2$whichSurvey <- 2

dn3 <- read.spss("ESS2e03_1.por", use.value.labels = FALSE)

dn3 <- as.data.frame(dn3)
dn3$whichSurvey <- 3

## Might be smart to compare
# dn2$HAPPY[1:50]
# d2$HAPPY[1:50]

namesdn2 <- names(dn2)
namesdn3 <- names(dn3)

commonNNames <- intersect(namesdn3, namesdn2)

combodn23 <- rbind(dn2[ , commonNNames], dn3[ , commonNNames])

save(combodn23, file="combodn23.Rda")

table(combod23$HAPPY, combodn23$HAPPY)

## In summary, whenever I want to use a variable from
## the combined data frame, I would probably compare
## against combodn23 just to feel safe.




## Note, when you come back to work on this project again, you
## might as well just reload the saved copies of combod23 and
## combodn23.

## load("combod23.Rda")

## load("combodn23.Rda")

## That will put you at the current spot, no need to redo the merge


## Now, about "recoding". If you just want numerical
## data, you might consider using combodn23.

## But if you want some factors and some numerical
## variables, then you might need to recode to reclaim
## values.

## HAPPY turns out to be an interesting example of a
## PAIN IN THE ASS because in SPSS, it is scored from
## 0 to 10, but they give value labels only for scores
## 0= Extremely unhappy
## and
## 10= Extremely happy
##
## And the SPSS column has no labels for values 1-9.
## If SPSS gave NO labels at all, then this would come
## into R as a numeric variable. BUT, because there are
## 2 levels named, then R makes a factor out of it.

## When R turns it into a factor, you
## end up with a nutty looking factor, which has
## levels you don't really appreciate.

levels(combod23$HAPPY)
# [1] "Extremely unhappy" "1"                 "2"
# [4] "3"                 "4"                 "5"
# [7] "6"                 "7"                 "8"
#[10] "9"                 "Extremely happy"   "Refusal"
#[13] "Don't know"        "No answer"



## Create a new variable to play with
combod23$HAPPY2 <- combod23$HAPPY

## Change Extremely Unhappy to text "0"
levels(combod23$HAPPY)[1] <- "0"
## Change Extremely Happy to "10"
levels(combod23$HAPPY)[11] <- "10"

HELL <- levels(combod23$HAPPY)

### Look at HELL

HELL

## Set Refusal, Don't know, No answer to missing in HAPPY2
combod23$HAPPY2[combod23$HAPPY %in% HELL[12:14] ] <- NA

##CHECK RESULT
table(combod23$HAPPY, combod23$HAPPY2)


## Eliminate the unused levels from HAPPY2
combod23$HAPPY2 <- factor(combod23$HAPPY2)
### Same is found with
## combod23$HAPPY2 <- combod23$HAPPY2[drop=TRUE]

## Use the "factor trick" to
## reset the variable back to numeric.
## Index by HAPPY, whose levels line up one-for-one with HELL.
## HELL[12:14] are not numbers, so as.numeric() turns them into NA
## (with a coercion warning), which is what we want here.

combod23$HAPPYN <- as.numeric(HELL)[combod23$HAPPY]

##CHECK RESULT
table(combod23$HAPPY, combod23$HAPPY2)

## CHECK by comparing against numeric data from spss
table(combodn23$HAPPY, combod23$HAPPYN)




## Next, a student asks "how can I make that same recode
## on a lot of variables?" I'm going to have to leave
## that one unanswered. I think the answer will probably
## be to get a list of variables, then use "lapply" to
## do the same thing to each variable in turn. But
## I have not written up a simple, understandable example
## yet



## After the data is all recoded and homogenized, then we
## could run any analysis we want, and throw in the variable
## "whichSurvey" to see if there is a difference between the
## two models.

## Example, choose your y and x1 and x2, then

## mod <- lm(y~ (x1+x2)*whichSurvey, data=combod23)

## or if you think the difference is just in the intercept:

## mod <- lm(y~ x1+x2 + whichSurvey, data=combod23)

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

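On the student's question about repeating the recode over many variables: the lapply idea mentioned in the script could be sketched roughly as follows. This is a toy illustration under stated assumptions, not something tested on the ESS files. The data frame toy, the column HAPPYB, and the helper recode_scale are all hypothetical names made up here, and the sketch assumes every variable being recoded uses the same two endpoint labels.

```r
# Toy stand-in for combod23; column names and values are invented.
lev <- c("Extremely unhappy", as.character(1:9), "Extremely happy",
         "Refusal", "Don't know", "No answer")
toy <- data.frame(
  HAPPY  = factor(c("Extremely unhappy", "5", "Extremely happy", "Refusal"),
                  levels = lev),
  HAPPYB = factor(c("9", "Don't know", "1", "Extremely happy"),
                  levels = lev)
)

# Hypothetical helper: relabel the two endpoints as "0" and "10", then use
# the factor's integer codes to look up the numeric value of each level.
# Levels that are not numbers (Refusal, etc.) come out as NA.
recode_scale <- function(f, low = "Extremely unhappy",
                         high = "Extremely happy") {
  lv <- levels(f)
  lv[lv == low]  <- "0"
  lv[lv == high] <- "10"
  suppressWarnings(as.numeric(lv))[as.integer(f)]
}

# Apply the same recode to each listed variable and store the results
# in new columns with an "N" suffix.
vars <- c("HAPPY", "HAPPYB")
toy[paste0(vars, "N")] <- lapply(toy[vars], recode_scale)

print(toy)
```

Working from the level text rather than from hard-coded positions such as 12:14 means the helper does not depend on each variable having its missing codes in the same slots, though it does still assume the endpoint labels match exactly.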
I can't really help you with your problem, but maybe importing
with use.value.labels=FALSE will at least get rid of the
'duplicated levels' warnings.

 -Peter Ehlers