Hello, I have a question. I made an R script with a few commands that create some new variables. They all work: when I run the commands, the new variables appear in the dataset, and I can use them to build further variables. When I run summary(dataset), the new variables also appear in the summary. But when I run summary(new_variable), I get an "object not found" error. For example:

summary(couple_id)
Error in summary(couple_id) : object 'couple_id' not found

What can I do about this?

R script:

attach(ipumsi_00008_dta)
library(tinytex)
library(dplyr)
library(ggplot2)
library(tidyr)
library(knitr)
library(forcats)
library(mice)
library(pander)
library(ggcorrplot)
library(lubridate)
# TRUE/FALSE code: TRUE when sploc is greater than zero and sprule is equal to 1 or 2
ipumsi_00008_dta <- mutate(ipumsi_00008_dta, rule_union = sploc > 0 & (sprule == 1 | sprule == 2))
## creating a numeric code for rule_union (rule_union_numeric): 1 when sploc is greater than zero and sprule is equal to 1 or 2, 0 if not.
## This is necessary because otherwise it is a logical code and we cannot multiply with it, which is needed
ipumsi_00008_dta <- ipumsi_00008_dta %>% mutate(rule_union_numeric = as.numeric(rule_union))
### creating a unique numeric code for the sploc / pernum variables
ipumsi_00008_dta <- mutate(ipumsi_00008_dta, sploc_pernum_code = pernum * (sex == 1) + sploc * (sex == 2))
#### dividing serial by 1000; otherwise the final couple_id is too large. This works in this dataset because the serials start at 1000 (I will have to check whether this works for other datasets)
ipumsi_00008_dta <- mutate(ipumsi_00008_dta, serial_divided = serial %/% 1000)
##### creating a unique union identifier
ipumsi_00008_dta$union_id = paste0(ipumsi_00008_dta$country, ipumsi_00008_dta$year, ipumsi_00008_dta$serial_divided, ipumsi_00008_dta$sploc_pernum_code)
ipumsi_00008_dta <- ipumsi_00008_dta %>% mutate(union_id_numeric = as.numeric(union_id))
ipumsi_00008_dta <- mutate(ipumsi_00008_dta, couple_id = union_id_numeric * rule_union_numeric)
###### Understanding the couple_id variable
summary(couple_id)
summary(ipumsi_00008_dta)

I also attached my dataset.

Thank you very much for the answer!
Hannah Van Impe
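A small check of two points in the script above, using a made-up data frame `toy` in place of the IPUMS data (an assumption, as is the sex coding 1 = male, 2 = female). In R, arithmetic converts TRUE/FALSE to 1/0, which is exactly what `pernum*(sex==1) + sploc*(sex==2)` already relies on, so the `as.numeric()` step is optional rather than required. Also, `serial %/% 1000` keeps households apart only when every serial is a multiple of 1000.

```r
# Made-up rows standing in for the IPUMS data (assumption: sex 1 = male, 2 = female)
toy <- data.frame(
  pernum = c(1, 2, 3, 1, 2),
  sploc  = c(2, 1, 0, 2, 1),
  sex    = c(1, 2, 1, 1, 2),
  sprule = c(1, 1, 0, 2, 2)
)

# Logical vector: TRUE when sploc > 0 and sprule is 1 or 2
rule_union <- toy$sploc > 0 & (toy$sprule == 1 | toy$sprule == 2)

# Arithmetic coerces TRUE/FALSE to 1/0, so both lines give the same numbers
rule_union * 10
as.numeric(rule_union) * 10

# The same coercion drives the script's couple code: men keep their own
# pernum, women get sploc (their spouse's pernum)
toy$sploc_pernum_code <- toy$pernum * (toy$sex == 1) + toy$sploc * (toy$sex == 2)
toy$sploc_pernum_code

# Integer division keeps households apart only if every serial is a
# multiple of 1000: 4000 and 4500 both become 4
c(4000, 4500, 7000) %/% 1000
```

If the real serials are not all multiples of 1000, two households can end up with the same serial_divided value and therefore the same union_id.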
Hi Hannah,

Have you tried:

summary(ipumsi_00008_dta$couple_id)

Jim

On Fri, Oct 30, 2020 at 7:34 PM Hannah Van Impe <hannahvanimpe at outlook.com> wrote:
______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
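A minimal sketch of why summary(couple_id) fails while summary(ipumsi_00008_dta$couple_id) works. It uses a small hypothetical data frame `dat` instead of the real data, and base R's `$<-` instead of mutate(); the values are made up. attach() copies the columns that exist at the moment it is called, so a column added later is reachable only through the data frame itself.

```r
# Hypothetical stand-in for ipumsi_00008_dta with two of the thread's columns
dat <- data.frame(
  union_id_numeric   = c(2013411, 2013422, 2013433),
  rule_union_numeric = c(1, 0, 1)
)

attach(dat)   # copies the columns that exist *now* onto the search path

# Create the new column in the data frame itself (base R equivalent of mutate())
dat$couple_id <- dat$union_id_numeric * dat$rule_union_numeric

exists("union_id_numeric")   # TRUE: this column was copied by attach()
exists("couple_id")          # FALSE: the attached copy was never updated
try(summary(couple_id))      # fails: object 'couple_id' not found

summary(dat$couple_id)       # works: the column is looked up in the data frame

detach(dat)   # drop the stale copy so it cannot be used by mistake
```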
Dear Hannah,

I think the problem is that attach() is not doing what you think it is. attach() puts a copy of the data frame's columns, as they were at the moment you called it, on the search path. Columns you add to ipumsi_00008_dta afterwards (such as couple_id) never appear in that copy, which is why summary(couple_id) cannot find the object; mutate() and summary(ipumsi_00008_dta) still see the new columns because they look inside the data frame itself. attach() does seem to make it easy to make mistakes. I would suggest switching to with() instead, or using the data = argument of functions that support it.

Michael

On 30/10/2020 08:15, Hannah Van Impe wrote:

--
Michael
http://www.dewey.myzen.co.uk/home.html
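A minimal sketch of the two alternatives Michael mentions, using a hypothetical data frame `dat` with made-up values. xtabs() and aggregate() are simply two base R functions that happen to take a data = argument; many modelling and plotting functions do the same.

```r
# Hypothetical data frame; column names borrowed from the thread
dat <- data.frame(
  couple_id = c(2013411, 0, 2013433, 0),
  sex       = c(1, 2, 1, 2),
  age       = c(34, 29, 51, 47)
)

# with() evaluates an expression inside dat, so it always sees the
# current columns, including ones added after any attach() call
with(dat, summary(couple_id))
with(dat, table(sex, in_union = couple_id > 0))

# Functions with a data = argument look columns up in the same way
sex_counts <- xtabs(~ sex, data = dat)
mean_age   <- aggregate(age ~ sex, data = dat, FUN = mean)
sex_counts
mean_age
```

Because nothing is copied onto the search path, there is no stale version of the data to fall out of sync with the data frame.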
Hi Hannah,

Using the same code I sent before, you can append the partner codes to the household code. I apologize, but I don't know how to use the dplyr/tidyr/... stuff, so this is written in straight R code using logical statements.

ipumsi_00008_dta <- read.table(text = "country year sample serial hhwt pernum perwt resident sploc sprule
204 2013 204201301 4000 10 1 10 1 5 2
204 2013 204201301 4000 10 2 10 1 0 2
204 2013 204201301 4000 10 3 10 1 0 2
204 2013 204201301 4000 10 4 10 1 0 2
204 2013 204201301 4000 10 5 10 1 1 2
204 2013 204201301 4000 10 6 10 1 1 2
204 2013 204201301 4000 10 7 10 1 0 2
204 2013 204201301 4000 10 8 10 1 0 2
204 2013 204201301 4000 10 9 10 1 0 2
204 2013 204201301 7000 10 1 10 1 2 1
204 2013 204201301 7000 10 2 10 1 1 1
204 2013 204201301 7000 10 3 10 1 0 0
204 2013 204201301 7000 10 4 10 1 5 1
204 2013 204201301 7000 10 5 10 1 4 1",
  header = TRUE, stringsAsFactors = FALSE)

# For each household (serial) and each person (pernum) in it, overwrite sprule
# with the household code, followed by the pernum of every person whose sploc
# points back to this person, joined with "_"
for (hh in unique(ipumsi_00008_dta$serial)) {
  cat("hh", hh, " ")
  for (ind in ipumsi_00008_dta$pernum[ipumsi_00008_dta$serial == hh]) {
    cat("ind", ind, "\n")
    if (ipumsi_00008_dta$sploc[ipumsi_00008_dta$serial == hh & ipumsi_00008_dta$pernum == ind] > 0) {
      cat("sploc > 0\n")
      # pernum of everyone in this household whose spouse location is this person
      relationships <- ipumsi_00008_dta$pernum[ipumsi_00008_dta$serial == hh & ipumsi_00008_dta$sploc == ind]
      cat(relationships, "\n")
      ipumsi_00008_dta$sprule[ipumsi_00008_dta$serial == hh & ipumsi_00008_dta$pernum == ind] <-
        paste(c(hh, relationships), sep = "", collapse = "_")
    } else {
      # no spouse in the household: the code is just the household number
      ipumsi_00008_dta$sprule[ipumsi_00008_dta$serial == hh & ipumsi_00008_dta$pernum == ind] <- hh
    }
  }
}
ipumsi_00008_dta

This appends the partner codes using "_" as a separator. You can do it without the "_" and get a numeric variable, but I think you will generate ambiguous "sprule" codes. Again, this will not work with between-household relationships.

Jim

On Fri, Oct 30, 2020 at 11:16 PM Hannah Van Impe <hannahvanimpe at outlook.com> wrote:
> Thank you very much for the answer.
>
> I also have another question. With this data, I made the variable union_id
> using paste0. (I am writing a thesis and this part is necessary, but I
> don't have previous knowledge of R, so it is difficult for me.) My
> professor told me that using paste0 can be problematic because
> 'pernum' and 'serial' can have different numbers of digits. In other words,
> if pernum is 12, paste0 prints 12, but if pernum is 1, paste0 prints 1
> instead of 01, so the last two digits do not always
> correspond to pernum. He suggested that I use an alternative way of creating
> this union_id variable. Do you have any idea how I can do this?
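One way to address the professor's concern, sketched here as an assumption rather than as anything proposed in the thread, is to zero-pad every component to a fixed width with base R's sprintf() before joining. The data frame `dat`, its values, and the widths 3/4/8/2 are hypothetical; the widths must be at least as large as the longest value of each variable in the real data.

```r
# Hypothetical rows chosen to show the problem; column names follow the thread
dat <- data.frame(
  country = c(204, 204),
  year    = c(2013, 2013),
  serial  = c(4000, 40001),
  pernum  = c(12, 2)
)

# Plain paste0(): serial 4000 + pernum 12 and serial 40001 + pernum 2
# both become "2042013400012", so two different people share one ID
unpadded <- paste0(dat$country, dat$year, dat$serial, dat$pernum)
unpadded

# sprintf() never truncates, so a value wider than its slot would bring the
# ambiguity back; check the assumed widths before building the ID
stopifnot(all(dat$country < 1e3), all(dat$year < 1e4),
          all(dat$serial < 1e8), all(dat$pernum < 1e2))

# Zero-pad every part to a fixed width so each position in the ID always
# belongs to the same variable (widths 3, 4, 8, 2 are assumptions)
dat$union_id <- sprintf("%03d%04d%08d%02d",
                        as.integer(dat$country), as.integer(dat$year),
                        as.integer(dat$serial),  as.integer(dat$pernum))
dat$union_id
```

With fixed widths, dividing serial by 1000 is no longer needed to keep the pieces apart. It is safer to keep union_id as character: this padded ID has 17 digits, and a double stores whole numbers exactly only up to 2^53 (about 9.0e15), so as.numeric() could silently round it. The couple_id step could then use something like ifelse(rule_union, union_id, NA) instead of multiplication. A separator such as the "_" Jim used avoids the ambiguity as well.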