Hello

I have a question. I made an R script and ran a few commands to create some new variables. They all work, and when I run the commands, the new variables appear in the dataset. I can also use these new variables to make other new variables from them. Also, when I use summary(dataset), the new variables appear in the summary of the dataset.
But when I do summary(new variable), I get an error: variable not found.
For example: summary(couple_id) => Error in summary(couple_id) : object 'couple_id' not found. What can I do about this?

R-script:
attach(ipumsi_00008_dta)
library(tinytex)
library(dplyr)
library(ggplot2)
library(tidyr)
library(knitr)
library(forcats)
library(mice)
library(pander)
library(ggcorrplot)
library(lubridate)
# true/false code when sploc is greater than zero and sprule is equal to 1 or 2
ipumsi_00008_dta <- mutate(ipumsi_00008_dta, rule_union = sploc>0 & (sprule==1 | sprule==2))
## creating numeric code for rule_union & rule_unionn: 1 when sploc is greater than zero and sprule is equal to 1 or 2, 0 if not.
## This is necessary because otherwise it is a logical code and we cannot multiply with it, which is needed
ipumsi_00008_dta <- ipumsi_00008_dta %>% mutate(rule_union_numeric = as.numeric(rule_union))
### creating unique numeric code for sploc / pernum variables
ipumsi_00008_dta <- mutate(ipumsi_00008_dta, sploc_pernum_code = pernum*(sex==1) + sploc*(sex==2))
#### dividing serial by 1000, otherwise, the ultimate couple_id is too large, and it works in this dataset because the serials start at 1000 (I will have to check if this works for other datasets)
ipumsi_00008_dta <- mutate(ipumsi_00008_dta, serial_divided = serial%/%1000)
##### creating unique union identifier
ipumsi_00008_dta$union_id = paste0(ipumsi_00008_dta$country, ipumsi_00008_dta$year, ipumsi_00008_dta$serial_divided, ipumsi_00008_dta$sploc_pernum_code)
ipumsi_00008_dta <- ipumsi_00008_dta %>% mutate(union_id_numeric = as.numeric(union_id))
ipumsi_00008_dta <- mutate(ipumsi_00008_dta, couple_id = union_id_numeric*rule_union_numeric)
###### Understanding the couple_id variable
summary(couple_id)
summary(ipumsi_00008_dta)

I also attached my dataset.

Thank you very much for the answer!
Hannah Van Impe

Hi Hannah,
Have you tried:

summary(ipumsi_00008_dta$couple_id)

Jim

On Fri, Oct 30, 2020 at 7:34 PM Hannah Van Impe <hannahvanimpe at outlook.com> wrote:
> But, when I do summary(new variable) => error: variable not found.
> For example: summary(couple_id) => Error in summary(couple_id) : object
> 'couple_id' not found. What can I do about this?

Dear Hannah

I think the problem is that attach() is not doing what you think it is. It does seem to make it easy to make mistakes. I would suggest switching to using with() instead, or using the data = parameter of functions which support it.

Michael

On 30/10/2020 08:15, Hannah Van Impe wrote:
> For example: summary(couple_id) => Error in summary(couple_id) : object 'couple_id' not found. What can I do about this?
>
> R-script:
> attach(ipumsi_00008_dta)

-- 
Michael
http://www.dewey.myzen.co.uk/home.html

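To illustrate the point about attach(), here is a minimal sketch on a toy data frame. The name dat and its values are made up for the example and are not taken from the attached dataset. attach() puts a copy of the data frame on the search path as it exists at that moment, so columns added afterwards are visible through the data frame itself (with $ or with()) but not as bare names.

```r
# Toy data frame standing in for ipumsi_00008_dta (hypothetical values)
dat <- data.frame(sploc = c(2, 1, 0), sprule = c(1, 1, 0))

attach(dat)   # copies the columns, as they are right now, onto the search path

# Add a new column afterwards; this changes dat, not the attached copy
dat$couple_id <- dat$sploc * (dat$sprule %in% c(1, 2))

# The bare name is not visible, because the attached copy has no such column
found_bare <- exists("couple_id")

# Both of these look inside the current data frame and therefore work
s1 <- summary(dat$couple_id)
s2 <- with(dat, summary(couple_id))

detach(dat)   # remove the stale copy from the search path
```

Dropping attach() altogether and always referring to columns through the data frame avoids this kind of stale copy.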
Hi Hannah,
Using the same code I sent before, you can append the partner codes to the
household code. I apologize, but I don't know how to use the
dplyr/tidyr/... stuff so this is written in straight R code using logic
statements.
# Example data: two households (serial 4000 and 7000)
ipumsi_00008_dta<-
 read.table(
 text="country year sample serial hhwt pernum perwt resident sploc sprule
204 2013 204201301 4000 10 1 10 1 5 2
204 2013 204201301 4000 10 2 10 1 0 2
204 2013 204201301 4000 10 3 10 1 0 2
204 2013 204201301 4000 10 4 10 1 0 2
204 2013 204201301 4000 10 5 10 1 1 2
204 2013 204201301 4000 10 6 10 1 1 2
204 2013 204201301 4000 10 7 10 1 0 2
204 2013 204201301 4000 10 8 10 1 0 2
204 2013 204201301 4000 10 9 10 1 0 2
204 2013 204201301 7000 10 1 10 1 2 1
204 2013 204201301 7000 10 2 10 1 1 1
204 2013 204201301 7000 10 3 10 1 0 0
204 2013 204201301 7000 10 4 10 1 5 1
204 2013 204201301 7000 10 5 10 1 4 1",
 header=TRUE,stringsAsFactors=FALSE)
# Loop over households, then over the persons within each household
for(hh in unique(ipumsi_00008_dta$serial)) {
 cat("hh",hh," ")
 for(ind in ipumsi_00008_dta$pernum[ipumsi_00008_dta$serial == hh]) {
  cat("ind",ind,"\n")
  if(ipumsi_00008_dta$sploc[ipumsi_00008_dta$serial == hh &
   ipumsi_00008_dta$pernum == ind] > 0) {
   cat("sploc > 0\n")
   # persons in this household whose sploc points at this person
   relationships<-
    ipumsi_00008_dta$pernum[ipumsi_00008_dta$serial == hh &
    ipumsi_00008_dta$sploc == ind]
   cat(relationships,"\n")
   # overwrite sprule with household code plus partner codes, e.g. "4000_5_6"
   ipumsi_00008_dta$sprule[ipumsi_00008_dta$serial == hh &
    ipumsi_00008_dta$pernum == ind]<-
    paste(c(hh,relationships),sep="",collapse="_")
  } else {
   # no partner in the household: sprule becomes the household code alone
   ipumsi_00008_dta$sprule[ipumsi_00008_dta$serial == hh &
    ipumsi_00008_dta$pernum == ind]<-hh
  }
 }
}
ipumsi_00008_dta
This appends the partner codes using "_" as a separator. You can do it
without the "_" and get a numeric variable, but I think you will generate
ambiguous "sprule" codes. Again, this will not work with between-household
relationships.
Jim
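
A small sketch of the ambiguity mentioned above, using made-up household and partner numbers (they are not from the example data): without a separator, two different combinations can collapse to the same string, while the "_" keeps them apart.

```r
# Hypothetical codes: household 40 with partner 12, household 401 with partner 2
id_a <- paste(c(40, 12), collapse = "")
id_b <- paste(c(401, 2), collapse = "")

# Without a separator both collapse to the same string
collide <- id_a == id_b

# With "_" as separator the two codes stay distinct
id_a_sep <- paste(c(40, 12), collapse = "_")
id_b_sep <- paste(c(401, 2), collapse = "_")
```
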
On Fri, Oct 30, 2020 at 11:16 PM Hannah Van Impe <hannahvanimpe at outlook.com> wrote:
> Thank you very much for the answer.
>
> I also have another question. With this data, I made the variable union_id
> using paste0. (I am writing a thesis and this part is necessary, but I
> don't have previous knowledge of R, so it is difficult for me). My
> professor told me that using paste0 can be problematic because
> 'pernum' and 'serial' can have different numbers of digits. In other words,
> if pernum is 12 then paste0 prints 12, but if pernum is 1 paste0 prints 1
> instead of 01, so the last two digits will not always
> correspond to pernum. He suggested that I use an alternative way of creating
> this union_id variable. Do you have any idea how I can do this?
>
>
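
On the question of digits in union_id: one possible approach, sketched here under assumptions, is to build the identifier with sprintf() and fixed-width, zero-padded fields instead of paste0(). The column names follow the script in this thread, but the values and the field widths (3, 4, 5 and 2 digits) are made up for illustration and would need checking against the real ranges in the data.

```r
# Hypothetical rows; column names follow the thread, values are invented
d <- data.frame(country = 204L, year = 2013L,
                serial_divided = c(4L, 7L, 12L),
                sploc_pernum_code = c(1L, 12L, 3L))

# %05d and %02d pad with leading zeros to a fixed width,
# so the last two digits always hold sploc_pernum_code
d$union_id <- sprintf("%03d%04d%05d%02d",
                      d$country, d$year,
                      d$serial_divided, d$sploc_pernum_code)

d$union_id
```

sprintf() with %d also accepts whole-number doubles, so columns created by arithmetic such as pernum*(sex==1) should work as long as they contain no fractional values.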