Ding, Yuan Chun
2018-Apr-19  18:20 UTC
[R] create multiple categorical variables in a data frame using a loop
Hi All,
I want to create a categorical variable, cat.pfoa, in pfas.pheno (a data
frame) based on log2pfoa values. I can do it using the following code.
pfas.pheno <- within(pfas.pheno, {
  cat.pfoa <- NA
  cat.pfoa[pfas.pheno$log2pfoa <= quantile(pfas.pheno$log2pfoa, 0.25, na.rm = T)] <- 0
  cat.pfoa[pfas.pheno$log2pfoa >= quantile(pfas.pheno$log2pfoa, 0.75, na.rm = T)] <- 2
  cat.pfoa[pfas.pheno$log2pfoa >= quantile(pfas.pheno$log2pfoa, 0.25, na.rm = T)
           & pfas.pheno$log2pfoa <= quantile(pfas.pheno$log2pfoa, 0.75, na.rm = T)] <- 1
})
However, I have 7 additional similar variables, so I wrote the following code,
but it does not work.
for (i in c("log2pfoa", "log2pfos", "log2pfna",
            "log2pfdea", "log2pfuda", "log2pfhxs",
            "log2et_pfosa_acoh", "log2me_pfosa_acoh")) {
  cat.var <- paste0("cat.", i)
  pfas.pheno <- within(pfas.pheno, {
    eval(parse(text = cat.var)) <- NA
    eval(parse(text = cat.var))[pfas.pheno[,i] <= quantile(pfas.pheno[,i], 0.25, na.rm = T)] <- 0
    eval(parse(text = cat.var))[pfas.pheno[,i] >= quantile(pfas.pheno[,i], 0.75, na.rm = T)] <- 2
    eval(parse(text = cat.var))[pfas.pheno[,i] >= quantile(pfas.pheno[,i], 0.25, na.rm = T)
                                & pfas.pheno[,i] <= quantile(pfas.pheno[,i], 0.75, na.rm = T)] < -1
  })
}
Can you help me fix the problem?
Thank you,
Yuan Chun Ding
City of Hope National Medical Center
Rui Barradas
2018-Apr-19  18:35 UTC
[R] create multiple categorical variables in a data frame using a loop
Hello,
When programming it is better to use dat[["variable"]] than dat$variable.
So your code could be
pfas.pheno[[cat.var]] <- NA
pfas.pheno[[cat.var]][pfas.pheno[,i] <= quantile(pfas.pheno[,i], 0.25, na.rm = T)] <- 0
etc.
Untested.
Hope this helps,
Rui Barradas
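The "etc." in the reply above can be spelled out as a complete loop. What follows is only a sketch under stated assumptions: the toy data frame and its values are made up for illustration (the thread never shows the real pfas.pheno), and only two of the eight column names are used.

```r
# Hypothetical toy data standing in for pfas.pheno; the values are invented.
set.seed(1)
pfas.pheno <- data.frame(log2pfoa = c(rnorm(19), NA),
                         log2pfos = c(NA, rnorm(19)))

for (i in c("log2pfoa", "log2pfos")) {
  cat.var <- paste0("cat.", i)          # name of the new column, as a string
  x <- pfas.pheno[[i]]                  # extract the source column by name
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)

  pfas.pheno[[cat.var]] <- NA           # create the column; rows with NA in x stay NA
  pfas.pheno[[cat.var]][x <= q[1]] <- 0
  pfas.pheno[[cat.var]][x >= q[2]] <- 2
  # Same order as the original code: this last step also claims values
  # exactly equal to either quartile.
  pfas.pheno[[cat.var]][x >= q[1] & x <= q[2]] <- 1
}

table(pfas.pheno$cat.log2pfoa, useNA = "ifany")
```

Computing the quantiles once per column and storing them in q avoids repeating the quantile() call in every condition.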
Ding, Yuan Chun
2018-Apr-19  19:33 UTC
[R] create multiple categorical variables in a data frame using a loop
Hi Rui,
Thank you very much for your help! It works very well, I got it.
Ding
David Winsemius
2018-Apr-19  20:22 UTC
[R] create multiple categorical variables in a data frame using a loop
> On Apr 19, 2018, at 11:20 AM, Ding, Yuan Chun wrote:
> I want to create a categorical variable, cat.pfoa, [...] I can do it using the following code.
> pfas.pheno <-within(pfas.pheno, {cat.pfoa<-NA
> [...]
This would be somewhat more compact and easier to maintain if you used findInterval (untested in the absence of a data object, which is your responsibility):
pfas.pheno <- within(pfas.pheno, {
  cat.pfoa <- findInterval(log2pfoa, c(-Inf, quantile(log2pfoa, c(.25, .75), na.rm = TRUE), Inf)) - 1
})
`findInterval` numbers its intervals from 1, so to get a sequence starting at 0 just subtract 1.
> However, I have additional 7 similar variables, so I wrote the following code, but it does not work.
> [...]
> pfas.pheno <- within(pfas.pheno, {eval(parse(text= cat.var))<-NA
Nope. Cannot use R like a macro processor, at least not easily. R names are not the same as character values. They "live in different realities". The `get` and `assign` functions can be used to "promote" character values to real R names and make assignments from and to what would otherwise be merely character values.
Perhaps this (also mostly untested, except for the strategy of making `assign` create a new data frame column):
for (i in c("log2pfoa", "log2pfos", "log2pfna", "log2pfdea", "log2pfuda", "log2pfhxs",
            "log2et_pfosa_acoh", "log2me_pfosa_acoh")) {
  cat.var <- paste0("cat.", i)
  assign(cat.var,
         findInterval(get(i), c(-Inf, quantile(get(i), c(.25, .75), na.rm = TRUE), Inf)) - 1,
         envir = as.environment(get(pfas.pheno)))
}
Best;
David.
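To make the interval numbering concrete, here is a small sketch of `findInterval` on a made-up vector. The cut points 0 and 2.5 are hypothetical stand-ins for the 25% and 75% quantiles, chosen so the results are easy to check by hand.

```r
x <- c(-3, 0, 1, 2.5, 7, NA)

# Hypothetical cut points standing in for the two quartiles.
breaks <- c(-Inf, 0, 2.5, Inf)

# Default: intervals are closed on the left, [a, b), so a value equal to a
# cut point falls into the interval that starts there.
closed_left <- findInterval(x, breaks) - 1
closed_left
# 0 1 1 2 2 NA

# left.open = TRUE switches to (a, b], so a value equal to a cut point
# falls into the interval that ends there.
closed_right <- findInterval(x, breaks, left.open = TRUE) - 1
closed_right
# 0 0 1 1 2 NA
```

Neither variant reproduces the original code exactly at the boundaries: there, the final assignment puts values equal to either quartile into category 1. With continuous measurements such ties are rare, but it may be worth checking if the data are rounded.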
David Winsemius
2018-Apr-20  01:58 UTC
[R] create multiple categorical variables in a data frame using a loop
> On Apr 19, 2018, at 1:22 PM, David Winsemius wrote:
> Perhaps this (also mostly untested [...]):
> [...]
> assign( cat.var, findInterval( get(i) , [...]
> envir=as.environment( get( pfas.pheno ) ) )
That wasn't good advice. I would rather suggest (but still untested in the absence of a good demo dataset from the OP):
for (i in c("log2pfoa", "log2pfos", "log2pfna", "log2pfdea", "log2pfuda", "log2pfhxs",
            "log2et_pfosa_acoh", "log2me_pfosa_acoh")) {
  cat.var <- paste0("cat.", i)
  pfas.pheno[[cat.var]] <- findInterval(pfas.pheno[[i]],
                                        c(-Inf, quantile(pfas.pheno[[i]], c(.25, .75), na.rm = TRUE), Inf)) - 1
}
The "[[<-" function supports character values as column names during assignment.
--
David Winsemius
Alameda, CA, USA
'Any technology distinguishable from magic is insufficiently advanced.' -Gehm's Corollary to Clarke's Third Law
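The same idea can also be written without an explicit loop by wrapping the categorisation in a function and assigning all new columns at once. This is a sketch, not something proposed in the thread: `quartile_cat` is a hypothetical helper name, and the data frame is simulated because no demo dataset was posted.

```r
# Hypothetical helper: 0 = below the 25% quantile, 1 = middle, 2 = at or
# above the 75% quantile (findInterval's default left-closed intervals).
quartile_cat <- function(x) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  findInterval(x, c(-Inf, q, Inf)) - 1
}

# Simulated stand-in for pfas.pheno; only two of the eight columns are shown.
set.seed(42)
pfas.pheno <- data.frame(log2pfoa = rnorm(50), log2pfos = rnorm(50))

vars <- c("log2pfoa", "log2pfos")

# lapply() returns one vector per source column; assigning the list to a
# character vector of new names creates all the cat.* columns in one step.
pfas.pheno[paste0("cat.", vars)] <- lapply(pfas.pheno[vars], quartile_cat)

str(pfas.pheno)
```

Keeping the cut-point logic in one function means a later change (for example tertiles instead of quartiles) only has to be made in one place.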