Yuan Chun Ding
2020-Apr-07 18:34 UTC
[R] to create a new variable based on values in other variables
Hi R users,

I want to create a new variable, Ravg, in data frame tem2 based on the values of two other variables, m1 and m2.

The conditions:

if m1 = 23 and m2 = 23 then Ravg = 23;
else if m1 != 23 and m2 = 23 then Ravg = m1;
else if m1 = 23 and m2 != 23 then Ravg = m2;
else Ravg = average of m1 and m2;

The Ravg variable should be the same as the m3 variable in the following small example.

My R code did not generate errors, but it did not create the new variable either.

Ravg <- "rare_allele"
tem2 <-data.frame(m1=c(12, 23, 22, 23), m2=c(23, 23, 3, 5), m3 =c(12, 23, 12.5, 5))
for (r in 1:nrow(tem2)) {
  if (tem2$m1[r] ==23 & tem2$m2[r] ==23) {
    tem2[[Ravg]][r] ==23} else if(tem2$m1[r] ==23 & tem2$m2[r] !=23){
    tem2[[Ravg]][r] ==tem2$m2[r]} else if (tem2$m1[r] !=23 & tem2$m2[r] ==23) {
    tem2[[Ravg]][r] ==tem2$m1[r]} else {
    tem2[[Ravg]][r] == mean(tem2$m1[r] + tem2$m2[r])}
}

Thank you,

Ding
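A likely reason the loop above runs silently without adding a column is that `==` only compares and never assigns, so each branch evaluates a comparison and discards the result. In addition, `mean(a + b)` returns the sum of two numbers, not their average. The following is a minimal sketch of the same loop with those two points changed; pre-allocating the column with `NA_real_` and the helper names `a` and `b` are assumptions made here for readability, not part of the original post.

```r
tem2 <- data.frame(m1 = c(12, 23, 22, 23),
                   m2 = c(23, 23, 3, 5),
                   m3 = c(12, 23, 12.5, 5))

Ravg <- "rare_allele"        # name of the new column, as in the post
tem2[[Ravg]] <- NA_real_     # assumption: create the column before the loop

for (r in seq_len(nrow(tem2))) {
  a <- tem2$m1[r]            # hypothetical helper names for the two values
  b <- tem2$m2[r]
  if (a == 23 && b == 23) {
    tem2[[Ravg]][r] <- 23            # `<-` assigns; `==` would only compare
  } else if (a == 23 && b != 23) {
    tem2[[Ravg]][r] <- b
  } else if (a != 23 && b == 23) {
    tem2[[Ravg]][r] <- a
  } else {
    tem2[[Ravg]][r] <- mean(c(a, b)) # mean of two values, not mean(a + b)
  }
}

tem2$rare_allele   # 12.0 23.0 12.5 5.0, the same as m3
```

Note that the new column is named rare_allele here, because that is the string stored in Ravg; a vectorized approach, as discussed in the replies, is usually preferable to a row-by-row loop.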
Thierry Onkelinx
2020-Apr-07 18:48 UTC
[R] to create a new variable based on values in other variables
Dear Ding,

It seems that you are looking for the ifelse() function. Using pmax() and pmin() reduces the number of if statements.

m1 <- c(12, 23, 22, 23)
m2 <- c(23, 23, 3, 5)

Ravg <- ifelse(
  pmax(m1, m2) == 23,
  pmin(m1, m2),
  (m1 + m2) / 2
)

Best regards,

ir. Thierry Onkelinx
Statistician
Government of Flanders
Research Institute for Nature and Forest
Team Biometrics & Quality Assurance
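To see why the pmax()/pmin() shortcut reproduces m3 on the example data, it can help to look at the intermediate vectors. The sketch below assumes, as the example data suggest, that 23 is the largest value that can occur in either column; the names `hi` and `lo` are hypothetical and used only for illustration.

```r
m1 <- c(12, 23, 22, 23)
m2 <- c(23, 23, 3, 5)
m3 <- c(12, 23, 12.5, 5)

hi <- pmax(m1, m2)   # element-wise maximum: 23 23 22 23
lo <- pmin(m1, m2)   # element-wise minimum: 12 23  3  5

# If the larger value is 23, at least one of the pair is 23 and the
# smaller value is the one to keep; otherwise take the average.
Ravg <- ifelse(hi == 23, lo, (m1 + m2) / 2)

all.equal(Ravg, m3)  # TRUE
```

If 23 were not the maximum of the data, `hi == 23` would no longer be equivalent to "at least one value is 23", and the conditions would have to be written out explicitly.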
Yuan Chun Ding
2020-Apr-07 19:05 UTC
[R] to create a new variable based on values in other variables
Hi Thierry,

The values in the example data frame are fake numbers. My original data frame has hundreds of rows and the values span a wide range, so the special value is not the min or max of the two variables; the number 23 also differs between data frames.

I agree that I need to use the vectorized ifelse(), but I get confused when there are more than two conditions inside the ifelse() function. I will look into the ifelse() function.

Thank you,

Ding
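For more than two conditions, ifelse() calls can be nested so that each "no" branch holds the next test, mirroring the if / else if / else chain in the original question. The sketch below wraps this in a function so that the special value does not have to be 23 or an extreme of the data; the function name `ravg_ifelse` and its `flag` argument are hypothetical names introduced here, and missing values are not handled.

```r
# Hypothetical helper: `flag` is the special value (23 in the example).
ravg_ifelse <- function(m1, m2, flag) {
  ifelse(m1 == flag & m2 == flag, flag,        # both equal the flag
  ifelse(m1 != flag & m2 == flag, m1,          # only m2 equals the flag
  ifelse(m1 == flag & m2 != flag, m2,          # only m1 equals the flag
         (m1 + m2) / 2)))                      # neither: average
}

tem2 <- data.frame(m1 = c(12, 23, 22, 23),
                   m2 = c(23, 23, 3, 5),
                   m3 = c(12, 23, 12.5, 5))

tem2$Ravg <- ravg_ifelse(tem2$m1, tem2$m2, flag = 23)
all.equal(tem2$Ravg, tem2$m3)   # TRUE
```

Because the conditions are spelled out rather than derived from pmax() and pmin(), the same function can be reused on another data frame by passing a different `flag`.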
Bert Gunter
2020-Apr-07 19:53 UTC
[R] to create a new variable based on values in other variables
You can use subscripting to generalize and avoid multiply nested ifelse's which, I agree, can be a nightmare. However, you have to be very careful about the logic of the conditions you create and the order in which you apply them. It is very easy to wipe out an earlier relationship with a later one (I speak from sad experience here).

Note that because the following uses subscripting, it's vectorized. But you of course have to set up all your conditions manually. Note that your last example condition is redundant, btw. As Thierry indicated, depending on what you do, there can be shortcuts. To keep the solution generalizable, I have not used any.

> tem2 <- data.frame(m1=c(12, 23, 22, 23), m2=c(23, 23, 3, 5), m3 =c(12, 23, 12.5, 5))
> tem2$ravg <- rowMeans(tem2[,c("m1","m2")])
## or use with() or within() for more complex functions that you have to code yourself.
> cond1 <- with(tem2, m1 != 23 & m2 == 23)
> cond2 <- with(tem2, m1 == 23 & m2 != 23)
## etc.
> tem2 <- within(tem2,{
+    ravg[cond1] <- m1[cond1]
+    ravg[cond2] <- m2[cond2]
+ })
>
> tem2
  m1 m2   m3 ravg
1 12 23 12.0 12.0
2 23 23 23.0 23.0
3 22  3 12.5 12.5
4 23  5  5.0  5.0

Bert Gunter
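The warning about the order of conditions can be illustrated with a small contrived sketch. The looser conditions below are not the ones from the reply above; they are assumptions chosen to show how a later, overly broad rule can overwrite a value set by an earlier one. The vector names `a`, `b` and `avg` are hypothetical.

```r
m1 <- c(12, 23, 22, 23)
m2 <- c(23, 23, 3, 5)
avg <- (m1 + m2) / 2

# Order A: fill in the default first, then overwrite the special cases.
a <- avg
a[m2 == 23] <- m1[m2 == 23]
a[m1 == 23] <- m2[m1 == 23]
a   # 12.0 23.0 12.5 5.0

# Order B: special cases first, then a "default" rule that is too broad.
b <- rep(NA_real_, length(m1))
b[m2 == 23] <- m1[m2 == 23]
b[m1 == 23] <- m2[m1 == 23]
b[m1 != 23] <- avg[m1 != 23]   # also matches row 1 and wipes out its 12
b   # 17.5 23.0 12.5 5.0
```

Starting from the default and then applying mutually exclusive conditions, as in the reply above, avoids this problem because no row can be matched by more than one special case.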
Yuan Chun Ding
2020-Apr-08 00:04 UTC
[R] to create a new variable based on values in other variables
Hi Bert,

Your code worked perfectly. You always teach me new R coding skills!

Thank you so much!!

Ding