Kevin Wamae
2017-Oct-14 05:48 UTC
[R] Populate one data frame with values from another dataframe for rows that match
Dear @Bert Gunter, I tried merge and I faced many challenges. @Rui Barradas's solution is working.

From: Bert Gunter <bgunter.4567 at gmail.com>
Date: Friday, 13 October 2017 at 22:44
To: Kevin Wamae <KWamae at kemri-wellcome.org>
Cc: R-help <R-help at r-project.org>
Subject: Re: [R] Populate one data frame with values from another dataframe for rows that match

?merge

Bert

On Oct 13, 2017 12:09 PM, "Kevin Wamae" <KWamae at kemri-wellcome.org> wrote:

I'm trying to populate the column "pf_mcl" in myDF1 with values from myDF2, where rows match based on column "studyno", but the solutions I have found so far don't seem to be giving me the desired output.

Below is a snapshot of the data.frames.

myDF1 <- structure(list(studyno = c("J1000/9", "J1000/9", "J1000/9",
    "J1000/9", "J1000/9", "J1000/9"),
    date = structure(c(17123, 17127, 17135, 17144, 17148, 17155),
        class = "Date"),
    pf_mcl = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_,
        NA_integer_, NA_integer_),
    year = c(2016, 2016, 2016, 2016, 2016, 2016)),
    .Names = c("studyno", "date", "pf_mcl", "year"),
    row.names = c(NA, 6L), class = "data.frame")

myDF2 <- structure(list(studyno = c("J740/4", "J1000/9", "J895/7",
    "J931/6", "J609/1", "J941/3"),
    pf_mcl = c(0L, 0L, 0L, 0L, 0L, 0L)),
    .Names = c("studyno", "pf_mcl"),
    row.names = c(NA, 6L), class = "data.frame")

myDF2 is a well curated subset of myDF1. Some rows in the two datasets match based on "studyno"; in those rows, values in myDF1$pf_mcl may be missing or wrong.

All I want to do is identify a matching row in myDF2 and populate myDF1$pf_mcl with the value in myDF2$pf_mcl. If a row does not match based on "studyno", the value should remain the same.

It's probably worth mentioning that the two data frames have other columns; I have selected a few for example purposes.
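The update described in the question (overwrite pf_mcl where studyno matches, leave every other row alone) can be sketched with base R's match(). This is only one possible approach and not necessarily the solution referred to in the thread; the data frames `raw` and `curated` below are made-up stand-ins for myDF1 and myDF2, and the sketch assumes each studyno appears at most once in the curated data, since match() returns only the first hit.

```r
# Toy stand-ins for myDF1 (raw) and myDF2 (curated); values are illustrative.
raw <- data.frame(
  studyno = c("J1000/9", "J895/7", "J666/6", "J1000/9"),
  pf_mcl  = c(NA, 2L, 4L, 5L),
  stringsAsFactors = FALSE
)
curated <- data.frame(
  studyno = c("J740/4", "J1000/9", "J895/7"),
  pf_mcl  = c(101L, 102L, 103L),
  stringsAsFactors = FALSE
)

# For each raw row, the position of its studyno in curated (NA if absent).
idx <- match(raw$studyno, curated$studyno)

# Overwrite only the rows that found a match; unmatched rows keep their value.
hit <- !is.na(idx)
raw$pf_mcl[hit] <- curated$pf_mcl[idx[hit]]

print(raw)
```

Because the assignment is made in place, the row order and the other columns of the raw data frame are left untouched, which merge() does not guarantee.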
William Dunlap
2017-Oct-14 17:20 UTC
[R] Populate one data frame with values from another dataframe for rows that match
Your example used one distinct studyno in DF1 and one distinct pf_mcl in DF2. I think that makes it hard to see what is going on, but maybe I completely misunderstand the problem. In any case, let's redefine myDF1 and myDF2. Note that myDF1 contains a studyno not in myDF2 and vice versa.

myDF1 <- structure(list(studyno = c("J1000/9", "J895/7", "J931/6",
    "J666/6", "J1000/9", "J1000/9"),
    date = structure(c(17123, 17127, 17135, 17144, 17148, 17155),
        class = "Date"),
    pf_mcl = c(NA_integer_, 2L, 3L, 4L, 5L, NA_integer_),
    year = c(2016, 2016, 2016, 2016, 2016, 2016)),
    .Names = c("studyno", "date", "pf_mcl", "year"),
    row.names = c(NA, 6L), class = "data.frame")

myDF2 <- structure(list(studyno = c("J740/4", "J1000/9", "J895/7",
    "J931/6", "J609/1", "J941/3"),
    pf_mcl = c(101L, 102L, 103L, 104L, 105L, 106L)),
    .Names = c("studyno", "pf_mcl"),
    row.names = c(NA, 6L), class = "data.frame")

m <- merge(myDF1, myDF2, by="studyno", all.x=TRUE, all.y=FALSE,
    suffixes=c(".raw", ".curated"))

The results are:

> myDF1
  studyno       date pf_mcl year
1 J1000/9 2016-11-18     NA 2016
2  J895/7 2016-11-22      2 2016
3  J931/6 2016-11-30      3 2016
4  J666/6 2016-12-09      4 2016
5 J1000/9 2016-12-13      5 2016
6 J1000/9 2016-12-20     NA 2016

> myDF2
  studyno pf_mcl
1  J740/4    101
2 J1000/9    102
3  J895/7    103
4  J931/6    104
5  J609/1    105
6  J941/3    106

> m
  studyno       date pf_mcl.raw year pf_mcl.curated
1 J1000/9 2016-11-18         NA 2016            102
2 J1000/9 2016-12-13          5 2016            102
3 J1000/9 2016-12-20         NA 2016            102
4  J666/6 2016-12-09          4 2016             NA
5  J895/7 2016-11-22          2 2016            103
6  J931/6 2016-11-30          3 2016            104

Now your problem is to combine the columns pf_mcl.raw and pf_mcl.curated in the way you want. ifelse() may be useful for that.

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Fri, Oct 13, 2017 at 10:48 PM, Kevin Wamae <KWamae at kemri-wellcome.org> wrote:
> Dear @Bert Gunter, I tried merge and I faced many challenges. @Rui Barradas
> solution is working.
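As a follow-up to the ifelse() hint in the message above, here is a minimal sketch of collapsing the two merged columns into a single pf_mcl column. The data frame `m` below is a hand-built, shortened stand-in for the merge() result shown in that message, and the rule "use the curated value when present, otherwise keep the raw value" is an assumption about what is wanted. One caveat: a studyno that matched but carries NA in the curated data cannot be told apart from one that did not match at all.

```r
# Hand-built stand-in for part of the merge() result (illustrative only).
m <- data.frame(
  studyno        = c("J1000/9", "J1000/9", "J666/6", "J895/7"),
  pf_mcl.raw     = c(NA, 5L, 4L, 2L),
  pf_mcl.curated = c(102L, 102L, NA, 103L),
  stringsAsFactors = FALSE
)

# Prefer the curated value; fall back to the raw value where there is none.
m$pf_mcl <- ifelse(is.na(m$pf_mcl.curated), m$pf_mcl.raw, m$pf_mcl.curated)

# Drop the two helper columns so only one pf_mcl column remains.
m$pf_mcl.raw     <- NULL
m$pf_mcl.curated <- NULL

print(m)
```

Note that merge() sorts the result by the key column by default, so the rows of the merged data frame are generally not in the order of the original myDF1.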
Kevin Wamae
2017-Oct-15 11:03 UTC
[R] Populate one data frame with values from another dataframe for rows that match
Dear @William, thanks for the feedback. I have tested it on the larger dataset and noticed that it created two variables, pf_mcl.raw and pf_mcl.curated. The output we were looking for was one that takes the variable pf_mcl in the curated dataset and replaces pf_mcl in matching rows within the raw dataset. @Eric's solution was able to achieve that. Nonetheless, we do appreciate your solution.

Regards
------------------
Kevin Wamae

From: William Dunlap <wdunlap at tibco.com>
Date: Saturday, 14 October 2017 at 20:21
To: Kevin Wamae <KWamae at kemri-wellcome.org>
Cc: Bert Gunter <bgunter.4567 at gmail.com>, Rui Barradas <ruipbarradas at sapo.pt>, R-help <R-help at r-project.org>
Subject: Re: [R] Populate one data frame with values from another dataframe for rows that match

Your example used one distinct studyno in DF1 and one distinct pf_mcl in DF2. I think that makes it hard to see what is going on, but maybe I completely misunderstand the problem. In any case, let's redefine myDF1 and myDF2. Note that myDF1 contains a studyno not in myDF2 and vice versa.