Ebert,Timothy Aaron
2022-Mar-10 17:58 UTC
[R] conditional filling of data.frame - improve code
You could try some of the "join" commands from dplyr:
https://dplyr.tidyverse.org/reference/mutate-joins.html
https://statisticsglobe.com/r-dplyr-join-inner-left-right-full-semi-anti

Regards,
Tim

-----Original Message-----
From: R-help <r-help-bounces at r-project.org> On Behalf Of Jeff Newmiller
Sent: Thursday, March 10, 2022 11:25 AM
To: r-help at r-project.org; Ivan Calandra <ivan.calandra at rgzm.de>; R-help <r-help at r-project.org>
Subject: Re: [R] conditional filling of data.frame - improve code

Use merge.

expts <- read.csv( text =
"expt,sample
ex1,sample1-1
ex1,sample1-2
ex2,sample2-1
ex2,sample2-2
ex2,sample2-3
", header=TRUE, as.is=TRUE )

mydata <- data.frame(sample = c("sample2-2", "sample2-3", "sample1-1", "sample1-1", "sample1-1", "sample2-1"))

merge( mydata, expts, by="sample", all.x=TRUE )

On March 10, 2022 7:50:23 AM PST, Ivan Calandra <ivan.calandra at rgzm.de> wrote:
>Dear useRs,
>
>I would like to improve my ugly (though working) code, but I think I
>need a completely different approach and I just can't think outside the box!
>
>I have some external information about which sample(s) belong to which
>experiment. I need to get that manually into R (either typing it directly
>in a script or reading a CSV file, but that makes no difference):
>exp <- list(ex1 = c("sample1-1", "sample1-2"), ex2 = c("sample2-1", "sample2-2", "sample2-3"))
>
>Then I have my data, only with the sample IDs:
>mydata <- data.frame(sample = c("sample2-2", "sample2-3", "sample1-1", "sample1-1", "sample1-1", "sample2-1"))
>
>Now I want to add a column to mydata with the experiment ID. The best I
>could find is this:
>for (i in names(exp)) mydata[mydata[["sample"]] %in% exp[[i]], "experiment"] <- i
>
>In this example, the experiment ID could be extracted from the sample
>IDs, but this is not the case with my real data, so it really is a
>matter of matching. Of course I also have other columns in my real data.
>
>I'm pretty sure the last line (with the loop) can be improved in terms
>of readability (speed is not an issue here). I have close to no
>constraints on 'exp' (here I chose a list, but anything would do); the
>only thing that cannot change is the format of 'mydata'.
>
>Thank you in advance!
>Ivan

--
Sent from my phone. Please excuse my brevity.

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
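One detail worth keeping in mind with the merge() suggestion: merge() sorts its result by the `by` column by default (`sort = TRUE`), so the rows no longer follow the order of `mydata`, whose format the original question says cannot change. The following is a minimal base-R sketch of one workaround, not taken from the thread; it assumes a temporary helper column (the name `row_id` is hypothetical) can be added before the join and dropped afterwards.

```r
# Lookup table: one row per sample, same content as Jeff's `expts`
expts <- data.frame(
  expt   = c("ex1", "ex1", "ex2", "ex2", "ex2"),
  sample = c("sample1-1", "sample1-2", "sample2-1", "sample2-2", "sample2-3"),
  stringsAsFactors = FALSE
)

mydata <- data.frame(
  sample = c("sample2-2", "sample2-3", "sample1-1",
             "sample1-1", "sample1-1", "sample2-1"),
  stringsAsFactors = FALSE
)

# merge() sorts by the key column by default, so remember each row's
# original position in a temporary column (hypothetical name `row_id`)
tmp <- mydata
tmp$row_id <- seq_len(nrow(tmp))

# Left join: keep every row of mydata, NA where no experiment matches
merged <- merge(tmp, expts, by = "sample", all.x = TRUE)

# Restore the original row order and drop the helper column
merged <- merged[order(merged$row_id), setdiff(names(merged), "row_id")]
rownames(merged) <- NULL
merged
```

Because `setdiff()` only removes the helper column, any other columns of `mydata` are carried through the join unchanged (although merge() moves the key column to the front).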
Thank you, Jeff and Tim, for your ideas. Indeed, merge/join is probably the nicest way. Still, the code becomes much longer because I need more formatting of the input and output objects than with my ugly for loop :)

Cheers,
Ivan

--
Dr. Ivan Calandra
Imaging lab
RGZM - MONREPOS Archaeological Research Centre
Schloss Monrepos
56567 Neuwied, Germany
+49 (0) 2631 9772-243
https://www.researchgate.net/profile/Ivan_Calandra

On 10/03/2022 at 18:58, Ebert,Timothy Aaron wrote:
> You could try some of the "join" commands from dplyr.
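Ivan's concern about the extra formatting of inputs and outputs can perhaps be sidestepped by not reshaping `mydata` at all: match() looks each sample up in a flat vector built directly from the named list. This is a base-R alternative that was not proposed in the thread; it is a minimal sketch in which the list is renamed to the hypothetical `expts_list` so that it does not mask `base::exp()`.

```r
# Ivan's named list: experiment ID -> sample IDs
# (renamed to `expts_list`, a hypothetical name, so base::exp() is not masked)
expts_list <- list(ex1 = c("sample1-1", "sample1-2"),
                   ex2 = c("sample2-1", "sample2-2", "sample2-3"))

mydata <- data.frame(
  sample = c("sample2-2", "sample2-3", "sample1-1",
             "sample1-1", "sample1-1", "sample2-1"),
  stringsAsFactors = FALSE
)

# Flatten the list into two parallel vectors:
# every sample ID, and the experiment ID it belongs to
lookup_sample <- unlist(expts_list, use.names = FALSE)
lookup_expt   <- rep(names(expts_list), lengths(expts_list))

# match() returns, for each sample in mydata, its position in lookup_sample
# (NA if absent); indexing lookup_expt with it gives the experiment ID.
# Rows of mydata keep their order and all other columns stay untouched.
mydata$experiment <- lookup_expt[match(mydata$sample, lookup_sample)]
mydata
```

One behavioural difference from the loop: if a sample were listed under two experiments, match() would use the first occurrence, whereas the loop overwrites and keeps the last. `stack(expts_list)` would build the same lookup as a data frame (columns `values` and `ind`, the latter a factor) if a table is preferred.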