thr3ads.net - R help - [R] conditional filling of data.frame

Ivan Calandra

2022-Mar-11 08:48 UTC

[R] conditional filling of data.frame - improve code

In my first trials, I made a typo, which resulted in more columns than 
needed in the output of merge, which is why I needed more formatting. 
But now, it is indeed done all in one line and it is, as I said already, 
nicer anyway!

--
Dr. Ivan Calandra
Imaging lab
RGZM - MONREPOS Archaeological Research Centre
Schloss Monrepos
56567 Neuwied, Germany
+49 (0) 2631 9772-243
https://www.researchgate.net/profile/Ivan_Calandra

On 11/03/2022 at 08:47, Jeff Newmiller wrote:
> What a strange objection. You wouldn't keep the inline definition of
> expts in working code... that would be in a reference data file, and the
> merge is one line.
>
> On March 10, 2022 11:24:27 PM PST, Ivan Calandra <ivan.calandra at rgzm.de> wrote:
>> Thank you Jeff and Tim for your ideas. Indeed merge/join is probably the
>> nicest way. Still, the code becomes much longer because I need more
>> formatting of the input and output objects than with my ugly for loop :)
>>
>> Cheers,
>> Ivan
>>
>> On 10/03/2022 at 18:58, Ebert, Timothy Aaron wrote:
>>> You could try some of the "join" commands from dplyr.
>>> https://dplyr.tidyverse.org/reference/mutate-joins.html
>>> https://statisticsglobe.com/r-dplyr-join-inner-left-right-full-semi-anti
>>>
>>> Regards,
>>> Tim
>>> -----Original Message-----
>>> From: R-help <r-help-bounces at r-project.org> On Behalf Of Jeff Newmiller
>>> Sent: Thursday, March 10, 2022 11:25 AM
>>> To: r-help at r-project.org; Ivan Calandra <ivan.calandra at rgzm.de>; R-help <r-help at r-project.org>
>>> Subject: Re: [R] conditional filling of data.frame - improve code
>>>
>>> Use merge.
>>>
>>> expts <- read.csv( text =
>>> "expt,sample
>>> ex1,sample1-1
>>> ex1,sample1-2
>>> ex2,sample2-1
>>> ex2,sample2-2
>>> ex2,sample2-3
>>> ", header=TRUE, as.is=TRUE )
>>>
>>> mydata <- data.frame(sample = c("sample2-2", "sample2-3", "sample1-1",
>>>                                 "sample1-1", "sample1-1", "sample2-1"))
>>>
>>> merge( mydata, expts, by="sample", all.x=TRUE )
>>>
>>>
>>> On March 10, 2022 7:50:23 AM PST, Ivan Calandra <ivan.calandra at rgzm.de> wrote:
>>>> Dear useRs,
>>>>
>>>> I would like to improve my ugly (though working) code, but I think I
>>>> need a completely different approach and I just can't think out of my box!
>>>>
>>>> I have some external information about which sample(s) belong to which
>>>> experiment. I need to get that manually into R (either typing directly
>>>> in a script or read a CSV file, but that makes no difference):
>>>> exp <- list(ex1 = c("sample1-1", "sample1-2"),
>>>>             ex2 = c("sample2-1", "sample2-2", "sample2-3"))
>>>>
>>>> Then I have my data, only with the sample IDs:
>>>> mydata <- data.frame(sample = c("sample2-2", "sample2-3", "sample1-1",
>>>>                                 "sample1-1", "sample1-1", "sample2-1"))
>>>>
>>>> Now I want to add a column to mydata with the experiment ID. The best I
>>>> could find is that:
>>>> for (i in names(exp)) mydata[mydata[["sample"]] %in% exp[[i]],
>>>>                              "experiment"] <- i
>>>>
>>>> In this example, the experiment ID could be extracted from the sample
>>>> IDs, but this is not the case with my real data so it really is a
>>>> matter of matching. Of course I also have other columns with my real data.
>>>>
>>>> I'm pretty sure the last line (with the loop) can be improved in terms
>>>> of readability (speed is not an issue here). I have close to no
>>>> constraints on 'exp' (here I chose a list, but anything could do), the
>>>> only thing that cannot change is the format of 'mydata'.
>>>>
>>>> Thank you in advance!
>>>> Ivan
>>>>
>>> --
>>> Sent from my phone. Please excuse my brevity.
>>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
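
A note on the merge approach quoted above: merge() sorts its result on the key column by default, so the rows do not come back in the order of mydata. Below is a minimal sketch of one way to restore the input order. It assumes a helper column called row_id, which is hypothetical and not part of the thread; the rest reuses the expts and mydata objects from Jeff's message.

```r
expts <- read.csv(text = "expt,sample
ex1,sample1-1
ex1,sample1-2
ex2,sample2-1
ex2,sample2-2
ex2,sample2-3
", header = TRUE, as.is = TRUE)

mydata <- data.frame(sample = c("sample2-2", "sample2-3", "sample1-1",
                                "sample1-1", "sample1-1", "sample2-1"))

# Hypothetical helper column (not in the thread) that remembers the input order
mydata$row_id <- seq_len(nrow(mydata))

# Left join: keep every row of mydata, add the matching experiment ID
merged <- merge(mydata, expts, by = "sample", all.x = TRUE)

# Put the rows back in their original order and drop the helper column
merged <- merged[order(merged$row_id), c("sample", "expt")]
rownames(merged) <- NULL
merged
```

If the row order does not matter, the plain one-line merge from the thread is enough.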

Rui Barradas

2022-Mar-11 09:14 UTC

[R] conditional filling of data.frame - improve code

Hello,

I hadn't posted an answer because my mapply is more complicated than the
original and much more complicated than Jeff's merge, but here it is. If
there's a problem with the output of merge, maybe the mapply can be of
use: only the column expressly named is created.
The result is equal to the original.
I have changed the name exp to exp1.

mydata <- data.frame(sample = c("sample2-2", "sample2-3", "sample1-1",
                                "sample1-1", "sample1-1", "sample2-1"))
exp1 <- list(ex1 = c("sample1-1", "sample1-2"),
             ex2 = c("sample2-1", "sample2-2", "sample2-3"))

for (i in names(exp1)) {
   mydata[mydata[["sample"]] %in% exp1[[i]], "experiment"] <- i
}

# must create the new column beforehand
mydata[["experiment2"]] <- NA_character_
# the \(...) shorthand for function(...) needs R >= 4.1;
# `<<-` updates mydata outside the anonymous function
mapply(\(value, name, s){
   i <- which(s %in% value)
   mydata[["experiment2"]][i] <<- name
}, exp1, names(exp1), MoreArgs = list(s = mydata$sample))

mydata
#     sample experiment experiment2
#1 sample2-2        ex2         ex2
#2 sample2-3        ex2         ex2
#3 sample1-1        ex1         ex1
#4 sample1-1        ex1         ex1
#5 sample1-1        ex1         ex1
#6 sample2-1        ex2         ex2


Hope this helps,

Rui Barradas
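
The same matching can also be done without a loop or mapply. Below is a minimal sketch using a named character vector as a lookup table. It is not from the thread, and the name lookup is hypothetical. Like the mapply version, it only creates the one column that is named; samples not listed in exp1 would come out as NA.

```r
mydata <- data.frame(sample = c("sample2-2", "sample2-3", "sample1-1",
                                "sample1-1", "sample1-1", "sample2-1"))
exp1 <- list(ex1 = c("sample1-1", "sample1-2"),
             ex2 = c("sample2-1", "sample2-2", "sample2-3"))

# Lookup table (hypothetical name): names are sample IDs, values are experiment IDs
lookup <- setNames(rep(names(exp1), lengths(exp1)),
                   unlist(exp1, use.names = FALSE))

# Index the lookup by sample ID; unname() drops the sample IDs from the result
mydata$experiment <- unname(lookup[as.character(mydata$sample)])
mydata
```

The list format of exp1 stays as it is, and the format of mydata is untouched apart from the new column.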

