Dear useRs,
I would like to improve my ugly (though working) code, but I think I
need a completely different approach and I just can't think out of my box!
I have some external information about which sample(s) belong to which
experiment. I need to get that manually into R (either by typing it directly
in a script or by reading a CSV file, but that makes no difference):
exp <- list(ex1 = c("sample1-1", "sample1-2"),
            ex2 = c("sample2-1", "sample2-2", "sample2-3"))
Then I have my data, only with the sample IDs:
mydata <- data.frame(sample = c("sample2-2", "sample2-3", "sample1-1",
                                "sample1-1", "sample1-1", "sample2-1"))
Now I want to add a column to mydata with the experiment ID. The best I
could come up with is this:
for (i in names(exp)) mydata[mydata[["sample"]] %in% exp[[i]], "experiment"] <- i
In this example, the experiment ID could be extracted from the sample
IDs, but this is not the case with my real data so it really is a matter
of matching. Of course I also have other columns with my real data.
I'm pretty sure the last line (with the loop) can be improved in terms
of readability (speed is not an issue here). I have close to no
constraints on 'exp' (here I chose a list, but anything could do), the
only thing that cannot change is the format of 'mydata'.
Thank you in advance!
Ivan
--
Dr. Ivan Calandra
Imaging lab
RGZM - MONREPOS Archaeological Research Centre
Schloss Monrepos
56567 Neuwied, Germany
+49 (0) 2631 9772-243
https://www.researchgate.net/profile/Ivan_Calandra
I think what you're looking for is match(). For each element of its first argument, it returns the position of the first match in its second argument. It also has a nomatch argument for the case where no match is found; people usually use NA or 0 for nomatch.
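A minimal sketch of how match() could be applied to the example from the question. It assumes the list is first flattened into a two-column lookup table with stack(); the names lookup and idx are only illustrative and not part of the original post.

```r
# Example data from the original post.
exp <- list(ex1 = c("sample1-1", "sample1-2"),
            ex2 = c("sample2-1", "sample2-2", "sample2-3"))
mydata <- data.frame(sample = c("sample2-2", "sample2-3", "sample1-1",
                                "sample1-1", "sample1-1", "sample2-1"))

# Flatten the list: 'values' holds the sample IDs, 'ind' the experiment IDs
# (as a factor). 'lookup' is a hypothetical helper name.
lookup <- stack(exp)

# For each sample in mydata, find its row position in the lookup table
# (NA if the sample is not listed in any experiment).
idx <- match(mydata$sample, lookup$values)

# Use those positions to pull out the matching experiment ID.
mydata$experiment <- as.character(lookup$ind[idx])
mydata
```

Samples that do not appear in any experiment would end up with NA in the new column, since indexing with NA yields NA.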
Use merge.
expts <- read.csv( text = "expt,sample
ex1,sample1-1
ex1,sample1-2
ex2,sample2-1
ex2,sample2-2
ex2,sample2-3
", header=TRUE, as.is=TRUE )
mydata <- data.frame(sample = c("sample2-2", "sample2-3",
"sample1-1", "sample1-1", "sample1-1",
"sample2-1"))
merge( mydata, expts, by="sample", all.x=TRUE )
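One thing to keep in mind: merge() sorts the result on the "by" column by default, so the rows generally do not come back in the original order of mydata. If the order matters, one possible workaround (a sketch only; row_id and merged are hypothetical helper names, not from the posts above) is to record the row position before merging and restore it afterwards:

```r
# Lookup table and data as in the example above.
expts <- read.csv(text = "expt,sample
ex1,sample1-1
ex1,sample1-2
ex2,sample2-1
ex2,sample2-2
ex2,sample2-3
", header = TRUE, as.is = TRUE)
mydata <- data.frame(sample = c("sample2-2", "sample2-3", "sample1-1",
                                "sample1-1", "sample1-1", "sample2-1"))

# Remember the original row order in a temporary column (hypothetical name).
mydata$row_id <- seq_len(nrow(mydata))

# Left join: keep every row of mydata, add the experiment where a match exists.
merged <- merge(mydata, expts, by = "sample", all.x = TRUE)

# Restore the original order and drop the temporary column.
merged <- merged[order(merged$row_id), ]
merged$row_id <- NULL
rownames(merged) <- NULL
merged
```

This relies on each sample appearing only once in expts; a sample listed under several experiments would produce several rows per original row.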