thr3ads.net - R help - [R] Ids with matching number combinations? [Oct 2022]

If this information is useful, please help other people find it:
Share via:

Marine Andersson

2022-Oct-07 11:57 UTC

[R] Ids with matching number combinations?

Hi,

If I have two datasets like this:
df=data.frame("id"=rep(1:10,10, each=10),
"item1"=sample(1:20, 100, replace=T)
df2=data.frame("a"=c(8, 8,10,9, 5, 1,2,1), "b"=c(16,18,11,
19,18, 11,17,12))

How do I find out which ids in the df dataset that has a match for both the
numbers occuring in the same row in the df2 dataframe? In the output I would
like to get the matching id and the rownumber from the df2.

Output something like this
Id                        Rownr
2                         1
5                         1
7                         4

My actual problem is more complex with even more columns to be matched and the
datasets are large, hence the solution needs to be efficient.

Kind regards,





N?r du skickar e-post till Karolinska Institutet (KI) inneb?r detta att KI
kommer att behandla dina personuppgifter. H?r finns information om hur KI
behandlar
personuppgifter<https://ki.se/medarbetare/integritetsskyddspolicy>.


Sending email to Karolinska Institutet (KI) will result in KI processing your
personal data. You can read more about KI's processing of personal data
here<https://ki.se/en/staff/data-protection-policy>.

	[[alternative HTML version deleted]]

PIKAL Petr

2022-Oct-07 14:02 UTC

head link

[R] Ids with matching number combinations?

Hallo Marine

Could you please make your example more reproducible by using set.seed (and
maybe smaller)?

If I understand correctly, you want to know if let say row 1 items from df2
(8,16) are both in item column of specific id?

If I am correct in guessing, I cannot find another solution than split your
df according to id
x <- split(df, df$id)[[1]]

and for each row of df2 test if within the specified id you can find both
numbers.
sum(is.element(df2[1,], x$item))==2
[1] FALSE

So basically 2 cycles, one for df ids and the other for df2 rows.

But maybe somebody will give you more ingenious answer.

Cheers
Petr

> -----Original Message-----
> From: R-help <r-help-bounces at r-project.org> On Behalf Of Marine
Andersson
> Sent: Friday, October 7, 2022 1:58 PM
> To: r-help at r-project.org
> Subject: [R] Ids with matching number combinations?
> 
> Hi,
> 
> If I have two datasets like this:
> df=data.frame("id"=rep(1:10,10, each=10),
"item1"=sample(1:20, 100,
> replace=T)
> df2=data.frame("a"=c(8, 8,10,9, 5, 1,2,1),
"b"=c(16,18,11, 19,18,
11,17,12))> 
> How do I find out which ids in the df dataset that has a match for both
the> numbers occuring in the same row in the df2 dataframe? In the output I
would> like to get the matching id and the rownumber from the df2.
> 
> Output something like this
> Id                        Rownr
> 2                         1
> 5                         1
> 7                         4
> 
> My actual problem is more complex with even more columns to be matched and
> the datasets are large, hence the solution needs to be efficient.
> 
> Kind regards,
> 
> 
> 
> 
> 
> N?r du skickar e-post till Karolinska Institutet (KI) inneb?r detta att KI
kommer> att behandla dina personuppgifter. H?r finns information om hur KI
behandlar> personuppgifter<https://ki.se/medarbetare/integritetsskyddspolicy>.
> 
> 
> Sending email to Karolinska Institutet (KI) will result in KI processing
your> personal data. You can read more about KI's processing of personal data
> here<https://ki.se/en/staff/data-protection-policy>.
> 
> 	[[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html> and provide commented, minimal, self-contained, reproducible code.

Bert Gunter

2022-Oct-07 16:19 UTC

head link

[R] Ids with matching number combinations?

Well, for a start, you might give us a reproducible example that actually
runs -- yours doesn't. Did you check? You seem to b.e missing a final
")"(Also, you do not need to quote the column names in data.frame(),
though
it works fine also if you do).

Also note that in df, your id column has length 1000 and item1 column has
length 100 and will be replicated to match id. This seems to me likely to
be misspecified. Or is this what you meant?

Finally, note that that none of the rows in df2 have identical numbers, so
there is no id in df that can match both members of the row. So did you
mean that the 'id' and 'item1' value in a row of df must match
the
corresponding 'a' and 'b' values of some row in df2 ?

Under the above interpretation and with the following reprex(note the use
of set.seed() to make it reproducible)...

set.seed(1234)  ## for reproducibility
df=data.frame(id=c(10,rep(1:10, each=10), item1=sample(1:20, 100,
replace=T))
df2=data.frame(a=c(8, 8,10,9, 5, 1,2,1), b=c(16,18,11, 19,18, 11,17,12))

... you can paste0() the column vectors together in each data frame(as
character values) and then just match on the single character vectors, like
this:

both <- match(do.call(paste0,df),  do.call(paste0, df2))
## subscript the data frames if you have more columns that are not used for
matching
> both  [1] NA NA  8 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [21] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [41] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [61] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
 [81] NA NA NA NA NA NA NA NA NA NA  3 NA NA NA NA NA NA NA NA  3

Then> both[!is.na(both)] ## the rows of df2 that matched[1] 8 3 3
> which(!is.na(both)) ## the rows of df that were matched[1]   3  91 100

This should be very efficient, as hashing is used for matching.

I think there is a slicker way to do this that combines the paste and match
functions together (other than using merge() or the like, as has already
been suggested). But I have forgotten the details or I may just be thinking
of merge(), which may indeed be a better option.

All presuming this is what you meant, of course, which it may not be. :-(

Cheers,
Bert

On Fri, Oct 7, 2022 at 5:57 AM Marine Andersson <marine.andersson at
ki.se>
wrote:
> Hi,
>
> If I have two datasets like this:
> df=data.frame("id"=rep(1:10,10, each=10),
"item1"=sample(1:20, 100,
> replace=T)
> df2=data.frame("a"=c(8, 8,10,9, 5, 1,2,1),
"b"=c(16,18,11, 19,18,
> 11,17,12))
>
> How do I find out which ids in the df dataset that has a match for both
> the numbers occuring in the same row in the df2 dataframe? In the output I
> would like to get the matching id and the rownumber from the df2.
>
> Output something like this
> Id                        Rownr
> 2                         1
> 5                         1
> 7                         4
>
> My actual problem is more complex with even more columns to be matched and
> the datasets are large, hence the solution needs to be efficient.
>
> Kind regards,
>
>
>
>
>
> N?r du skickar e-post till Karolinska Institutet (KI) inneb?r detta att KI
> kommer att behandla dina personuppgifter. H?r finns information om hur KI
> behandlar personuppgifter<
> https://ki.se/medarbetare/integritetsskyddspolicy>.
>
>
> Sending email to Karolinska Institutet (KI) will result in KI processing
> your personal data. You can read more about KI's processing of personal
> data here<https://ki.se/en/staff/data-protection-policy>.
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
	[[alternative HTML version deleted]]

Jim Lemon

2022-Oct-08 10:58 UTC

head link

[R] Ids with matching number combinations?

Hi Marine,
I'm not sure that I have the right idea, but this worked for me and may
help.

# I had to fix this line
df<-data.frame(id=rep(1:10,each=10),item1=sample(1:20, 100, replace=TRUE))
df2<-data.frame(a=c(8, 8,10,9, 5, 1,2,1),b=c(16,18,11, 19,18, 11,17,12))
result<-data.frame(id=df[,1],rown=1:dim(df)[1],matched=rep(FALSE,dim(dfw)[1]))
# function that checks that all entries in vector a exist in vector x
match_all_a<-function(a,x,index) {
 inx<-TRUE
 for(i in 1:length(a)) inx <- inx && a[i] %in% x[-1]
 return(list(id=x[1],rown=index,matched=inx))
}
# step through the data frame
for(i in df[,1]) result[i,]<-match_all_a(df2[i,],df[i,],i)
# display the result id, row in df, whether matched
result
# select the rows of result for which matched is TRUE

Jim

On Fri, Oct 7, 2022 at 11:57 PM Marine Andersson <marine.andersson at
ki.se> wrote:>
> Hi,
>
> If I have two datasets like this:
> df=data.frame("id"=rep(1:10,10, each=10),
"item1"=sample(1:20, 100, replace=T)
> df2=data.frame("a"=c(8, 8,10,9, 5, 1,2,1),
"b"=c(16,18,11, 19,18, 11,17,12))
>
> How do I find out which ids in the df dataset that has a match for both the
numbers occuring in the same row in the df2 dataframe? In the output I would
like to get the matching id and the rownumber from the df2.
>
> Output something like this
> Id                        Rownr
> 2                         1
> 5                         1
> 7                         4
>
> My actual problem is more complex with even more columns to be matched and
the datasets are large, hence the solution needs to be efficient.
>
> Kind regards,
>
>
>
>
>
> N?r du skickar e-post till Karolinska Institutet (KI) inneb?r detta att KI
kommer att behandla dina personuppgifter. H?r finns information om hur KI
behandlar
personuppgifter<https://ki.se/medarbetare/integritetsskyddspolicy>.
>
>
> Sending email to Karolinska Institutet (KI) will result in KI processing
your personal data. You can read more about KI's processing of personal data
here<https://ki.se/en/staff/data-protection-policy>.
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

R help - Oct 2022 - Ids with matching number combinations?

[R] Ids with matching number combinations?

[R] Ids with matching number combinations?

[R] Ids with matching number combinations?

[R] Ids with matching number combinations?