Hi folks,

Can anyone suggest an efficient way to do "matching without replacement", or "one-to-one matching"? pmatch() doesn't quite provide what I need...

For example,

lookupTable <- c("a","b","c","d","e","f")
matchSample <- c("a","a","b","d")
##Normal match() behaviour:
match(matchSample,lookupTable)
[1] 1 1 2 4

My problem here is that both "a"s in matchSample are matched to the same "a" in the lookup table. I need the elements of the lookup table to be excluded from the table as they are matched, so that no match can be found for the second "a".

Function pmatch() comes close to what I need:

pmatch(matchSample,lookupTable)
[1]  1 NA  2  4

Yep! However, pmatch() incorporates partial matching, which I definitely don't want:

lookupTable <- c("a","b","c","d","e","aaaaaaaaf")
matchSample <- c("a","a","b","d")
pmatch(matchSample,lookupTable)
[1] 1 6 2 4
## i.e. the second "a", matches "aaaaaaaaf" - I don't want this.

Of course, when identical items ARE duplicated in both sample and lookup table, I need the matching to reflect this:

lookupTable <- c("a","a","c","d","e","f")
matchSample <- c("a","a","c","d")
##Normal match() behaviour
match(matchSample,lookupTable)
[1] 1 1 3 4

No good - pmatch() is better:

lookupTable <- c("a","a","c","d","e","f")
matchSample <- c("a","a","c","d")
pmatch(matchSample,lookupTable)
[1] 1 2 3 4

...but we still have the partial matching issue...

##And of course, as per the usual behaviour of match(), sample elements missing from the lookup table should return NA:

matchSample <- c("a","frog","e","d") ; print(matchSample)
match(matchSample,lookupTable)

Is there a nifty way to get what I'm after without resorting to a for loop? (my code's already got too blasted many of those...)

Thanks,

Alec Zwart
CMIS CSIRO
alec.zwart at csiro.au
Try this. apseq() sorts the input and appends a sequence number (0, 1, ...) to successive occurrences of each value. Applying that to both vectors transforms the problem into one that works with ordinary match():

> lookupTable <- c("a", "a","b","c","d","e","f")
> matchSample <- c("a", "a","a","b","d")
>
> # sort and append sequence no
> apseq <- function(x) {
+   x <- sort(x)
+   s <- cumsum(!duplicated(x))
+   paste(x, seq(s) - match(s, s))
+ }
>
> match(apseq(matchSample), apseq(lookupTable))
[1]  1  2 NA  3  5

On Sun, Jun 22, 2008 at 10:57 PM, <Alec.Zwart at csiro.au> wrote:
> [...]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
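Since apseq() sorts its argument, the indices it yields refer to positions in the sorted vectors rather than in the originals. The following is a minimal sketch of an order-preserving variant of the same idea, assuming character input without NA values. The helper names occurrence_key() and match_one_to_one() are hypothetical and do not come from the thread.

```r
# Hypothetical helper: tag each element with its occurrence number
# (1 for the first "a", 2 for the second, ...) without reordering x.
# Assumes x contains no NA values.
occurrence_key <- function(x) {
  x <- as.character(x)
  paste(x, ave(seq_along(x), x, FUN = seq_along))
}

# Hypothetical wrapper: exact one-to-one matching via ordinary match().
match_one_to_one <- function(sample, table) {
  match(occurrence_key(sample), occurrence_key(table))
}

# No partial matching: the second "a" finds nothing.
res1 <- match_one_to_one(c("a", "a", "b", "d"),
                         c("a", "b", "c", "d", "e", "aaaaaaaaf"))
# res1 is c(1, NA, 2, 4)

# Indices refer to the original, unsorted lookup table.
res2 <- match_one_to_one(c("d", "a", "frog", "a"),
                         c("a", "a", "c", "d", "e", "f"))
# res2 is c(4, 1, NA, 2)
```

The occurrence number is appended after a space, so the last token of each key is always the counter and two different (value, occurrence) pairs cannot produce the same key.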
How about

x <- lookupTable
x[!(x %in% matchSample)] <- NA
pmatch(matchSample,x)

Regards,

Moshe.

--- On Mon, 23/6/08, Alec.Zwart at csiro.au <Alec.Zwart at csiro.au> wrote:

> From: Alec.Zwart at csiro.au <Alec.Zwart at csiro.au>
> Subject: [R] One-to-one matching?
> To: r-help at r-project.org
> Received: Monday, 23 June, 2008, 12:57 PM
>
> [...]
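As a rough illustration of the masking idea above, here is a sketch that wraps it in a function and runs it on the partial-matching example from the original question. The name match_masked_pmatch() is hypothetical. One assumption worth stating: pmatch() is documented to treat NA as the string "NA", so a sample value such as "N" or "NA" could still match a masked entry, and partial matches remain possible between values that both occur in the sample.

```r
# Hypothetical wrapper around the masking idea: table entries that do not
# occur exactly in the sample are set to NA before calling pmatch(), so
# they are no longer available as partial-match targets.
match_masked_pmatch <- function(sample, table) {
  masked <- table
  masked[!(masked %in% sample)] <- NA
  pmatch(sample, masked)
}

lookupTable <- c("a", "b", "c", "d", "e", "aaaaaaaaf")
matchSample <- c("a", "a", "b", "d")

res <- match_masked_pmatch(matchSample, lookupTable)
# res is c(1, NA, 2, 4): "aaaaaaaaf" was masked, so the second "a" is unmatched

# Possible edge case (result not asserted here): "ab" survives the masking
# because it occurs in the sample, so a leftover "a" may partially match it.
edge <- match_masked_pmatch(c("a", "a", "ab"), c("a", "ab", "ab"))
```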
My thanks to Gabor Grothendieck, Charles C. Berry and Moshe Olshansky for their suggested solutions. The upshot is that a nice one-line solution to my one-to-one exact matching problem is the Grothendieck-Berry collaboration of

match(make.unique(matchSample), make.unique(lookupTable))

I've settled on this particular solution as it appears to be the fastest of the three possibilities given, although Moshe's solution comes a close second :-)

Many thanks...

Alec
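To make the one-liner concrete, the sketch below shows what make.unique() does to each vector and why ordinary match() then behaves one-to-one. The example data are illustrative only. It also shows one caveat that may matter for some data: if a value already looks like a generated suffix (for example "a.1"), it can collide with a renamed duplicate.

```r
lookupTable <- c("a", "a", "c", "d", "e", "aaaaaaaaf")
matchSample <- c("a", "a", "a", "frog", "d")

# make.unique() leaves first occurrences alone and appends ".1", ".2", ...
# to later duplicates, so the k-th "a" gets the same label in both vectors.
keysSample <- make.unique(matchSample)
# keysSample is c("a", "a.1", "a.2", "frog", "d")

res <- match(make.unique(matchSample), make.unique(lookupTable))
# res is c(1, 2, NA, NA, 4): the third "a" and "frog" have no partner

# Caveat: a literal "a.1" in the sample matches the second "a" in the table,
# because that second "a" was renamed to "a.1".
collision <- match(make.unique("a.1"), make.unique(c("a", "a")))
# collision is 2
```

If the data can contain values of the form "name.number", a key built with a separator that never occurs in the data would avoid this collision.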