Murali.Menon at avivainvestors.com
2011-Mar-31 14:46 UTC
[R] choosing best 'match' for given factor
Folks,

I have a 'matching' matrix between variables A, X, L, O:

> a <- structure(c(1, 0.41, 0.58, 0.75, 0.41, 1, 0.6, 0.86, 0.58,
0.6, 1, 0.83, 0.75, 0.86, 0.83, 1), .Dim = c(4L, 4L), .Dimnames = list(
    c("A", "X", "L", "O"), c("A", "X", "L", "O")))

> a
     A    X    L    O
A 1.00 0.41 0.58 0.75
X 0.41 1.00 0.60 0.86
L 0.58 0.60 1.00 0.83
O 0.75 0.86 0.83 1.00

And I have a search vector of variables:

> v <- c("X", "O")

I want to write a function bestMatch(searchvector, matchMat) such that, for each variable in searchvector, I get the variable it has the highest match with, searching only among the variables to its left in the 'matching' matrix and never matching a variable that is itself in searchvector.

So in the above example, although "X" has its highest match (0.86) with "O", I can't choose "O", as it's to the right of X (and "O" is also already in the search vector v); I'll have to choose "A".

For "O", I will choose "L", the variable it is best matched with among the allowed ones, as it can't match "X", which is already in the search vector.

My function bestMatch(v, a) will then return c("A", "L").

My matrix a is quite large, and I have a long list of search vectors v, so I need an efficient method.

I wrote this:

bestMatch <- function(searchvector, matchMat) {
    sapply(searchvector, function(cc) {
        # index() is not base R: this assumes a package such as zoo is loaded,
        # so that index(rownames(matchMat)) gives the row positions 1, 2, ..., n
        y <- matchMat[!(rownames(matchMat) %in% searchvector) &
                      (index(rownames(matchMat)) < match(cc, rownames(matchMat))),
                      cc, drop = FALSE]
        rownames(y)[which.max(y)]
    })
}

Any advice?

Thanks,
Murali
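Because index() is not part of base R, here is a minimal sketch of the same loop with seq_along() supplying the row positions, applied with lapply() to a list of search vectors. The names searchList and results, and the extra search vectors in the list, are illustrative assumptions rather than data from the thread.

```r
# Matrix from the question (symmetric, 1 on the diagonal)
a <- structure(c(1, 0.41, 0.58, 0.75, 0.41, 1, 0.6, 0.86, 0.58,
                 0.6, 1, 0.83, 0.75, 0.86, 0.83, 1), .Dim = c(4L, 4L),
               .Dimnames = list(c("A", "X", "L", "O"), c("A", "X", "L", "O")))

# Same logic as the loop above, with seq_along() in place of index()
bestMatch <- function(searchvector, matchMat) {
  rn <- rownames(matchMat)
  sapply(searchvector, function(cc) {
    # candidate rows: not in the search vector and strictly to the left of cc
    ok <- !(rn %in% searchvector) & seq_along(rn) < match(cc, rn)
    y <- matchMat[ok, cc, drop = FALSE]
    rownames(y)[which.max(y)]
  })
}

# Hypothetical list of search vectors, processed one at a time
searchList <- list(c("X", "O"), c("L", "O"), c("X", "L"))
results <- lapply(searchList, bestMatch, matchMat = a)
results
```

Because sapply() is called on a character vector, each result is named by its search variable; the first element is c(X = "A", O = "L"), as required in the question.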
Hi Murali.

I haven't compared, but this is what I would do:

bestMatch <- function(searchVector, matchMat)
{
    searchRow <- unique(sort(match(searchVector, colnames(matchMat)))) #if you're sure, you could drop unique
    cat("Original row indices:")
    print(searchRow)
    matchMat <- matchMat[, -searchRow, drop=FALSE] #avoid matching search variables altogether
    cat("Corrected Matrix:\n")
    print(matchMat)
    correctedRows <- searchRow - seq_along(searchRow) + 1 #works because of the sort above
    cat("Corrected row indices:")
    print(correctedRows)
    #only columns were dropped, so rows are still indexed by searchRow;
    #correctedRows - 1 is the number of remaining columns to the left
    sapply(seq_along(searchRow), function(i){
        lookWhere <- matchMat[searchRow[i], seq_len(correctedRows[i] - 1)]
        cat("Will now look into:\n")
        print(lookWhere)
        cc <- which.max(lookWhere)
        cat("Max at position", cc, "\n")
        colnames(matchMat)[cc]
    })
}

I don't think there's that much difference. Depending on the specific sizes, it may be more or less costly to shrink the search matrix first, as I do. Similarly, depending on sizes, it may be better still to remove the rows you're not interested in as well (some more, but similar, index trickery is required then).
HTH,

Nick Sabbe
--
ping: nick.sabbe at ugent.be
link: http://biomath.ugent.be
wink: A1.056, Coupure Links 653, 9000 Gent
ring: 09/264.59.36

-- Do Not Disapprove

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
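Nick's closing suggestion, dropping the uninteresting rows as well as the search columns, could look roughly like the sketch below. It keeps only the rows of the search variables and the columns of the candidate variables, then uses findInterval() to count how many candidate columns lie to the left of each search variable. The function name bestMatchShrunk is hypothetical, and the sketch assumes that rows and columns list the variables in the same order, as in the example matrix.

```r
a <- structure(c(1, 0.41, 0.58, 0.75, 0.41, 1, 0.6, 0.86, 0.58,
                 0.6, 1, 0.83, 0.75, 0.86, 0.83, 1), .Dim = c(4L, 4L),
               .Dimnames = list(c("A", "X", "L", "O"), c("A", "X", "L", "O")))

# Hypothetical name. Assumes rows and columns list the variables in the same order.
bestMatchShrunk <- function(searchVector, matchMat) {
  pos  <- match(searchVector, colnames(matchMat))  # positions of the search variables
  keep <- setdiff(seq_len(ncol(matchMat)), pos)    # candidate columns, in original order
  sub  <- matchMat[pos, keep, drop = FALSE]        # rows: search variables; columns: candidates
  nLeft <- findInterval(pos, keep)                 # candidates lying left of each search variable
  vapply(seq_along(pos), function(i) {
    if (nLeft[i] == 0) return(NA_character_)       # nothing eligible to the left
    colnames(sub)[which.max(sub[i, seq_len(nLeft[i])])]
  }, character(1))
}

bestMatchShrunk(c("X", "O"), a)  # "A" "L"
bestMatchShrunk(c("A", "O"), a)  # NA  "X"
```

Here the searched matrix has only length(searchVector) rows, which may help when the matrix is large and the search vectors are short; whether it is actually faster than the versions in this thread depends on the sizes involved.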
Folks:

I think the following may be somewhat faster, as it avoids sorting:

bmat <- function(mx, vec) {
  nm <- colnames(mx)
  ivec <- match(vec, nm)
  sapply(ivec, function(k) {
    if (k == 1) NA else {
      lookat <- setdiff(seq_len(k - 1), ivec) ## only those to left and not in search vector ##
      nm[lookat[which.max(mx[lookat, k])]]
    }
  })
}

-- Bert
--
"Men by nature long to get on to the ultimate truths, and will often be impatient with elementary studies or fight shy of them. If it were possible to reach the ultimate truths without the elementary studies usually prefixed to them, these would not be preparatory studies but superfluous diversions."
-- Maimonides (1135-1204)

Bert Gunter
Genentech Nonclinical Biostatistics
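Bert expects bmat() to be somewhat faster because it avoids sorting. The sketch below is one way to check that at a realistic size: it builds a random symmetric matrix (an assumed stand-in for the real data), times bmat() against a base-R version of the original loop with system.time(), and confirms that both pick the same variables. The helper name bestMatchLoop and the sizes n = 500, 200 search vectors of length 10, are illustrative assumptions; the timings depend on these sizes, so treat them as indicative only.

```r
set.seed(1)

# Random symmetric "matching" matrix standing in for a large real one
n <- 500
m <- matrix(runif(n * n), n, n)
m <- (m + t(m)) / 2
diag(m) <- 1
dimnames(m) <- list(paste0("V", seq_len(n)), paste0("V", seq_len(n)))

# Bert's function, as posted above
bmat <- function(mx, vec) {
  nm <- colnames(mx)
  ivec <- match(vec, nm)
  sapply(ivec, function(k) {
    if (k == 1) NA else {
      lookat <- setdiff(seq_len(k - 1), ivec)  # left of k and not a search variable
      nm[lookat[which.max(mx[lookat, k])]]
    }
  })
}

# Hypothetical name: base-R version of the original loop (seq_along() instead of index())
bestMatchLoop <- function(searchvector, matchMat) {
  rn <- rownames(matchMat)
  sapply(searchvector, function(cc) {
    y <- matchMat[!(rn %in% searchvector) & seq_along(rn) < match(cc, rn), cc, drop = FALSE]
    rownames(y)[which.max(y)]
  })
}

# Many search vectors; V1 is never sampled, so every variable has a candidate to its left
searches <- replicate(200, sample(colnames(m)[-1], 10), simplify = FALSE)

print(system.time(r1 <- lapply(searches, bmat, mx = m)))
print(system.time(r2 <- lapply(searches, bestMatchLoop, matchMat = m)))

# Same choices from both (bestMatchLoop's results carry names, bmat's do not)
agree <- identical(r1, lapply(r2, unname))
agree
```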
Try this:

bestMatch <- function(search, match) {
    # which.max() finds each search column's maximum (the diagonal 1 here);
    # the variable just to its left is returned, clamped at the first column
    colnames(match)[pmax(apply(match[, search], 2, which.max) - 1, 1)]
}
--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" W
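The one-liner above relies on each search column's maximum being its own diagonal entry and returns the variable immediately to its left. That gives the expected c("A", "L") for v <- c("X", "O"), but it never compares candidates or skips search variables: for c("L", "O") it returns "L" for "O", although "L" is itself in the search vector. A vectorised sketch that does apply both of Murali's constraints masks the ineligible entries with -Inf and takes row maxima with max.col(). The function name bestMatchVec is hypothetical, and the sketch assumes rows and columns list the variables in the same order.

```r
a <- structure(c(1, 0.41, 0.58, 0.75, 0.41, 1, 0.6, 0.86, 0.58,
                 0.6, 1, 0.83, 0.75, 0.86, 0.83, 1), .Dim = c(4L, 4L),
               .Dimnames = list(c("A", "X", "L", "O"), c("A", "X", "L", "O")))

# Hypothetical name. Assumes rows and columns list the variables in the same order.
bestMatchVec <- function(search, matchMat) {
  k <- match(search, colnames(matchMat))   # positions of the search variables
  m <- t(matchMat[, k, drop = FALSE])      # one row per search variable, one column per candidate
  # rule out candidates at or to the right of each search variable ...
  m[outer(k, seq_len(ncol(m)), "<=")] <- -Inf
  # ... and candidates that are themselves search variables
  m[, k] <- -Inf
  best <- max.col(m, ties.method = "first")  # first column holding each row's maximum
  res <- colnames(m)[best]
  res[rowSums(is.finite(m)) == 0] <- NA      # nothing eligible to the left
  res
}

bestMatchVec(c("X", "O"), a)  # "A" "L"
bestMatchVec(c("L", "O"), a)  # "X" "X"
```

This builds one length(search) by ncol(matchMat) matrix per call instead of looping over the search variables, trading memory for fewer R-level function calls; whether that beats the loops above depends on the sizes involved.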