Murali.Menon at avivainvestors.com
2011-Mar-31 14:46 UTC
[R] choosing best 'match' for given factor
Folks,

I have a 'matching' matrix between variables A, X, L, O:

> a <- structure(c(1, 0.41, 0.58, 0.75, 0.41, 1, 0.6, 0.86, 0.58,
    0.6, 1, 0.83, 0.75, 0.86, 0.83, 1), .Dim = c(4L, 4L), .Dimnames = list(
    c("A", "X", "L", "O"), c("A", "X", "L", "O")))
> a
     A    X    L    O
A 1.00 0.41 0.58 0.75
X 0.41 1.00 0.60 0.86
L 0.58 0.60 1.00 0.83
O 0.75 0.86 0.83 1.00

And I have a search vector of variables

> v <- c("X", "O")

I want to write a function bestMatch(searchvector, matchMat) such that for
each variable in searchvector, I get the variable that it has the highest
match to - but searching only among variables to the left of it in the
'matching' matrix, and not matching with any variable in searchvector
itself.

So in the above example, although "X" has the highest match (0.86) with "O",
I can't choose "O" as it's to the right of "X" (and also because "O" is in
the searchvector v already); I'll have to choose "A".

For "O", I will choose "L", the variable it's best matched with - as it
can't match "X", which is already in the search vector.

My function bestMatch(v, a) will then return c("A", "L").

My matrix a is quite large, and I have a long list of search vectors v, so I
need an efficient method.

I wrote this:

bestMatch <- function(searchvector, matchMat) {
    rn <- rownames(matchMat)
    sapply(searchvector, function(cc) {
        # candidates: rows not in searchvector and positioned before cc
        # (seq_along() stands in for index(), which is not a base R function)
        y <- matchMat[!(rn %in% searchvector) & (seq_along(rn) < match(cc, rn)),
                      cc, drop = FALSE]
        rownames(y)[which.max(y)]
    })
}

Any advice?

Thanks,
Murali
Hi Murali.
I haven't compared, but this is what I would do:
bestMatch <- function(searchVector, matchMat)
{
    # if you're sure there are no duplicates, you could drop unique
    searchRow <- unique(sort(match(searchVector, colnames(matchMat))))
    cat("Original row indices:")
    print(searchRow)
    matchMat <- matchMat[, -searchRow, drop = FALSE]  # avoid duplicates altogether
    cat("Corrected Matrix:\n")
    print(matchMat)
    # positions after the column removal; works because of the sort above
    correctedRows <- searchRow - seq_along(searchRow) + 1
    cat("Corrected row indices:")
    print(correctedRows)
    # only columns were removed, so rows are still addressed by searchRow
    mapply(function(sr, cr) {
        if (cr == 1) return(NA_character_)  # nothing to the left
        lookWhere <- matchMat[sr, seq_len(cr - 1)]
        cat("Will now look into:\n")
        print(lookWhere)
        cc <- which.max(lookWhere)
        cat("Max at position", cc, "\n")
        colnames(matchMat)[cc]
    }, searchRow, correctedRows)
}
I don't think there's that much difference. Depending on the specific sizes,
it may be more or less costly to first shrink the search matrix like I do.
Similarly, it may be better still if you also remove the rows that you're
not interested in (some more, but similar, index trickery is required then).
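A rough sketch of what removing the rows as well could look like. The name
bestMatchSub and its internals are illustrative only, not something from the
posts in this thread. It assumes a symmetric matrix, so that reading along
the row of a search variable gives the same values as reading down its
column, and it returns NA when nothing eligible lies to the left.

```r
# Example data from the original post
a <- structure(c(1, 0.41, 0.58, 0.75, 0.41, 1, 0.6, 0.86, 0.58,
                 0.6, 1, 0.83, 0.75, 0.86, 0.83, 1), .Dim = c(4L, 4L),
               .Dimnames = list(c("A", "X", "L", "O"),
                                c("A", "X", "L", "O")))

# Hypothetical helper: keep only the search rows and the candidate columns
bestMatchSub <- function(searchVector, matchMat) {
  pos <- match(searchVector, colnames(matchMat))
  keepCols <- setdiff(seq_len(ncol(matchMat)), pos)  # stays in ascending order
  sub <- matchMat[pos, keepCols, drop = FALSE]
  vapply(seq_along(pos), function(i) {
    # candidates to the left of search variable i are the first nLeft columns
    nLeft <- sum(keepCols < pos[i])
    if (nLeft == 0) return(NA_character_)
    colnames(sub)[which.max(sub[i, seq_len(nLeft)])]
  }, character(1))
}

print(bestMatchSub(c("X", "O"), a))
```

Because no sorting is involved, the results come back in the order of
searchVector.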
HTH,
Nick Sabbe
--
ping: nick.sabbe at ugent.be
link: http://biomath.ugent.be
wink: A1.056, Coupure Links 653, 9000 Gent
ring: 09/264.59.36
-- Do Not Disapprove
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Folks:
I think the following may be somewhat faster, as it avoids sorting:
bmat <- function(mx, vec)
{
    nm <- colnames(mx)
    ivec <- match(vec, nm)
    sapply(ivec, function(k) {
        if (k == 1) NA else {
            ## only those to the left and not in the search vector ##
            lookat <- setdiff(seq_len(k - 1), ivec)
            nm[lookat[which.max(mx[lookat, k])]]
        }
    })
}
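One way to check the speed claim is to time both approaches on a larger
random matrix. What follows is only a sketch: the matrix size, the number of
repetitions and the names byMask and bySetdiff are arbitrary choices for
illustration. byMask is a base-R rendering of the approach in the original
post, and bySetdiff follows the setdiff idea with an added guard for the case
where nothing eligible lies to the left. Timings will vary by machine.

```r
set.seed(1)
n  <- 200
nm <- paste0("V", seq_len(n))
big <- matrix(runif(n * n), n, n, dimnames = list(nm, nm))
big <- (big + t(big)) / 2   # make it symmetric, like the example
diag(big) <- 1
vec <- sample(nm, 20)

# logical-mask approach, as in the original post
byMask <- function(searchvector, matchMat) {
  rn <- rownames(matchMat)
  unname(vapply(searchvector, function(cc) {
    keep <- !(rn %in% searchvector) & (seq_along(rn) < match(cc, rn))
    if (!any(keep)) return(NA_character_)
    rn[keep][which.max(matchMat[keep, cc])]
  }, character(1)))
}

# setdiff approach, returning NA when no candidate is available
bySetdiff <- function(mx, vec) {
  nm <- colnames(mx)
  ivec <- match(vec, nm)
  vapply(ivec, function(k) {
    lookat <- setdiff(seq_len(k - 1), ivec)
    if (length(lookat) == 0) return(NA_character_)
    nm[lookat[which.max(mx[lookat, k])]]
  }, character(1))
}

# both should agree before any timing is worth reading
stopifnot(identical(byMask(vec, big), bySetdiff(big, vec)))
print(system.time(for (i in 1:50) byMask(vec, big)))
print(system.time(for (i in 1:50) bySetdiff(big, vec)))
```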
-- Bert
--
"Men by nature long to get on to the ultimate truths, and will often
be impatient with elementary studies or fight shy of them. If it were
possible to reach the ultimate truths without the elementary studies
usually prefixed to them, these would not be preparatory studies but
superfluous diversions."
-- Maimonides (1135-1204)
Bert Gunter
Genentech Nonclinical Biostatistics
Try this:
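bestMatch <- function(search, match) {
    # which.max() lands on the diagonal whenever the self-match (1) is the
    # largest value in a column, so this picks the name one position to the
    # left; drop = FALSE keeps a single search variable as a matrix for apply()
    colnames(match)[pmax(apply(match[, search, drop = FALSE], 2, which.max) - 1, 1)]
}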
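It may be worth checking this one-liner on data where the best eligible match
is not the immediate left neighbour. The matrix m below is made up for
illustration, and the names neighbour and restricted are hypothetical:
neighbour repeats the one-liner, and restricted is one possible reading of
the rule in the original post (search only to the left, skip search-vector
members).

```r
# Made-up symmetric matrix: D matches A best (0.9), but C sits next to D
m <- matrix(c(1.0, 0.2, 0.3, 0.9,
              0.2, 1.0, 0.4, 0.5,
              0.3, 0.4, 1.0, 0.1,
              0.9, 0.5, 0.1, 1.0), nrow = 4,
            dimnames = list(LETTERS[1:4], LETTERS[1:4]))

# the one-liner: position of the column maximum, minus one
neighbour <- function(search, mat) {
  colnames(mat)[pmax(apply(mat[, search, drop = FALSE], 2, which.max) - 1, 1)]
}

# explicit search among eligible variables to the left
restricted <- function(search, mat) {
  nm <- colnames(mat)
  idx <- match(search, nm)
  vapply(idx, function(k) {
    left <- setdiff(seq_len(k - 1), idx)
    if (length(left) == 0) return(NA_character_)
    nm[left[which.max(mat[left, k])]]
  }, character(1))
}

print(neighbour("D", m))
print(restricted("D", m))
```

On the four-variable example from the original post the two agree, so the
difference only shows up on data like m.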
--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O