Daniel Malter
2011-Jun-28 19:59 UTC
[R] create a factor variable from two numeric variables when order is irrelevant
Hi all,

I have two numeric variables that form combinations in a matched sample. Let's say I have five levels of x and y. What I am seeking to create is a factor variable that ignores the order of x and y, i.e., the factor should indicate x=1, y=5, as the same factor as x=5, y=1. Obviously, this becomes increasingly cumbersome to do by hand as the number of levels increases.

f<-1:5
x<-sample(f,100,replace=T)
y<-sample(f,100,replace=T)
d<-matrix(cbind(x,y),ncol=2)

#A working solution is to remove the order, multiply one column by a scaling constant, add the second column, and create the factor for this numeric value. However, I was wondering whether there is a less awkward, more direct way to do this.

i<-apply(t(apply(d,1,function(x) sort(x))),1,function(y) 10*y[1]+y[2])
i<-factor(i)
i

Thanks for your help,
Daniel
David Winsemius
2011-Jun-28 20:53 UTC
[R] create a factor variable from two numeric variables when order is irrelevant
On Jun 28, 2011, at 3:59 PM, Daniel Malter wrote:

> Hi all,
>
> I have two numeric variables that form combinations in a matched sample.
> Let's say I have five levels of x and y. What I am seeking to create is a
> factor variable that ignores the order of x and y, i.e., the factor should
> indicate x=1, y=5, as the same factor as x=5, y=1. Obviously, this becomes
> increasingly cumbersome to do by hand as the number of levels increases.
>
> f<-1:5
> x<-sample(f,100,replace=T)
> y<-sample(f,100,replace=T)
> d<-matrix(cbind(x,y),ncol=2)
>
> #A working solution is to remove the order, multiply one column by a scaling
> constant, add the second column, and create the factor for this numeric
> value. However, I was wondering whether there is less awkward, more direct
> way to do this.
>
> i<-apply(t(apply(d,1,function(x) sort(x))),1,function(y) 10*y[1]+y[2])
> i<-factor(i)
> i

I came up with the same solution, but implemented it a bit differently:

> d <- pmin(x,y)+10*pmax(x,y)
> sort(unique(d))
 [1] 11 21 22 31 32 33 41 42 43 44 51 52 53 54 55
> d <- factor(pmin(x,y)+10*pmax(x,y))
> unique(d)
 [1] 41 42 32 54 51 21 22 33 53 11 31 44 43 52 55
Levels: 11 21 22 31 32 33 41 42 43 44 51 52 53 54 55

Seems that you might find the BioC people doing something isomorphic to this with gene allele pairs using their fancy S4 methods.

--
David Winsemius, MD
West Hartford, CT
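
The pmin/pmax idea above relies on a scaling constant (10) that happens to be larger than the largest level. A minimal sketch of two variations follows, assuming the same simulated x and y as in the thread. The names lo, hi, k, pair_num and pair_lab are illustrative, and the seed is an assumption added only to make the run repeatable. The first variation derives the constant from the data; the second skips the arithmetic and builds a text label instead.

```r
# Sketch only: order-insensitive pair factor built from two numeric vectors.
set.seed(1)                          # assumed seed, not part of the thread
f <- 1:5
x <- sample(f, 100, replace = TRUE)
y <- sample(f, 100, replace = TRUE)

lo <- pmin(x, y)                     # smaller member of each pair
hi <- pmax(x, y)                     # larger member of each pair

# Variation 1: numeric code. The constant k exceeds every level,
# so two different (lo, hi) pairs cannot map to the same number.
k <- max(f) + 1
pair_num <- factor(lo + k * hi)

# Variation 2: readable label such as "1-5"; no scaling constant needed.
pair_lab <- factor(paste(lo, hi, sep = "-"))

# Cross-tabulate to see that the two codings group rows identically.
table(pair_num, pair_lab)
```

The label version may be easier to read in model output, while the numeric version keeps the levels in a predictable sort order. Which one fits better depends on how the factor is used downstream.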