Economics Guy
2010-Mar-27 22:15 UTC
[R] Assigning factors probabilistically based on the value of another variable.
I am revising a program that I wrote when I was very new at R (2007ish). I have been able to write very nice and fast code for almost all of it, but there is one task that I cannot seem to do in fewer than 40 ugly and computationally expensive lines.

I have a data frame that contains one variable:

    theFrame <- data.frame(theValues=runif(150,-10,10))

I would like to write a function that assigns each of these values a factor level, and I need it to meet several criteria:

(1) There are 15 factor levels.
(2) Exactly 10 elements must be assigned to each level.

Now here is the tricky part:

(3) I would like to assign the levels probabilistically. The lower theValues is for a row, the lower the level I would like it to receive. So values close to -10 should have a really high probability of being assigned level 1.

If assigning factors is too tricky, I would settle for placing theValues in a 10 x 15 matrix where the lower values are more likely to end up in column 1 (again, values close to -10 should have a really high probability of being assigned to column 1).

Any ideas? I have thought at times I was painfully close, only to realize I was completely wrong.

Thanks,

That Economics Guy
Charles C. Berry
2010-Mar-27 23:24 UTC
[R] Assigning factors probabilistically based on the value of another variable.
On Sat, 27 Mar 2010, Economics Guy wrote:

> I am revising a program that I wrote when I was very new at R
> (2007ish), and while I have been able to write very nice and fast code
> for almost all of it, there is one issue that I cannot seem to do it
> in less than 40 ugly and computationally expensive lines.
>
> I have a data frame that contains one variable:
>
> theFrame <- data.frame(theValues=runif(150,-10,10))
>
> I would like to write a function that would assign each of these
> values a factor, and I need it to meet several criteria:
>
> (1) There are 15 factors.
> (2) I need there to be exactly 10 elements assigned to each factor.
>
> Now here is the tricky part:
>
> (3) I would like to assign the factor probabilistically. The lower
> theValue is for a row, the lower factor I would like it to receive. So
> values close to -10 should have a really high probability of being
> assigned factor 1.
>
> If assigning factors is too tricky I would settle for placing theValues
> in a 10 x 15 matrix where the lower values would be more likely to end
> up in column 1 (again, values close to -10 should have a really high
> probability of being assigned to column 1.).

It is really the same thing. One of many possibilities:

    theFrame <- data.frame(theValues=runif(150,-10,10))
    exact <- diag(15)[1+ (rank(theFrame$theValues)-1)%/%10,]
    not.so.exact <- diag(15)[1+ (rank(theFrame$theValues+runif(150,0,3))-1)%/%10,]

If what you actually wanted was one factor with fifteen levels, just wrap the subscript in the last assignment in factor() instead.

HTH,

Chuck

> Any ideas? I have thought at times I was painfully close only to
> realize I was completely wrong.
>
> Thanks,
>
> That Economics Guy
Charles C. Berry                            (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu             UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
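For readers following along, here is a minimal sketch of why the rank-and-integer-divide idiom in the reply above yields exactly ten values per level, using the factor() form the reply mentions. The seed and the explicit `levels = 1:15` argument are assumptions added for illustration and are not part of the original post.

```r
set.seed(1)  # assumed seed, only so the illustration is reproducible
theValues <- runif(150, -10, 10)

# rank() maps the values to 1..150; integer division by 10 sends
# ranks 1-10 to 0, ranks 11-20 to 1, ..., ranks 141-150 to 14.
exact <- factor(1 + (rank(theValues) - 1) %/% 10, levels = 1:15)

# Ranking a jittered copy lets neighbouring values swap groups,
# while the group sizes stay fixed at ten.
jittered <- theValues + runif(150, 0, 3)
not.so.exact <- factor(1 + (rank(jittered) - 1) %/% 10, levels = 1:15)

table(exact)         # ten observations in every level
table(not.so.exact)  # still ten observations in every level
```

Because the group sizes come from the ranks rather than from the values themselves, any perturbation applied before ranking keeps the ten-per-level constraint intact.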
Economics Guy
2010-Mar-28 12:37 UTC
[R] Assigning factors probabilistically based on the value of another variable.
> It is really the same thing. One of many possibilities:
>
>> theFrame <- data.frame(theValues=runif(150,-10,10))
>> exact <- diag(15)[1+ (rank(theFrame$theValues)-1)%/%10,]
>> not.so.exact <- diag(15)[1+ (rank(theFrame$theValues+runif(150,0,3))-1)%/%10,]
>
> If what you actually wanted was one factor with fifteen levels, just wrap
> the subscript in the last assignment in factor() instead.
>
> HTH,
>
> Chuck

Thanks Chuck, this does what I asked for:

    theValues <- runif(150,-10,10)
    exact <- factor(1+(rank(theValues)-1)%/%10)

Unfortunately, it looks like my example may have been too contrived for my actual program. In the solution that Chuck proposed, he anticipated that I may not want the assignment of factor levels to be exact:

    notSoExact <- factor(1+(rank(theValues+runif(150,0,3))-1)%/%10)

This is close to what I need. However, in the real program I need to be able to vary the degree of exactness in the assignment precisely. The assignment should be able to range from completely random to the exact assignment that "exact" above provides.

Can anyone think of a simple way to do this?

Thanks again,

That Economics Guy
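One possible way to get a tunable degree of exactness, offered only as a sketch and not as a reply from the thread: blend the scaled rank of each value with an independent uniform draw, then bin the ranks of that blend as before. The function name `assignLevels` and the weight `w` are hypothetical names introduced here. With `w = 1` the result matches the exact assignment; with `w = 0` the values are ignored and the assignment is purely random. Intermediate weights are not claimed to correspond to any particular probability model.

```r
# Hypothetical helper, not from the thread.
# w = 1: exact rank-based assignment; w = 0: completely random assignment.
assignLevels <- function(x, w, nLevels = 15) {
  stopifnot(w >= 0, w <= 1, length(x) %% nLevels == 0)
  n <- length(x)
  perLevel <- n %/% nLevels
  # Blend each value's scaled rank with an independent uniform draw.
  score <- w * rank(x) / n + (1 - w) * runif(n)
  # Re-rank the blended score and cut the ranks into equal-sized groups.
  grp <- 1 + (rank(score, ties.method = "first") - 1) %/% perLevel
  factor(grp, levels = seq_len(nLevels))
}

set.seed(42)  # assumed seed, for reproducibility only
theValues <- runif(150, -10, 10)

exactLevels  <- assignLevels(theValues, w = 1)
mixedLevels  <- assignLevels(theValues, w = 0.5)
randomLevels <- assignLevels(theValues, w = 0)

table(mixedLevels)  # ten observations in every level, whatever w is
```

Since the binning always works on ranks, every choice of `w` keeps exactly ten elements per level; only the strength of the link between theValues and the assigned level changes.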