Michael Haenlein
2011-Sep-15 12:53 UTC
[R] Allocation of data points to groups based on membership probabilities
Dear all,

I have a matrix that gives, for a series of data points, the probability that each point belongs to a certain group. Take the following example, which represents 20 data points and their membership probabilities for five groups (A-E):

set.seed(1)
probs <- matrix(runif(100), nrow = 20,
                dimnames = list(c(), c("A", "B", "C", "D", "E")))

In addition, I know how large each group should be. Assume, for example, that the group sizes in the example above are 5, 4, 1, 6 and 4 for A, B, C, D and E, respectively (20 in total, one slot per data point).

I would like to allocate the data points to the groups so that (a) each group has exactly the size it is supposed to have and (b) each data point ends up in a group for which it has a high membership probability.

For some data points this allocation is straightforward, because one group membership probability is much larger than the others. For others, however, two or more probabilities are very similar, which means the data point could be allocated to either group.

I guess it should be possible to write some iterative code or an optimization routine that does this, but I do not know how. Does anyone have an idea how this could be done?

Thanks very much in advance,

Michael Haenlein
Associate Professor of Marketing
ESCP Europe
Paris, France
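One way to make requirement (b) precise, assuming the goal is to maximize the total membership probability of the chosen allocation, is to treat this as a transportation problem: repeat each group's column once per slot (5 copies of A, 4 of B, and so on) to get a 20 x 20 matrix, then solve an ordinary linear assignment problem on it. The sketch below does this in base R with a textbook O(n^3) Hungarian algorithm. The names `hungarian_min` and `allocate_groups` are hypothetical helpers written for illustration, not functions from any package. For larger data sets, `solve_LSAP()` from the clue package or `lp.transport()` from lpSolve would likely be the more practical route.

```r
# Hypothetical helper: Hungarian algorithm (O(n^3) potentials version) for a
# cost matrix with nrow(cost) <= ncol(cost). Returns, for each row, the
# column it is matched to so that the total cost is minimal.
hungarian_min <- function(cost) {
  n <- nrow(cost); m <- ncol(cost)
  stopifnot(n <= m)
  # Element k + 1 of the vectors below refers to row/column k; k = 0 is a dummy.
  u <- numeric(n + 1); v <- numeric(m + 1)   # dual potentials
  p <- integer(m + 1)                        # p[j + 1]: row matched to column j
  way <- integer(m + 1)                      # previous column on augmenting path
  for (i in seq_len(n)) {
    p[1] <- i
    j0 <- 0L
    minv <- rep(Inf, m + 1)
    used <- rep(FALSE, m + 1)
    repeat {                                 # grow shortest-path tree from row i
      used[j0 + 1] <- TRUE
      i0 <- p[j0 + 1]
      delta <- Inf; j1 <- 0L
      for (j in seq_len(m)) {
        if (!used[j + 1]) {
          cur <- cost[i0, j] - u[i0 + 1] - v[j + 1]
          if (cur < minv[j + 1]) { minv[j + 1] <- cur; way[j + 1] <- j0 }
          if (minv[j + 1] < delta) { delta <- minv[j + 1]; j1 <- j }
        }
      }
      for (j in 0:m) {                       # update the potentials
        if (used[j + 1]) {
          u[p[j + 1] + 1] <- u[p[j + 1] + 1] + delta
          v[j + 1] <- v[j + 1] - delta
        } else {
          minv[j + 1] <- minv[j + 1] - delta
        }
      }
      j0 <- j1
      if (p[j0 + 1] == 0L) break             # reached an unmatched column
    }
    repeat {                                 # flip the matching along the path
      j1 <- way[j0 + 1]
      p[j0 + 1] <- p[j1 + 1]
      j0 <- j1
      if (j0 == 0L) break
    }
  }
  assignment <- integer(n)
  for (j in seq_len(m)) if (p[j + 1] > 0L) assignment[p[j + 1]] <- j
  assignment
}

# Hypothetical helper: expand each group into one column per slot, then pick
# the allocation that maximizes the summed scores (probabilities by default).
allocate_groups <- function(probs, sizes) {
  stopifnot(sum(sizes) == nrow(probs), all(names(sizes) %in% colnames(probs)))
  slot_group <- rep(names(sizes), times = sizes)   # "A","A",...,"E","E"
  cost <- -probs[, slot_group, drop = FALSE]       # maximize = minimize negative
  slot_group[hungarian_min(cost)]
}

set.seed(1)
probs <- matrix(runif(100), nrow = 20,
                dimnames = list(c(), c("A", "B", "C", "D", "E")))
sizes <- c(A = 5, B = 4, C = 1, D = 6, E = 4)
group <- allocate_groups(probs, sizes)
table(group)   # group counts equal the requested sizes
```

Because `allocate_groups()` simply negates whatever scores it is given, calling `allocate_groups(log(probs), sizes)` instead maximizes the summed log-probability (the joint likelihood of the allocation, if points are treated as independent), which more strongly penalizes placing any single point in a very unlikely group. Which objective is appropriate is an assumption the original question leaves open.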