Greetings. A subset of a problem that the group I work with turns out to be
an optimization problem, in the sense of linear programming.
We've looked at several approaches to this but haven't found one that
seems to
be the right fit. I'm looking for some guidance in finding an R package
that
does what we want in an acceptable time.
Here's a toy example. We start with a matrix, called "gMat1"
(historical
reasons):
> gMat1 <- matrix(c(3, 6, 9, 5, 9, 5), nrow=3)
> print(gMat1)
[,1] [,2]
[1,] 3 5
[2,] 6 9
[3,] 9 5
The goal is to add the contents of one column of each row to one of two bins
(in general the number of bins equals the number of columns) such that the
minimum number in each bin is maximized. An example follows.
In the toy example, the possibilities can be enumerated simply:
> allChoices <- expand.grid(rep(list(1:ncol(gMat1)), nrow(gMat1)))
> print(allChoices)
Var1 Var2 Var3
1 1 1 1
2 2 1 1
3 1 2 1
4 2 2 1
5 1 1 2
6 2 1 2
7 1 2 2
8 2 2 2
For example, with the first choice, (1, 1, 1), column 1, hence, bin 1, is
selected for all three rows, giving a result for the two bins of:
18 0
For the second choice, (2, 1, 1), the '2' in the first position selects
the
contents of column 2 for bin 2. The remaining (1, 1) select the (6, 9) in the
first column and assign those to bin 1:
15 5
The result is a set of "binSums" corresponding to the each of the set
of
possible choices:
> print(binSums)
[,1] [,2]
[1,] 18 0
[2,] 15 5
[3,] 12 9
[4,] 9 14
[5,] 9 5
[6,] 6 10
[7,] 3 14
[8,] 0 19
Having generated the sums, the goal is to pick the row that has the largest
minimum and map that back to the original choice. In the toy example, both
rows 3 and 4 satisfy that criterion ('9' is the minimum in each, and
'9' is
bigger than the minima in the other 6 rows -- 0, 3, 5, 6).
In the real case there are potentially thousands of rows and columns, so
eye-balling is not an option. And, in fact, using "expand.grid"
isn't even an
option to generate the original choices.
We've tried some ad hoc approaches that seem to work tolerably well.
Here's
one that we *might* have considered, if we *could* have generated the
"allChoices/binSums":
> bsVar <- apply(binSums, 1, var)
> locBestVar <- which(bsVar == min(bsVar))
> allChoices[locBestVar, ]
Var1 Var2 Var3
3 1 2 1
But there was a feeling within the group that we didn't have a solid-enough
foundation using the ad hoc approaches. Hence, we asked for help from an
expert in Operations Research. He was able to solve a problem of realistic
size in more or less no time at all using the "GAMS" software:
https://www.gams.com/
Unfortunately, GAMS is not free software, and we are hoping to produce an R
package that is freely distributable. The next suggestion was to use Gurobi:
http://www.gurobi.com/
which is evidently free for academic use but not otherwise. Better, but still
not perfect. (And I couldn't use the free version of Gurobi while working
from home, as it didn't consider my home network to be associated with an
academic institution -- which of course it isn't).
Finally, we tried:
Rglpk_solve_LP
from the R package "Rglpk":
https://cran.r-project.org/web/packages/Rglpk/index.html
This satisfied the licensing constraints, but we were unable to produce a
result in a "finite" amount of time. By this I mean that we ran the
Rglpk
software on a problem with 200 rows and 20 columns on the latest Mac Pro with
64GB of memory, and it didn't finish overnight. A realistic problem would
have at least 10000 rows and 50 columns. (This admittedly might simply have
been a consequence of our unfamiliarity with the package. Suggestions
welcome.) To be clear, this process is not something we'd be doing once in
order to build the package. This is something an end user would have to do
every time he/she ran our package.
If you've managed to read this far and have any suggestions as to how we
might
proceed, please send them my way. Thanks.
-- Mike