Dear list, I have a problem with a toy example: mtrx <- matrix(c(1,1,0,1,1,1,0,1,1,0,0,1), nrow=3) rownames(ma) <- letters[1:3] I would like to determine which is the minimum combination of rows that "covers" all columns with at least a 1. None of the rows covers all columns; all three rows clearly covers all columns, but there are simpler combinations (1st and the 3rd, or 2nd and 3rd) which also covers all columns. I solved this problem by creating a second logical matrix which contains all possible combinations of rows: tt <- matrix(as.logical(c(1,0,0,0,1,0,0,0,1,1,1,0,1,0,1,0,1,1,1,1,1)), nrow=3) and then subset the first matrix and check if all columns are covered. This solution, though, is highly inneficient and I am certain that a combination of apply or something will do. ########################### possibles <- NULL length.possibles <- NULL ## I guess the minimum solution is has half the number of rows guesstimate <- floor(nrow(tt)/2) + nrow(tt) %% 2 checked <- logical(nrow(tt)) repeat { ifelse(checked[guesstimate], break, checked[guesstimate] <- TRUE) partials <- as.matrix(tt[, colSums(tt) == guesstimate]) layer.solution <- logical(ncol(partials)) for (j in 1:ncol(partials)) { if (length(which(colSums(mtrx[partials[, j], ]) > 0)) == ncol(mtrx)) { layer.solution[j] <- TRUE } } if (sum(layer.solution) == 0) { if (!is.null(possibles)) break guesstimate <- guesstimate + 1 } else { for (j in which(layer.solution)) { possible.solution <- rownames(mtrx)[partials[, j]] possibles[[length(possibles) + 1]] <- possible.solution length.possibles <- c(length.possibles, length(possible.solution)) } guesstimate <- guesstimate - 1 } } final.solution <- possibles[which(length.possibles == min(length.possibles))] ########################### More explicitely (if useful) it is about reducing a prime implicants chart in a Quine-McCluskey boolean minimisation algorithm. I tried following the original algorithm applying row dominance and column dominance, but (as I am not a computer scientist), I am unable to apply it. If you have a better solution for this, I would be gratefull if you'd share it. Thank you in advance, Adrian -- Adrian DUSA Romanian Social Data Archive 1, Schitu Magureanu Bd 050025 Bucharest sector 5 Romania Tel./Fax: +40 21 3126618 \ +40 21 3120210 / int.101
On 11/19/2005 8:00 AM, Adrian DUSA wrote:> Dear list, > > I have a problem with a toy example: > mtrx <- matrix(c(1,1,0,1,1,1,0,1,1,0,0,1), nrow=3) > rownames(ma) <- letters[1:3] > > I would like to determine which is the minimum combination of rows that > "covers" all columns with at least a 1. > None of the rows covers all columns; all three rows clearly covers all > columns, but there are simpler combinations (1st and the 3rd, or 2nd and 3rd) > which also covers all columns. > > I solved this problem by creating a second logical matrix which contains all > possible combinations of rows: > tt <- matrix(as.logical(c(1,0,0,0,1,0,0,0,1,1,1,0,1,0,1,0,1,1,1,1,1)), nrow=3) > > and then subset the first matrix and check if all columns are covered. > This solution, though, is highly inneficient and I am certain that a > combination of apply or something will do.First of all, I imagine there isn't a unique solution, i.e. there are probably several subsets that can't be reduced but which are not equal. Do you care if you find the smallest one of those? If so, it looks like a reasonably hard problem. If not, it's a lot easier: total all of the columns, then see if there is any row whose entries all correspond to columns with counts bigger than 1. Remove it, and continue. This will find a local minimum in one pass through the rows. You could make it better by sorting the rows into an order so that rows that dominate other rows come later, but I think you still wouldn't be guaranteed to find the global min. (By the way, I'm not sure if we have a function that can do this: i.e., given a partial ordering on the rows, sort the matrix so that the resulting order is consistent with it.) Duncan Murdoch> > ########################### > > possibles <- NULL > length.possibles <- NULL > ## I guess the minimum solution is has half the number of rows > guesstimate <- floor(nrow(tt)/2) + nrow(tt) %% 2 > checked <- logical(nrow(tt)) > repeat { > ifelse(checked[guesstimate], break, checked[guesstimate] <- TRUE) > partials <- as.matrix(tt[, colSums(tt) == guesstimate]) > layer.solution <- logical(ncol(partials)) > > for (j in 1:ncol(partials)) { > if (length(which(colSums(mtrx[partials[, j], ]) > 0)) == ncol(mtrx)) { > layer.solution[j] <- TRUE > } > } > if (sum(layer.solution) == 0) { > if (!is.null(possibles)) break > guesstimate <- guesstimate + 1 > } else { > for (j in which(layer.solution)) { > possible.solution <- rownames(mtrx)[partials[, j]] > possibles[[length(possibles) + 1]] <- possible.solution > length.possibles <- c(length.possibles, length(possible.solution)) > } > guesstimate <- guesstimate - 1 > } > } > final.solution <- possibles[which(length.possibles == min(length.possibles))] > > ########################### > > More explicitely (if useful) it is about reducing a prime implicants chart in > a Quine-McCluskey boolean minimisation algorithm. I tried following the > original algorithm applying row dominance and column dominance, but (as I am > not a computer scientist), I am unable to apply it. > > If you have a better solution for this, I would be gratefull if you'd share > it. > Thank you in advance, > Adrian >
Try minizing 1'x subject to 1 >= x >= 0 and m'x >= 1 where m is your mtrx and ' means transpose. It seems to give an integer solution, 1 0 1, with linear programming even in the absence of explicit integer constraints: library(lpSolve) lp("min", rep(1,3), rbind(t(mtrx), diag(3)), rep(c(">=", "<="), 4:3), rep(1,7))$solution On 11/19/05, Adrian DUSA <adi at roda.ro> wrote:> Dear list, > > I have a problem with a toy example: > mtrx <- matrix(c(1,1,0,1,1,1,0,1,1,0,0,1), nrow=3) > rownames(ma) <- letters[1:3] > > I would like to determine which is the minimum combination of rows that > "covers" all columns with at least a 1. > None of the rows covers all columns; all three rows clearly covers all > columns, but there are simpler combinations (1st and the 3rd, or 2nd and 3rd) > which also covers all columns. > > I solved this problem by creating a second logical matrix which contains all > possible combinations of rows: > tt <- matrix(as.logical(c(1,0,0,0,1,0,0,0,1,1,1,0,1,0,1,0,1,1,1,1,1)), nrow=3) > > and then subset the first matrix and check if all columns are covered. > This solution, though, is highly inneficient and I am certain that a > combination of apply or something will do. > > ########################### > > possibles <- NULL > length.possibles <- NULL > ## I guess the minimum solution is has half the number of rows > guesstimate <- floor(nrow(tt)/2) + nrow(tt) %% 2 > checked <- logical(nrow(tt)) > repeat { > ifelse(checked[guesstimate], break, checked[guesstimate] <- TRUE) > partials <- as.matrix(tt[, colSums(tt) == guesstimate]) > layer.solution <- logical(ncol(partials)) > > for (j in 1:ncol(partials)) { > if (length(which(colSums(mtrx[partials[, j], ]) > 0)) == ncol(mtrx)) { > layer.solution[j] <- TRUE > } > } > if (sum(layer.solution) == 0) { > if (!is.null(possibles)) break > guesstimate <- guesstimate + 1 > } else { > for (j in which(layer.solution)) { > possible.solution <- rownames(mtrx)[partials[, j]] > possibles[[length(possibles) + 1]] <- possible.solution > length.possibles <- c(length.possibles, length(possible.solution)) > } > guesstimate <- guesstimate - 1 > } > } > final.solution <- possibles[which(length.possibles == min(length.possibles))] > > ########################### > > More explicitely (if useful) it is about reducing a prime implicants chart in > a Quine-McCluskey boolean minimisation algorithm. I tried following the > original algorithm applying row dominance and column dominance, but (as I am > not a computer scientist), I am unable to apply it. > > If you have a better solution for this, I would be gratefull if you'd share > it. > Thank you in advance, > Adrian > > -- > Adrian DUSA > Romanian Social Data Archive > 1, Schitu Magureanu Bd > 050025 Bucharest sector 5 > Romania > Tel./Fax: +40 21 3126618 \ > +40 21 3120210 / int.101 > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html >
I suspect that the answer is that finding all solutions will be hard. L1 regression is a special case of LP. I learned how to move around the corners of the solution space, and could easily find all of the solutions in the special case of a two-way table. However, sometimes there were a lot of solutions. I would guess that your problem has a lot of solutions as well. One cheat would be to do the LP problem multiple times with the rows of your matrix randomly permuted. Assuming you keep track of the real rows, you could then get a sense of how many solutions there might be. Patrick Burns patrick at burns-stat.com +44 (0)20 8525 0696 http://www.burns-stat.com (home of S Poetry and "A Guide for the Unwilling S User") Adrian Dusa wrote:>On Saturday 19 November 2005 17:24, Gabor Grothendieck wrote: > > >>[...snip...] >>Although the above is not wrong I should have removed the rbind >>which is no longer needed and simplifying it further, as it seems >>that lp will do the rep for you itself for certain arguments, gives: >> >>lp("min", rep(1,3), t(mtrx), ">=", 1)$solution # 1 0 1 >> >> > >Thank you Gabor, this solution is superbe (you never stop amazing me :) >Now... it only finds _one_ of the multiple minimum solutions. In the toy >example, there are two minimum solutions, hence I reckon the output should >have been a list with: >[[1]] >[1] 1 0 1 > >[[2]] >[1] 0 1 1 > >Also, thanks to Duncan and yes, I do very much care finding the smallest >possible solutions (if I correctly understand your question). > >It seems that lp function is very promising, but can I use it to find _all_ >minimum solutions? > >Adrian > > > > >
On Saturday 19 November 2005 19:17, Patrick Burns wrote:> [....snip...] One cheat would be to do the LP problem > multiple times with the rows of your matrix randomly > permuted. Assuming you keep track of the real rows, > you could then get a sense of how many solutions there > might be.Thanks for the answer. The trick does work (i.e. it finds all minimum solutions) provided that I permute the rows a sufficient number of times. And I have to compare each solution to the existing (unique) ones, which takes a lot of time... In your experience, what would be the definiton of "multiple times" for large matrices? My (dumb) solution is guaranteed to find all possible minimums, because it checks every possible combination. For large matrices, though, this would be really slow. I wonder if that could be vectorized in some way; before the LP function, I was thinking there might be a more efficient way to loop over all possible columns (using perhaps the apply family). Thanks again, Adrian -- Adrian DUSA Romanian Social Data Archive 1, Schitu Magureanu Bd 050025 Bucharest sector 5 Romania Tel./Fax: +40 21 3126618 \ +40 21 3120210 / int.101
On 11/19/05, Adrian DUSA <dusa.adrian at gmail.com> wrote:> On Saturday 19 November 2005 19:17, Patrick Burns wrote: > > [....snip...] One cheat would be to do the LP problem > > multiple times with the rows of your matrix randomly > > permuted. Assuming you keep track of the real rows, > > you could then get a sense of how many solutions there > > might be. > > Thanks for the answer. The trick does work (i.e. it finds all minimum > solutions) provided that I permute the rows a sufficient number of times. And > I have to compare each solution to the existing (unique) ones, which takes a > lot of time... > In your experience, what would be the definiton of "multiple times" for large > matrices? > > My (dumb) solution is guaranteed to find all possible minimums, because it > checks every possible combination. For large matrices, though, this would be > really slow. I wonder if that could be vectorized in some way; before the LP > function, I was thinking there might be a more efficient way to loop over all > possible columns (using perhaps the apply family). > > Thanks again, > Adrian >Getting back to your original question of using apply, solving the LP gives us the number of components in any minimal solution and exhaustive search of all solutions with that many components can be done using combinations from gtools and apply like this: library(gtools) # needed for combinations soln <- lp("min", rep(1,3), rbind(t(mtrx)), rep(">=", 4), rep(1,4))$solution k <- sum(soln) m <- nrow(mtrx) combos <- combinations(m,k) combos[apply(combos, 1, function(idx) all(colSums(mtrx[idx,]))),] In the example we get: [,1] [,2] [1,] 1 3 [2,] 2 3 which says that rows 1 and 3 of mtrx form one solution and rows 2 and 3 of mtrx form another solution.