3

I have a set of sets S and I want to find a minimal set of elements M such that each set in S shares at least one element with M.

S = set of sets
∀ f∈S ∃e · e∈f ∧ e∈M

For example:

S = {{1}, {1, 2, 3}, {3, 4}, {5, 6}}
M = {1, 3, 5} or {1, 3, 6} or {1, 4, 5} or {1, 4, 6}

Is there a name for this problem? What algorithms exist to find an approximation of M?

Neil
  • 22,670
  • 45
  • 76
ICR
  • 481
  • 3
  • 9
  • are you sure you have the M in example correct? – ne2dmar Feb 26 '15 at 09:50
  • @ne2dmar It was indeed wrong. Last minute I modified it to try and get a single solution to M, but screwed it up. I have updated it now. – ICR Feb 26 '15 at 09:55

2 Answers2

3

If you think about it, this problem is very similar to the variation of the Boolean satisfiability problem of 3-satisfiability aka 3-SAT.

Take each set within S into individual boolean expressions separated by "or" gates. Then each of these expressions would then be separated by "and" gates. For example {{A}, {A, B, C}, {C, D}, {E, F}} could be translated into A ∧ (A ∨ B ∨ C) ∧ (C ∨ D) ∧ (E ∨ F).

Determining if a solution to this exists is NP-complete. Though your particular problem always has a solution (worse case scenario being that you have no items in common), I fear that it is a sophistication of the 3-SAT problem. Not only do you want a possible solution, you ideally want the one that requires the least number of items choosen.

If you want the very best solution, I think the best you can do is to search every possible solution and test it. However, if an approximate solution is acceptable, you could begin searching for a solution by counting the number of items each set has in common with other sets and counting the number of times each unique value is used in every set. Then attempt to formulate an approximation by choosing the most used value in each set starting with the set which has the most items in common with other sets and moving down, stopping if you manage to find a valid solution. Again, while it isn't guaranteed to be an optimal solution, it will likely be a decent approximation.

Best of luck.

Neil
  • 22,670
  • 45
  • 76
  • I've ended up with a variation of your suggestion for the approximation which, for the sample inputs I've used, yields similar (often slightly smaller) results and is easier to implement efficiently. Sort $S$ by cardinality (lowest to highest), then for each set left in $S$ pick the element that is in the most other sets, then remove all sets from $S$ that include that element. res = []; s.sort_by! &:length; while !s.empty?; r = s.shift.max_by {|x| s.count {|i| i.include? x}}; res << r; s.delete_if {|i| i.include? r}; end; – ICR Feb 26 '15 at 14:40
3

This problem is known in the literature as "minimal hitting set", and there are lots of algorithms for both exact and approximate solutions. I recently benchmarked about twenty algorithms for exact solutions and found the MMCS and RS algorithms of Murakami and Uno 1 to be extremely fast and easy to implement. Those authors have made C implementations available, and I have provided a parallelized C++ implementation. I also have a survey paper in preparation which should be available on arXiv shortly if you're interested in the details.

1 K. Murakami and T. Uno, "Efficient algorithms for dualizing large-scale hypergraphs". Discrete Applied Mathematics 170, p. 83-94. doi:10.1016/j.dam.2014.01.012. arxiv:1102.3813