
I've been playing with making image mosaics. My script takes a large number of images, scales them down to thumbnail size and then uses them as tiles to approximate a target image.

The approach is actually quite pleasing:

I compute the mean square error for every thumb in every tile position.
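For concreteness, the cost computation looks something like this (a simplified sketch rather than my actual script; `thumbs` and `targets` stand for equal-sized NumPy arrays of the thumbnails and the corresponding target-image crops):

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two same-shaped images."""
    d = a.astype(np.float64) - b.astype(np.float64)
    return np.mean(d * d)

def build_cost_matrix(thumbs, targets):
    """cost[i, j] = MSE of placing thumb i on tile position j."""
    cost = np.empty((len(thumbs), len(targets)))
    for i, thumb in enumerate(thumbs):
        for j, region in enumerate(targets):
            cost[i, j] = mse(thumb, region)
    return cost
```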

At first I just used a greedy placement: put the thumb with the least error on the tile it best fits, and then the next and so on.
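The greedy pass is roughly this (again a sketch, reusing the cost matrix from above):

```python
import numpy as np

def greedy_assign(cost):
    """Repeatedly place the (thumb, tile) pair with the smallest remaining error."""
    n_thumbs, n_tiles = cost.shape
    order = np.argsort(cost, axis=None)            # all pairs, best first
    used_thumbs, used_tiles, assignment = set(), set(), {}
    for flat in order:
        i, j = divmod(int(flat), n_tiles)
        if i in used_thumbs or j in used_tiles:
            continue
        assignment[j] = i                          # tile index -> thumb index
        used_thumbs.add(i)
        used_tiles.add(j)
        if len(assignment) == n_tiles:
            break
    return assignment
```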

The problem with greedy is that it leaves you eventually placing the most different thumbs on the least popular tiles, whether they match closely or not. I show examples here: http://williamedwardscoder.tumblr.com/post/84505278488/making-image-mosaics

So I then do random swaps until the script is interrupted. The results are quite OK.

A random swap of two tiles is not always an improvement, but sometimes a rotation of three or more tiles results in a global improvement, i.e. A <-> B may not improve, but A -> B -> C -> A may.

For this reason, after picking two random tiles and discovering that swapping them does not improve things, I pick a bunch of further tiles and evaluate whether any of them can be the third tile in such a rotation. I do not explore whether any set of four tiles can be profitably rotated, and so on; that'd be super-expensive real soon.
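The swap-then-rotate loop is along these lines (a sketch; `cost` and `assignment` are as in the snippets above, and the 20 rotation candidates is an arbitrary number, not a tuned one):

```python
import random

def improve(assignment, cost, iterations=100_000, rotation_candidates=20):
    """Random 2-swaps; when a swap doesn't help, try 3-tile rotations A -> B -> C -> A."""
    tiles = list(assignment.keys())
    for _ in range(iterations):
        a, b = random.sample(tiles, 2)
        ta, tb = assignment[a], assignment[b]
        current = cost[ta, a] + cost[tb, b]
        if cost[tb, a] + cost[ta, b] < current:
            assignment[a], assignment[b] = tb, ta   # plain swap helps
            continue
        # No 2-swap improvement: try a few random tiles as the third in a rotation.
        for c in random.sample(tiles, min(rotation_candidates, len(tiles))):
            if c in (a, b):
                continue
            tc = assignment[c]
            before = current + cost[tc, c]
            after = cost[ta, b] + cost[tb, c] + cost[tc, a]
            if after < before:
                assignment[b], assignment[c], assignment[a] = ta, tb, tc
                break
    return assignment
```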

But this takes time.. A lot of time!

Is there a better and faster approach?


Bounty Update

I tested out various Python implementations and bindings of the Hungarian Method.

By far the fastest was the pure-Python https://github.com/xtof-durr/makeSimple/blob/master/Munkres/kuhnMunkres.py

My hunch is that this approximates the optimal answer; when run on a test image, all the other libraries agreed on the resulting score, but kuhnMunkres.py, whilst being orders of magnitude faster, only got very, very close to that score.

Speed is very data-dependent; Mona Lisa rushed through kuhnMunkres.py in 13 minutes, but the Scarlet Chested Parakeet took 16 minutes.

Results were much the same as random swaps and rotations for the Parakeet:


(kuhnMunkres.py on the left, random swaps on the right; original image for comparison)

However, for the Mona Lisa image I tested with, the results were noticeably improved and she actually had her defined 'smile' shining through:


(kuhnMunkres.py on the left, random swaps on the right)

Will
  • Related...ish. On Codegolf, [palette transform](http://codegolf.stackexchange.com/q/33172/12166) had similar problems. –  Sep 01 '14 at 04:31
  • And another related set of images is [allRGB](http://allrgb.com), where each image uses every RGB colour exactly once (though that doesn't give you too much of a hint of *how* to do it... just that there's another area where this problem has been approached). –  Sep 01 '14 at 19:03
  • I ran into this problem with a mosaic maker quite a few years ago. My line of reasoning then and now is that the problem is not so much with your algorithm (the MSE part) but rather with the limited size of your input image palette. Not having a billion images to work with, I faked it by allowing an image to be reused after some amount of time. However, if you want to keep with your approach, it may be good to do a first pass for "good" fits and then treat the rest of the images as random (or random-ish) - with a limited input set you only have so many choices. – J Trana Sep 02 '14 at 05:14
  • @MichaelT thanks for that excellent link :) The codegolf particularly is fascinating. I find the voted-best solutions are using random swaps (not random rotations) and are presumably running for quite a while... – Will Sep 02 '14 at 05:44
  • Coming to this after you have selected an answer and awarded a bounty. A different approach would be to treat this as a [Simulated Annealing](http://en.wikipedia.org/wiki/Simulated_annealing) problem. You could use SA as one of the stages of your solution pipeline. – andy256 Sep 04 '14 at 00:29

3 Answers


I'm reasonably sure that's an NP-hard problem. To find a 'perfect' solution you have to try every possibility exhaustively, and that is exponential.

One approach would be to use the greedy fit and then try to improve it. That could be done by taking a badly placed image (one of the last ones placed) and finding another place to put it, then moving the image it displaces, and so on. You are done when you (a) run out of time or (b) the fit is 'good enough'.

If you introduce a probabilistic element, it might yield to a simulated annealing approach or a genetic algorithm. Perhaps all you're trying to achieve is to spread the errors evenly. I suspect this is getting close to what you're already doing, so the answer is: with the right algorithm you may get a better result faster, but there is no magic shortcut to Nirvana.
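For instance, the annealing version of your swap loop just accepts a worsening swap now and then, with a probability that shrinks over time (a rough sketch only; `cost` and `assignment` follow your question's setup, and the temperature schedule is a guess, not a tuned value):

```python
import math
import random

def annealed_swaps(assignment, cost, steps=200_000, t_start=50.0, t_end=0.01):
    """Random 2-swaps with simulated-annealing acceptance of occasional bad swaps."""
    tiles = list(assignment.keys())
    for step in range(steps):
        t = t_start * (t_end / t_start) ** (step / steps)   # geometric cooling
        a, b = random.sample(tiles, 2)
        ta, tb = assignment[a], assignment[b]
        delta = (cost[tb, a] + cost[ta, b]) - (cost[ta, a] + cost[tb, b])
        if delta < 0 or random.random() < math.exp(-delta / t):
            assignment[a], assignment[b] = tb, ta
    return assignment
```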


Yes, this is similar to what you're already doing. The point is to forget a magic answer and to think in terms of 2 algorithms: first fill, then optimise.

The fill could be: random, best available, first best, good enough, some kind of hot spot.

The optimise could be random, fix the worst, or (as I suggested) simulated annealing or genetic algorithm.

You need a metric of 'goodness' and an amount of time you're prepared to spend on it and just experiment. Or find someone who has actually done it.

david.pfx

Yes, there are two better and faster approaches.

  • Simpler problem: for each tile, choose the best thumb (with possible duplication). OK, that's cheating, but it can only lead to a better visual result.
  • Your take is algorithmically more interesting, and boils down to the "linear assignment problem", assuming you take the MSEs as match costs whose sum must be minimal. Such a problem can be solved in polynomial time, via e.g. the "Hungarian Method".

Then, you can adjust your costs by replacing MSE with a more visually accurate distance, without changing the underlying algorithm.
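For what it's worth, newer SciPy versions (0.17+) ship an exact LAP solver, `scipy.optimize.linear_sum_assignment`, which makes a convenient reference point for approximate implementations (a sketch only; the cost matrix layout follows the question):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def optimal_assignment(cost):
    """Exact minimum-cost assignment; cost[thumb, tile] is the match cost (e.g. MSE)."""
    thumb_idx, tile_idx = linear_sum_assignment(cost)
    total = cost[thumb_idx, tile_idx].sum()
    return dict(zip(tile_idx, thumb_idx)), total   # tile index -> thumb index
```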

YvesgereY
  • Thx! LAP and Hungarian Method were the leads I needed! Update with results in question. – Will Sep 03 '14 at 21:33

If the last tiles are your problem, you should try to place them early on, somehow ;)

One approach would be to look at the tile that is the furthest away from the top x% of its matches (intuitively I'd go with 33%) and place that on its best match. That's the best match it can get anyway.

Furthermore you could choose not to use the best match for the worst tile, but the one where it introduces the least error compared to the best match for that slot, so that you don't completely throw away your best matches for the sake of "damage control".
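A sketch of that placement order, assuming a `cost[thumb, tile]` matrix of MSEs as in the question (the 33% cut-off and all the names here are just illustrative):

```python
import numpy as np

def place_worst_first(cost, cutoff=0.33):
    """Repeatedly pick the thumb with the biggest gap between its best and its
    top-`cutoff` match, then give it the free slot where it exceeds that slot's
    best remaining match by the least ("damage control")."""
    n_thumbs, n_tiles = cost.shape
    free_thumbs, free_tiles = set(range(n_thumbs)), set(range(n_tiles))
    assignment = {}
    while free_tiles:
        tiles = np.array(sorted(free_tiles))
        best_gap, pick = -1.0, None
        for i in free_thumbs:
            row = np.sort(cost[i, tiles])
            gap = row[int(cutoff * (len(row) - 1))] - row[0]
            if gap > best_gap:
                best_gap, pick = gap, i
        others = np.array(sorted(free_thumbs))
        slot_best = cost[others][:, tiles].min(axis=0)      # best remaining match per slot
        j = int(tiles[np.argmin(cost[pick, tiles] - slot_best)])
        assignment[j] = pick
        free_thumbs.discard(pick)
        free_tiles.discard(j)
    return assignment                                       # tile index -> thumb index
```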

Another thing to bear in mind is that, in the end, you are producing an image to be processed by an eye. So what you really want is to use some edge detection to determine which positions in your image are most important. Similarly, what happens at the very periphery of the image is of little value to the quality of the effect. Superpose these two weights and include them in your distance calculation. Any jitter you get should then gravitate toward the border and away from edges, disturbing the result a lot less.
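A sketch of such a weighting (using SciPy's Sobel filter for the edge map; the border falloff and the per-tile averaging are just one possible choice — multiply each tile's MSE by its weight):

```python
import numpy as np
from scipy import ndimage

def tile_weights(target_gray, grid_h, grid_w, border_falloff=0.5):
    """Per-tile importance: edge density from a Sobel filter, damped toward the
    image border. `target_gray` is the target image as a 2-D float array."""
    edges = np.hypot(ndimage.sobel(target_gray, axis=0),
                     ndimage.sobel(target_gray, axis=1))
    h, w = target_gray.shape
    th, tw = h // grid_h, w // grid_w
    weights = np.empty((grid_h, grid_w))
    for r in range(grid_h):
        for c in range(grid_w):
            block = edges[r * th:(r + 1) * th, c * tw:(c + 1) * tw]
            # Rough distance of this tile from the nearest border, scaled to 0..1.
            d = min(r, c, grid_h - 1 - r, grid_w - 1 - c) / (max(grid_h, grid_w) / 2.0)
            weights[r, c] = block.mean() * (border_falloff + (1 - border_falloff) * min(d, 1.0))
    return weights / weights.max()
```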

Also with the edge detection in place, you may want to place the first y% greedily (maybe until you drop below a certain threshold of "edginess" in the tiles left), so that the "hot spots" are dealt with really nicely, and then switch to "damage control" for the rest.

back2dos