3

I am writing a Java program to do color quantization using the mean shift clustering algorithm. The input is an RGB image with a resolution of 512x512. I want to reduce the image file size by reducing the total number of colors in the input image.

I have a problem with defining the bandwidth used with the squared Euclidean distance in the mean shift algorithm.

How do you determine an appropriate bandwidth for the data? Is there a formula to define it?

user10057710

1 Answer

4

The bandwidth is the distance/size scale of the kernel function, i.e. the size of the “window” over which you calculate the mean.
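To make the role of the bandwidth concrete, here is a minimal Java sketch of a single mean shift step with a flat (uniform) kernel, plus a loop that repeats it. The names MeanShiftSketch, shift and findMode, and the tolerance and maxIterations parameters, are illustrative assumptions and not part of any library. Stopping once the centre moves less than a small tolerance is one common choice of stopping rule, not the only one.

```java
final class MeanShiftSketch {
    /** Squared Euclidean distance between two points of equal dimension. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** One flat-kernel step: the mean of all points within bandwidth of center. */
    static double[] shift(double[] center, double[][] points, double bandwidth) {
        double limit = bandwidth * bandwidth; // compare squared values, no sqrt needed
        double[] mean = new double[center.length];
        int count = 0;
        for (double[] p : points) {
            if (squaredDistance(center, p) <= limit) {
                for (int i = 0; i < mean.length; i++) {
                    mean[i] += p[i];
                }
                count++;
            }
        }
        if (count == 0) {
            return center.clone(); // empty window: leave the centre where it is
        }
        for (int i = 0; i < mean.length; i++) {
            mean[i] /= count;
        }
        return mean;
    }

    /** Repeats shift() until the centre moves less than tolerance or maxIterations is reached. */
    static double[] findMode(double[] start, double[][] points, double bandwidth,
                             double tolerance, int maxIterations) {
        double[] center = start.clone();
        for (int iter = 0; iter < maxIterations; iter++) {
            double[] next = shift(center, points, bandwidth);
            boolean converged = squaredDistance(center, next) <= tolerance * tolerance;
            center = next;
            if (converged) {
                break;
            }
        }
        return center;
    }
}
```

For an image, each point would be one pixel's colour as a three-element array. A larger bandwidth pulls more pixels into each window and therefore tends to produce fewer, coarser colour clusters.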

There is no bandwidth that works well for all purposes and all instances of the data. Instead, you will need to either

  • manually select an appropriate bandwidth for your algorithm; or

  • use an algorithm that automatically adapts or estimates the bandwidth (though this implies some computational overhead).

    • The Python scikit-learn library (sklearn.cluster) offers an estimate_bandwidth() function based on a nearest-neighbor analysis.
    • A wealth of research exists about this topic, e.g. Comaniciu, Ramesh, Meer (2001): The variable bandwidth mean shift and data-driven scale selection.
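As a rough illustration of the nearest-neighbour idea, the following Java sketch averages, over all points, the distance to each point's k-th nearest neighbour, with k derived from a quantile of the data size. It is loosely modelled on my understanding of how such an estimate works and is not a port of any library's code; the names BandwidthSketch and estimateBandwidth and the quantile parameter are assumptions made for this example.

```java
final class BandwidthSketch {
    /**
     * Mean distance from each point to its k-th nearest point, where
     * k = (int) (quantile * n) clamped to [1, n]. The point itself counts
     * as its own nearest neighbour (distance 0).
     */
    static double estimateBandwidth(double[][] points, double quantile) {
        int n = points.length;
        if (n == 0) {
            throw new IllegalArgumentException("points must not be empty");
        }
        int k = Math.min(n, Math.max(1, (int) (quantile * n)));
        double[] dist = new double[n];
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            // Euclidean distance from point i to every point, including itself
            for (int j = 0; j < n; j++) {
                double sum = 0.0;
                for (int c = 0; c < points[i].length; c++) {
                    double d = points[i][c] - points[j][c];
                    sum += d * d;
                }
                dist[j] = Math.sqrt(sum);
            }
            java.util.Arrays.sort(dist);
            total += dist[k - 1]; // dist[0] is the point itself
        }
        return total / n;
    }
}
```

This brute-force version needs a number of distance computations quadratic in the number of points, so for a 512x512 image it would be sensible to run it on a random subsample of the pixels rather than on all of them.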

Any discussion of bandwidth or kernels first requires that you have already defined a distance metric. For image processing you will have to select a suitable colour space, because distances are measured in that space. RGB might not produce the best results, depending on your goals.

amon
  • 1
    Thanks for your answer. But I must ask: is there any calculation that helps with selecting an appropriate bandwidth? I also now have a problem defining the condition for when the centroid has converged. It would help if you could make some suggestions regarding the bandwidth and the convergence condition. Thanks – user10057710 Mar 18 '19 at 11:04
  • @user10057710 That's not an area I'm really familiar with; you might instead want to look at existing research or at tutorials on the method. Also note that this site is about Software Engineering, which doesn't include in-depth discussion of clustering algorithms. – amon Mar 18 '19 at 11:27