
Sorting algorithms assume you have a well-defined comparator. For example, when sorting integers A and B, the A > B operation lets you determine whether A should come before or after B.

Imagine you want to implement a sort, but rather than a defined and consistent comparator, the comparison uses a heuristic of some sort, so there is no guarantee it will be correct. An analogy might be how eye doctors identify your prescription: they show you two prescriptions and ask which is better. As a patient, you might not answer correctly for two close values, but for values that are far apart you will be more accurate.

If, for example, you want to compare numeric values, you might model this with a custom comparator like:

import java.util.Comparator;

public class HeuristicComparator implements Comparator<Double> {

    @Override
    public int compare(Double val1, Double val2) {
        // Add noise in [0, 1) to each value, so two values less than 1.0
        // apart may compare in either order.
        double adjustedVal1 = val1 + Math.random();
        double adjustedVal2 = val2 + Math.random();

        return Double.compare(adjustedVal1, adjustedVal2);
    }

}

In this basic example, I arbitrarily assign some random variation to the comparison operation, effectively saying that any double values less than 1.0 apart might be sorted incorrectly: 1.5 might compare as less than 1.3, but only some of the time.
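
As a rough illustration of the effect (a minimal sketch; the HeuristicSortDemo name and the hand-rolled insertion sort are mine, used only to keep the example self-contained and to avoid the contract checks in the standard library sort), sorting with this comparator keeps distant values in order but may shuffle nearby ones differently on each run:

import java.util.Arrays;
import java.util.Comparator;

public class HeuristicSortDemo {

    // Simple insertion sort so we control exactly how the comparator is used.
    static void insertionSort(Double[] a, Comparator<Double> cmp) {
        for (int i = 1; i < a.length; i++) {
            Double key = a[i];
            int j = i - 1;
            while (j >= 0 && cmp.compare(a[j], key) > 0) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = key;
        }
    }

    public static void main(String[] args) {
        Double[] values = {1.5, 1.3, 3.0, 1.4, 5.0};
        insertionSort(values, new HeuristicComparator());
        // For this input, the values more than 1.0 apart always land in the
        // right relative order; 1.3, 1.4 and 1.5 can come out in any order
        // from run to run.
        System.out.println(Arrays.toString(values));
    }
}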

What other strategies are there for modeling heuristic behavior like this, besides the comparator? It seems you could modify the sort logic too (perhaps skip random points, or modify a sort to not "finish" all adjacent comparisons). Which sorting algorithms can be easily modified to match this behavior?

Or would it always be preferable to modify the comparator?

enderland

3 Answers


It's only preferable to modify the comparator if you can accept a number as your fuzzy result, and the computation only requires your left and right comparison values.

public int compare(double val1, double val2) {

    int magic = andThenAMiracleOccurs(val1, val2);    

    return magic;
}

What could be in andThenAMiracleOccurs? Almost anything. An Adaptive Logic Network comes to mind. Using an ALN would allow you to perform least squares fitting to a curve; the output given the input could be almost any sort of mapping or approximation from input to output.
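
As a purely illustrative sketch (this is not an ALN; the ConfidenceComparator name and the logistic mapping are assumptions made up for the example), one way to picture a fuzzy compare that depends only on the two inputs is to derive a confidence from the gap between them and answer "correctly" only with that probability:

import java.util.Comparator;

// Hypothetical fuzzy comparator: the larger the gap between the inputs,
// the more confident (and the more often correct) the reported ordering.
public class ConfidenceComparator implements Comparator<Double> {

    @Override
    public int compare(Double val1, Double val2) {
        double gap = Math.abs(val1 - val2);
        // Logistic squashing: confidence is 0.5 at gap 0 and approaches 1.0
        // as the gap grows.
        double confidence = 1.0 / (1.0 + Math.exp(-gap));
        if (Math.random() < confidence) {
            return Double.compare(val1, val2);   // answer correctly
        }
        return -Double.compare(val1, val2);      // occasionally get it wrong
    }
}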

Of course, functions being what they are, val1 and val2 could be almost anything as well.

public int compare(Vector set1, Vector set2) {

    int magic = someFuzzyComparator(set1, set2);    

    return magic;
}

Here's something a little less abstract:

public int compare(String string1, String string2) {
    return signedLevenshteinDistance(string1, string2);
}

which would tell you not only whether two strings were equal, but would also tell you how different they are if they're not equal, and which direction they vary alphabetically.
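
signedLevenshteinDistance isn't defined above; one possible sketch of it (an assumption for illustration, not an established library method) is the ordinary Levenshtein edit distance with its sign taken from lexicographic order:

// Magnitude: the edit distance between the strings.
// Sign: which direction they differ alphabetically (negative if string1
// sorts before string2).
public static int signedLevenshteinDistance(String s1, String s2) {
    int[][] d = new int[s1.length() + 1][s2.length() + 1];
    for (int i = 0; i <= s1.length(); i++) d[i][0] = i;   // deletions
    for (int j = 0; j <= s2.length(); j++) d[0][j] = j;   // insertions
    for (int i = 1; i <= s1.length(); i++) {
        for (int j = 1; j <= s2.length(); j++) {
            int cost = (s1.charAt(i - 1) == s2.charAt(j - 1)) ? 0 : 1;
            d[i][j] = Math.min(Math.min(d[i - 1][j] + 1,      // delete
                                        d[i][j - 1] + 1),     // insert
                               d[i - 1][j - 1] + cost);       // substitute
        }
    }
    int distance = d[s1.length()][s2.length()];
    return s1.compareTo(s2) < 0 ? -distance : distance;
}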

Comparators can be enormously flexible, under the right circumstances (i.e. anywhere you need some special ordering in a collection that can be modeled on comparing two inputs and producing a signed integer output).

Robert Harvey
  • Hmmm. So your recommendation is to change the comparator (or modify how it works) rather than the algorithm itself? – enderland Nov 17 '15 at 15:02
  • Only if the uncertainty can be modelled using two inputs and one int output as a pivot within an iterated or sorted collection. I'd love to hear from others with scenarios that don't fit this model, though. – Robert Harvey Nov 17 '15 at 16:04

Are you sure you want to implement the Comparator interface, rather than defining some more suitable one of your own?

For example, the randomized example you give cannot meet the transitivity requirement of Comparator, and of course can never be consistent with equals.

Hence you might get some nasty surprises (including possible non-termination) if you were to use this with standard containers and/or searching/sorting algorithms.
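
For instance, java.util.Arrays.sort on an object array uses TimSort, which checks the comparator contract as it merges and may abort with an IllegalArgumentException when it notices a violation (whether that happens on any particular run of a randomized comparator like the question's HeuristicComparator is itself down to chance):

import java.util.Arrays;

public class ContractViolationDemo {
    public static void main(String[] args) {
        Double[] values = new Double[1000];
        for (int i = 0; i < values.length; i++) {
            values[i] = Math.random();   // many values well under 1.0 apart
        }
        try {
            Arrays.sort(values, new HeuristicComparator());
            System.out.println("Sorted without detection (possibly incorrectly)");
        } catch (IllegalArgumentException e) {
            // TimSort detected the broken contract on this run.
            System.out.println("Caught: " + e.getMessage());
        }
    }
}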

NietzscheanAI

For your example, you could view each item in the array as an interval from the minimum possible value to the maximum possible value. Then you could randomly pick an interval, iteratively reduce it to the intersection of what's left of it and any of the original intervals it still intersects, then quicksort using that interval as the pivot, considering intersection to be equality. (Effectively you are using quicksort, slightly modified because you need to reduce the interval via intersection to ensure correctness.)
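
A minimal sketch of the interval bookkeeping this describes (the Interval class and its method names are assumptions for illustration; the surrounding modified quicksort is left out):

// Each element is modeled as the range of values it might really be.
// Intervals that overlap the pivot's interval are treated as "equal" to it.
class Interval {
    final double lo, hi;   // minimum and maximum possible value

    Interval(double lo, double hi) { this.lo = lo; this.hi = hi; }

    boolean intersects(Interval other) {
        return this.lo <= other.hi && other.lo <= this.hi;
    }

    // Shrink this interval to its intersection with another; used to tighten
    // the randomly chosen pivot before partitioning, as described above.
    Interval intersect(Interval other) {
        return new Interval(Math.max(this.lo, other.lo),
                            Math.min(this.hi, other.hi));
    }

    // Three-way test against the pivot: -1 = definitely less,
    // 1 = definitely greater, 0 = overlaps, so treat as equal.
    int compareTo(Interval pivot) {
        if (this.hi < pivot.lo) return -1;
        if (this.lo > pivot.hi) return 1;
        return 0;
    }
}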

In the general case, you would need to specify what you want and how the heuristic works. Are you interested in fault tolerance for a not-always-accurate, non-deterministic heuristic (if so, fault-tolerant sorting networks might be relevant)? If you want fault tolerance for a faulty deterministic heuristic, there really is nothing to be done. Are there varying degrees of success in the sort that you want to trade off for time? The general case of your question is too broad to answer.

psr