July 22, 2011

A Gotcha of C++ map and set

C++ std::map and std::set allow you to define your own less-than comparator for the key type. But one thing to bear in mind: The comparator must follow the rules of strick weak ordering! That is:
  • x < x is always false (irreflexivity)
  • If x < y, then y < x is false (asymmetric)
  • If x < y and y < z, then x < z (transitivity)
  • If x = y and y = z, then x = z (transitivity of equivalence)
Miss any one of these rules, you will get undefined behavior!

To see what it means, let me show you a mistake I made.

I was thinking of keeping a bunch of non-repeated (unique) floats, but I wish to compare the floats with some tolerance. For example, 1.0 and 1.01 should be same. So naturally, I used float as the map key, and the mapped value can be any type I need. Let's say here I use int.

In order to compare floats with tolerance, I wrote a float comparator like this:
struct FloatCmp
{
    bool operator()(float a, float b)
    {
        const float epsilon = 0.01f;
        if (fabsf(a - b) < epsilon)
            return false; // a == b
        return a < b;
    }
};
You can see line 6-7 checks if two floats are equal first before using the less-than comparison (line 8). Now, I can initialize my map container happily:
std::map<float, int, FloatCmp> table;
table[1.0f] = 1;
table[1.001f] = 2;   // same as table[1.0f] = 2
table[1.002f] = 3;   // same as table[1.0f] = 3
It all seemed to work perfectly. But I found sometimes it didn't work. It produced some duplicated keys. What did I do wrong?

The answer: I broke the rule of "transitivity of equivalence" (if x = y and y = z then x = z). Both FloatCmp(1.0f, 1.006f) and FloatCmp(1.006f, 1.012f) are true, but FloatCmp(1.0f, 1.012f) is not true! This makes the result depend on how you insert your elements. Consider the following code:
#include <iostream>
#include <map>

int main()
{
    std::map<float, int, FloatCmp> map1;
    std::map<float, int, FloatCmp> map2;

    float a = 1.000f;
    float b = 1.006f;
    float c = 1.012f;

    map1[a] = 1;
    map1[b] = 1;
    map1[c] = 1;

    map2[b] = 1;
    map2[a] = 1;
    map2[c] = 1;

    std::cout << map1.size() << std::endl;  // print 2
    std::cout << map2.size() << std::endl;  // print 1
    return 0;
}
After the insertions, map1 contains a and c, and map2 contains b. This is astonishing: Two maps with the same set of elements inserted can have different sizes because you insert them in different orders. It's not hard to figure out why by tracing the code.

So how to fix this? Well, as long as "transitivity of equivalence" is followed, there would be no problems. Finally I came up with this new comparator:
inline float discretize(float a)
{
    return floorf(a * 100.0f) / 100.0f;
}
struct FloatCmp
{
    bool operator()(float a, float b)
    {
        float aa = discretize(a);
        float bb = discretize(b);
        if (aa == bb)
            return false;
        return aa < bb;
    }
};
At line 9-10, discretize() function truncates (round off) floating-point numbers to 2 decimal digits. For example, discretize(1.1234f) returns 1.12f. And check if they're equal with == operator directly (line 11). In this way, I make sure "transitivity of equivalence" is strictly followed. Now if you relaunch the previous main program with this new comparator, both map1 and map2 will contain two elements as expected.

No comments:

Post a Comment