operator== on the unordered containers

From:

Joe Gottman <jgottman@carolina.rr.com>

Newsgroups:

comp.lang.c++.moderated

Date:

Thu, 10 Apr 2008 20:39:49 CST

Message-ID:

<47feb872$0$27672$4c368faf@roadrunner.com>

    Why is there no operator== defined for the unordered containers?
Assuming a good hash function it is possible to define one for
unordered_set and unordered_map that takes O(N) time (where N = the size
of the collection):

template <class X, class H, class P, class A>
bool operator==(const unorderd_set<T, H, P, A> &lhs,
                 const unordered_set<T, H, P, A> &rhs)
{
    if (lhs.size() != rhs.size()) return false;
    for (auto it1 = lhs.begin(); it1!= lhs.end(); ++it1) {
       auto it2 = rhs.find(*it1);
       if ((it2 == rhs.end()) || (*it1 != *it2)) return false;
    }
    return true;
}

Whether this is possible for unordered_multiset and unordered_multiset
depends on how unordered we want the unordered container to be.
Consider two hash_multimap<int, char> objects h1 and h2, such that h1
contains the elements { {1, 'a'}, {1, 'b'} and h2 contains the same
elements in the opposite order. The efficiency of operator== on
hash_multimap and hash_multiset depends on whether s1 and s2 should be
considered equal.

If the order of elements with equivalent keys matters, then the
unordered_multiset and unordered_multimap can still be compared in O(N)
time:

template <class X, class H, class P, class A>
bool operator==(const unorderd_multiset<T, H, P, A> &lhs,
                 const unordered_multiset<T, H, P, A> &rhs)
{
    if (lhs.size() != rhs.size()) return false;
    for auto it1= lsh.begin(); it1!= lhs.end(); /*No increment*/) {
       if (lhs.count(*it1) != rhs.count(*it1)) return false;
       auto rng = rhs.equal_range(*it);
       for (auto it2 = rng.first; it2 != rng.second; ++it1, ++it2) {
          if (*it1 != *it2) return false;
          }
      }
      return true;
}

If the order of elements with equivalent keys doesn't matter then
unordered_multiset and unordered_multimap can be compared in O(Nk) time
where k is the maximum number of elements with equivalent keys.

template <class X, class H, class P, class A>
bool operator==(const unorderd_multiset<T, H, P, A> &lhs,
                 const unordered_multiset<T, H, P, A> &rhs)
{
    if (lhs.size() != rhs.size()) return false;
    for auto it1= lsh.begin(); it1!= lhs.end(); /*No increment*/) {
       if (lhs.count(*it1) != rhs.count(*it1)) return false;
       //Can't just search for all the elements equivalent to *it1 in
rhs because the multiplicities might differ.
       auto rng = rhs.equal_range(*it);
       list<unordered_multiset::const_iterator> rh_list;

       for (auto it2 = rh_rng.first; it2 != rng.second; ++ it2) {
          rh_list.insert(it);
       }
       for (/*No init */ ; !rh_list.empty(); ++it1) {
         auto it2 = rh_list.begin();
          for (/*No init */; (it2 != rh_list.end()) && (*it1 != *it2);
++it2) {}
          if (it2 == rh_list.end()) { //None found equal to *it1
                return false;
           } else {
                 rh_list.erase(it2); // Make sure we don't match the
same element twice
          }
      }
   }
}

Note that all these equality operators can blow up if the unordered
container has a bad hash function. But if an unordered container has a
bad hash function then ALL of it's operations become inefficient. I
still think it's worthwhile to be able to compare two hashed containers
for equality.

Joe Gottman

--
      [ See http://www.gotw.ca/resources/clcm.htm for info about ]
      [ comp.lang.c++.moderated. First time posters: Do this! ]