How to Calculate Hamming Distance in C++ (With C++20, Bit Shifting, Strings, and More)

by | C++, Data Science, Programming, Tips

The Hamming distance measures the difference between two strings of equal length, typically used for binary sequences. It counts the number of positions where the corresponding bits or characters differ. This distance is widely used in error detection and correction codes, machine learning, and other domains.

In this post, we’ll walk through how to calculate the Hamming distance in C++ with the following sections:

Go to How to Calculate the Hamming Distance in Python for the Python implementation.


Bit Shifting and Hamming Distance

Before we dive into the code, let’s understand how bit shifting and XOR help calculate Hamming distance. The XOR operation compares two numbers bit by bit and sets each corresponding bit in the result to 1 if they differ and 0 if they are the same. By counting how many 1s appear in the XOR result, we can determine how many bits differ between the two numbers.

Example:

Let’s take two integers:

x = 5 (binary: 0101)
y = 10 (binary: 1010)

The XOR of x and y is:

x ^ y = 0101 ^ 1010 = 1111

This result (1111) indicates that all four bits differ between x and y. Therefore, the Hamming distance is 4.

Now, to count the number of 1s in the XOR result, we can use bit shifting to move through the bits one by one and check each bit.

Code Example (Using Bit Shifting):

#include <iostream>

int hammingDistanceWithShift(int x, int y) {
    int xorValue = x ^ y;
    int distance = 0;
    
    // Shift and count set bits
    while (xorValue) {
        distance += (xorValue & 1);  // Check if the least significant bit is 1
        xorValue >>= 1;  // Bit shifting, right shift the bits by one position
    }
    
    return distance;
}

int main() {
    int x = 5, y = 10;
    std::cout << "Hamming Distance (with bit shifting): " << hammingDistanceWithShift(x, y) << std::endl;
    return 0;
}

Output:

Hamming Distance (with bit shifting): 4

In this method:

  • The XOR operation (x ^ y) creates a number where each bit represents whether the corresponding bits of x and y are different.
  • We then shift through the bits of the XOR result using >>, counting the number of 1s.

Basic Brute-Force Approach

A simpler way to calculate Hamming distance, especially for smaller inputs, is to convert the integers to binary strings and compare the bits manually.

Code Example (Using Basic Brute-Force)

#include <iostream>
#include <bitset>

int hammingDistance(int x, int y) {
    std::bitset<32> bx(x);
    std::bitset<32> by(y);

    int distance = 0;
    for (int i = 0; i < 32; ++i) {
        if (bx[i] != by[i]) {
            distance++;
        }
    }
    return distance;
}

int main() {
    int x = 5, y = 10;
    std::cout << "Hamming Distance Using Brute Force: " << hammingDistance(x, y) << std::endl;
    return 0;
}

Output:

Hamming Distance Using Brute Force: 4

Optimized Approach Using XOR

The brute-force method, while simple, can be inefficient for large inputs. A faster method is to calculate the XOR of the two integers and count the number of set bits (the bits that are 1).

Code Example (Using the Optimized XOR Approach)

#include <iostream>

int hammingDistance(int x, int y) {
    int xorValue = x ^ y;
    int distance = 0;

    // Count set bits in the XOR result
    while (xorValue) {
        distance += (xorValue & 1);
        xorValue >>= 1;
    }

    return distance;
}

int main() {
    int x = 5, y = 10;
    std::cout << "Hamming Distance using Optimized XOR Approach: " << hammingDistance(x, y) << std::endl;
    return 0;
}

Output:

Hamming Distance using Optimized XOR Approach: 4

Modern C++20 Method Using std::popcount

With C++20, you can take advantage of the std::popcount function, which counts the number of 1s in the result of the XOR operation. This method is cleaner and more efficient for large inputs.

Code Example (C++20)

#include <iostream>

int hammingDistance(int x, int y) {
    return std::popcount(static_cast<unsigned int>(x ^ y));  // Cast to unsigned int
}

int main() {
    int x = 5, y = 10;
    std::cout << "Hamming Distance using std::popcount: " << hammingDistance(x, y) << std::endl;
    return 0;
}

Output:

Hamming Distance using std::popcount: 4

Remember std::popcount in C++20 only accepts unsigned integer types. Since you are passing x ^ y, a signed integer, it must be cast to an unsigned integer type.


Applying Hamming Distance to Strings

While the examples above focus on integers, Hamming distance can also be used to compare strings of equal length. The distance is the number of positions where the characters differ.

Code Example (Strings):

#include <iostream>
#include <string>

// Function to calculate the Hamming distance between two strings
int hammingDistance(const std::string &s1, const std::string &s2) {
    // Check if the strings have equal length
    if (s1.length() != s2.length()) {
        throw std::invalid_argument("Strings must be of equal length");  // Throw an exception if lengths are not equal
    }

    int distance = 0;  // Initialize a counter for Hamming distance
    
    // Loop through each character in the strings
    for (size_t i = 0; i < s1.length(); ++i) {
        if (s1[i] != s2[i]) {  // Compare characters at the same position
            distance++;  // Increment the distance if the characters differ
        }
    }

    return distance;  // Return the final Hamming distance
}

int main() {
    // Example strings
    std::string str1 = "karolin";
    std::string str2 = "kathrin";

    try {
        // Call the hammingDistance function and print the result
        std::cout << "Hamming Distance (strings): " << hammingDistance(str1, str2) << std::endl;
    } catch (const std::invalid_argument &e) {
        // Handle the case where strings have unequal lengths
        std::cerr << e.what() << std::endl;
    }
    
    return 0;
}

Output:

Hamming Distance (strings): 3

Pros and Cons of Using Hamming Distance

While Hamming distance is a simple and widely used metric, it has both advantages and limitations depending on the context in which it is used.

Pros of Hamming Distance

  1. Simplicity: Hamming distance is easy to understand and implement. It works directly on binary or string data without requiring complex transformations.
  2. Efficiency: When using optimized methods like XOR and std::popcount, calculating Hamming distance becomes computationally efficient, even for large datasets.
  3. Applicability in Error Detection: Hamming distance is fundamental to error detection and correction algorithms, making it highly relevant in communication systems.
  4. Fast Calculation for Binary Data: The bitwise nature of Hamming distance makes it ideal for comparing binary data or bit strings.

Cons of Hamming Distance

  1. Limited to Equal-Length Sequences: Hamming distance only applies to sequences (whether binary or string) of equal length. Sequences of differing lengths require other distance metrics.
  2. Insensitive to Order: Hamming distance only looks at positions where data differ but does not account for how much they differ (e.g., swapping two characters).
  3. Not Suitable for All Data Types: While useful for binary and string data, Hamming distance may not be as meaningful for other types of data, such as continuous numerical data. In those cases, other metrics like Euclidean distance or Manhattan distance might be more appropriate.
  4. Outdated for Some Applications: In certain advanced machine learning tasks, more sophisticated metrics such as cosine similarity or dynamic time warping may be preferred over Hamming distance.

Hamming distance is a versatile tool for comparing data in various fields, from error detection to machine learning. And when it comes to performance, C++ truly shines for such distance calculations. With its low-level memory control, optimized standard library, and support for modern features like std::popcount, C++ allows you to calculate distances like Hamming with blazing speed and efficiency. Whether handling large datasets or performing bit-level operations, C++ gives you the precision and performance you need to tackle these tasks head-on.

Congratulations on reading to the end of this tutorial! If you would like to read more on Hamming Distance in other programming languages, go to the article How to Calculate the Hamming Distance in Python.

If you would like more reading on distance calculations in C++, you can go to How to Calculate Mahalanobis Distance in C++ (With Eigen, Singular and Non-Singular Examples).

Have fun and happy researching!

Profile Picture
Senior Advisor, Data Science | [email protected] | + posts

Suf is a senior advisor in data science with deep expertise in Natural Language Processing, Complex Networks, and Anomaly Detection. Formerly a postdoctoral research fellow, he applied advanced physics techniques to tackle real-world, data-heavy industry challenges. Before that, he was a particle physicist at the ATLAS Experiment of the Large Hadron Collider. Now, he’s focused on bringing more fun and curiosity to the world of science and research online.

Buy Me a Coffee