How to Calculate Hamming Distance in R


Hamming distance is a metric for measuring how different two vectors are. It is only defined for vectors of equal length, and it counts the positions at which the corresponding elements differ. For binary vectors, this is the number of differing bit positions.

We can also describe Hamming distance as the minimum number of substitutions required to change one string into another of the same length, or equivalently, the minimum number of errors that could have transformed one string into the other.

In R, we can use the following syntax to calculate the Hamming distance between two vectors x and y:

hamming_distance <- sum(x != y)
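One caveat with the one-liner: when x and y differ in length, R recycles the shorter vector, so sum(x != y) can still return a number even though the Hamming distance is undefined. A minimal sketch of a wrapper that guards against this follows. The name hamming_dist is a hypothetical helper chosen for illustration and is not part of base R.

```r
# Hypothetical helper (not part of base R): Hamming distance with a length check.
hamming_dist <- function(x, y) {
  # Hamming distance is only defined for sequences of equal length.
  if (length(x) != length(y)) {
    stop("x and y must have the same length")
  }
  # x != y is a logical vector; sum() counts its TRUE values.
  sum(x != y)
}

hamming_dist(c(1, 0, 1, 1), c(1, 1, 1, 0))
# [1] 2
```

Calling hamming_dist(1:3, 1:2) stops with an error instead of silently comparing recycled values.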

This tutorial will go through examples of Hamming distance using R.


Visual Description of Hamming Distance

Let’s look at an example of calculating the Hamming distance between two DNA sequences:

The two sequences differ at two positions, and therefore the Hamming distance is 2.
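A count like this can be reproduced in R when the sequences are stored as single strings rather than vectors. The sketch below assumes two sequences, "gattaca" and "gactata", chosen purely for illustration, and splits each into individual characters with strsplit() before comparing them.

```r
# Illustrative sequences (assumed for this sketch).
seq1 <- "gattaca"
seq2 <- "gactata"

# strsplit() returns a list; [[1]] extracts the vector of single characters.
chars1 <- strsplit(seq1, "")[[1]]
chars2 <- strsplit(seq2, "")[[1]]

# Only compare when both strings have the same number of characters.
stopifnot(length(chars1) == length(chars2))

hd <- sum(chars1 != chars2)
hd
# [1] 2
```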

The Hamming distance applies to any pair of equal-length strings, not just DNA sequences. Calculating it by hand becomes time-consuming and error-prone once the strings are hundreds or thousands of characters long.

For ease and speed, we can calculate the Hamming distance programmatically.

Hamming Distance in R Example #1: Using sum() with numeric vectors

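We can use the built-in sum() function to count the number of differences between corresponding elements in two vectors. The != operator compares the vectors element by element and returns a logical vector, and sum() counts the TRUE values. Let’s look at an example with two numeric vectors: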

vec1 <- c(1, 3, 2, 7, 8)
vec2 <- c(1, 7, 2, 8, 3)

hd <- sum(vec1 != vec2)

print(paste("Hamming distance between vec1 and vec2 = ", hd))
[1] "Hamming distance between vec1 and vec2 =  3"

The Hamming distance between the two numeric vectors is 3.
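To see where the count of 3 comes from, it can help to inspect the intermediate logical vector. The sketch below reuses the same two vectors; the variable name mismatch is introduced here for illustration.

```r
vec1 <- c(1, 3, 2, 7, 8)
vec2 <- c(1, 7, 2, 8, 3)

# Element-wise comparison: TRUE marks a position where the values differ.
mismatch <- vec1 != vec2
mismatch
# [1] FALSE  TRUE FALSE  TRUE  TRUE

# which() returns the indices of the TRUE values.
which(mismatch)
# [1] 2 4 5
```

The vectors differ at positions 2, 4, and 5, which gives the distance of 3.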

Hamming Distance in R Example #2: Using sum() with binary vectors

Let’s look at an example of calculating the Hamming distance between two binary vectors:

vec1 <- c(0, 1, 0, 1, 0, 0)
vec2 <- c(1, 1, 0, 1, 0, 1)

hd <- sum(vec1 != vec2)

print(paste("Hamming distance between vec1 and vec2 = ", hd))
[1] "Hamming distance between vec1 and vec2 =  2"

The Hamming distance between the two binary vectors is 2.
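For vectors that contain only 0s and 1s, the same result can be obtained in other ways, which may be useful as a sanity check. The sketch below assumes the inputs are strictly binary; the names hd_xor and hd_abs are illustrative.

```r
vec1 <- c(0, 1, 0, 1, 0, 0)
vec2 <- c(1, 1, 0, 1, 0, 1)

# xor() coerces 0/1 to FALSE/TRUE and is TRUE where exactly one of the bits is set.
hd_xor <- sum(xor(vec1, vec2))

# For 0/1 values, the absolute difference is 1 exactly where the bits differ.
hd_abs <- sum(abs(vec1 - vec2))

c(hd_xor, hd_abs)
# [1] 2 2
```

Both alternatives rely on the values being 0 or 1, whereas sum(vec1 != vec2) works for any comparable element type.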

Hamming Distance in R Example #3: Using sum() with character vectors

Let’s look at an example of calculating the Hamming distance between two character vectors:

vec1 <- c('g', 'a', 't', 't', 'a', 'c', 'a')
vec2 <- c('g', 'a', 'c', 't', 'a', 't', 'a')
 
hd <- sum(vec1 != vec2)
 
print(paste("Hamming distance between vec1 and vec2 = ", hd))
[1] "Hamming distance between vec1 and vec2 =  2"

The Hamming distance between the two character vectors is 2.
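When there are more than two sequences, the same comparison can be applied to every pair. The following is a rough sketch rather than an optimized implementation; the three sequences and the name dist_matrix are assumptions made for this example.

```r
# Illustrative sequences (assumed for this sketch), all of equal length.
seqs <- c("gattaca", "gactata", "cattaga")

# Split each string into a vector of single characters.
chars <- strsplit(seqs, "")

# Fill a square matrix with the Hamming distance for every pair of sequences.
n <- length(seqs)
dist_matrix <- matrix(0, nrow = n, ncol = n, dimnames = list(seqs, seqs))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    dist_matrix[i, j] <- sum(chars[[i]] != chars[[j]])
  }
}

dist_matrix
```

The diagonal is zero because each sequence is identical to itself, and the matrix is symmetric because the Hamming distance does not depend on the order of its arguments.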

Summary

Congratulations on reading to the end of this article!

To learn how to calculate the Hamming distance in Python, see the article: How to Calculate the Hamming Distance in Python.

Have fun and happy researching!


Suf is a senior advisor in data science with deep expertise in Natural Language Processing, Complex Networks, and Anomaly Detection. Formerly a postdoctoral research fellow, he applied advanced physics techniques to tackle real-world, data-heavy industry challenges. Before that, he was a particle physicist at the ATLAS Experiment of the Large Hadron Collider. Now, he’s focused on bringing more fun and curiosity to the world of science and research online.
