How to Calculate Cosine Similarity in R


This tutorial will go through how to calculate the cosine similarity in R for vectors and matrices with code examples.


What is Cosine Similarity?

Cosine similarity measures how similar two vectors in a multi-dimensional space are. It is the cosine of the angle between the two vectors, which indicates whether they point in roughly the same direction.

The smaller the angle between two vectors, the more similar they are to each other. The similarity measure ignores the differences in magnitude or scale between the vectors.
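As a quick illustration of this scale invariance, here is a minimal base R sketch. The helper name `cos_sim` and the example vectors are made up for this illustration and are not part of any package; the helper uses the standard dot-product-over-norms definition.

```r
# Hypothetical helper: dot product divided by the product of the Euclidean norms
cos_sim <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

a <- c(1, 2, 3)
b <- 10 * a   # same direction as a, ten times the magnitude

sim_scaled <- cos_sim(a, b)

# TRUE: rescaling a vector leaves the cosine similarity at 1 (up to rounding)
isTRUE(all.equal(sim_scaled, 1))
```

The comparison uses all.equal() rather than == because the square roots introduce small floating-point rounding.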

Both vectors must belong to the same inner product space, which in practice means they must have the same number of dimensions so that their inner product is a well-defined scalar. Cosine similarity is used widely throughout data science and machine learning.

Real-world use cases of cosine similarity include recommender systems, measuring document similarity in natural language processing and the cosine-similarity locality-sensitive hashing technique for fast DNA sequence matching.

How to Calculate Cosine Similarity

Consider two vectors, A and B. We can calculate the cosine similarity between the vectors as follows:

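$latex \text{cosine similarity}(A, B) = \cos(\theta) = \frac{A \cdot B}{\|A\| \, \|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \, \sqrt{\sum_{i=1}^{n} B_i^2}}$

where $latex A \cdot B$ is the dot product of the two vectors, $latex \|A\|$ and $latex \|B\|$ are their Euclidean norms (magnitudes), and $latex A_i$ and $latex B_i$ are the components of A and B.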

Cosine similarity is the dot product of the two vectors divided by the product of their Euclidean norms (their magnitudes). The result can be any value between -1 and +1.
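The calculation can be sketched step by step in base R. The vectors below are illustrative values chosen so that the arithmetic is easy to check by hand; they do not come from the rest of this tutorial.

```r
# Example vectors chosen so each step can be verified by hand
A <- c(1, 2, 2)
B <- c(2, 1, 2)

dot_product <- sum(A * B)    # 1*2 + 2*1 + 2*2 = 8
norm_A <- sqrt(sum(A^2))     # sqrt(1 + 4 + 4) = 3
norm_B <- sqrt(sum(B^2))     # sqrt(4 + 1 + 4) = 3

similarity <- dot_product / (norm_A * norm_B)
similarity                   # 8 / 9, roughly 0.889
```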

Visual Description of Cosine Similarity

If the angle between two vectors is less than 90 degrees and close to zero, the cosine similarity will be close to 1, so A and B are similar to each other. If the angle is 90 degrees, the cosine similarity is 0: the two vectors are orthogonal and share no similarity in direction. If the angle is greater than 90 degrees and close to 180 degrees, the similarity will be close to -1, indicating that the vectors point in nearly opposite directions. In every case, cos($latex \theta$) lies in the range [-1, 1].

Figure: Three examples of similarity between vectors using the cosine angle. Source: Me
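The three cases can be reproduced with simple two-dimensional vectors. This is a minimal sketch in base R; `cos_sim` is a hypothetical helper defined here for illustration, and the vectors are made up so that the angles are exactly 0, 90 and 180 degrees.

```r
# Hypothetical helper: dot product divided by the product of the Euclidean norms
cos_sim <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

a <- c(1, 0)
same       <- cos_sim(a, c(2, 0))    # same direction, angle of 0 degrees
orthogonal <- cos_sim(a, c(0, 3))    # perpendicular, angle of 90 degrees
opposite   <- cos_sim(a, c(-4, 0))   # opposite direction, angle of 180 degrees

c(same = same, orthogonal = orthogonal, opposite = opposite)
# same = 1, orthogonal = 0, opposite = -1

# Recover the angle in degrees from each similarity value
angle_deg <- acos(c(same, orthogonal, opposite)) * 180 / pi
angle_deg   # 0, 90, 180
```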

Cosine Similarity Between Two Vectors in R

Let’s look at the code to calculate the cosine similarity between two vectors in R:

# Install the lsa package (only needed once), then load it
install.packages("lsa")
library(lsa)

x <- c(0.12, 0.44, 0.5, 0.3, 0.7, 0.04, 0.9, 0.8)
y <- c(0.24, 0.5, 0.7, 0.21, 0.69, 0.2, 0.7, 0.5)

cosine(x, y)
          [,1]
[1,] 0.9570642

The cosine similarity between the two vectors is 0.9570642.
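One way to sanity-check the package result is to recompute it directly from the definition using base R only. The sketch below reuses the same x and y; the variable name `manual` and the length check are additions for illustration.

```r
x <- c(0.12, 0.44, 0.5, 0.3, 0.7, 0.04, 0.9, 0.8)
y <- c(0.24, 0.5, 0.7, 0.21, 0.69, 0.2, 0.7, 0.5)

# Guard against vectors of different lengths, which R would otherwise recycle
stopifnot(length(x) == length(y))

# Dot product divided by the product of the Euclidean norms
manual <- sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
manual
```

If lsa is loaded, all.equal(manual, as.numeric(cosine(x, y))) should return TRUE.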

Cosine Similarity of a Matrix in R

We can also pass a matrix to cosine(), which returns the pairwise cosine similarity between its column vectors:

x <- c(10, 13, 14, 20, 21, 40, 50, 27)
y <- c(7, 10, 12, 19, 24, 36, 40, 20)
z <- c(8, 15, 25, 3, 1, 7, 0, 50)

mat <- cbind(x, y, z)

cosine(mat)
          x         y         z
x 1.0000000 0.9928947 0.5060730
y 0.9928947 1.0000000 0.4638441
z 0.5060730 0.4638441 1.0000000

We can interpret the output as follows:

  • The cosine similarity between vectors x and y is 0.9928947
  • The cosine similarity between vectors x and z is 0.5060730
  • The cosine similarity between vectors y and z is 0.4638441
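To see where these numbers come from, the same pairwise matrix can be reproduced in base R. This is a sketch of one possible approach, not necessarily how lsa computes it internally; the names `dots`, `norms` and `sim` are chosen here for illustration.

```r
x <- c(10, 13, 14, 20, 21, 40, 50, 27)
y <- c(7, 10, 12, 19, 24, 36, 40, 20)
z <- c(8, 15, 25, 3, 1, 7, 0, 50)
mat <- cbind(x, y, z)

# crossprod(mat) is t(mat) %*% mat: the dot product of every pair of columns
dots <- crossprod(mat)

# Euclidean norm of each column
norms <- sqrt(colSums(mat^2))

# Divide each dot product by the product of the two corresponding norms
sim <- dots / outer(norms, norms)
round(sim, 7)
```

The diagonal is 1 because every vector is perfectly similar to itself, and the matrix is symmetric because the similarity of x and y equals that of y and x.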

Convert Data Frame to Matrix

The cosine() function works on a matrix of vectors and on pairs of vectors, but it does not work on a data frame. We can verify this by creating a data frame containing three vectors and passing it to cosine():

data <- data.frame(x,y,z)
data
  x  y  z
1 10  7  8
2 13 10 15
3 14 12 25
4 20 19  3
5 21 24  1
6 40 36  7
7 50 40  0
8 27 20 50
cosine(data)
Error in cosine(data) : 
  argument mismatch. Either one matrix or two vectors needed as input.

Passing a data frame to the cosine function raises an argument mismatch error. We can convert a data frame to a matrix using the as.matrix() function. Let’s look at the revised code:

cosine(as.matrix(data))
          x         y         z
x 1.0000000 0.9928947 0.5060730
y 0.9928947 1.0000000 0.4638441
z 0.5060730 0.4638441 1.0000000

We successfully converted the data frame to a matrix and passed it to the cosine function.
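One caveat: as.matrix() coerces every column to a common type, so a data frame that also holds a non-numeric column becomes a character matrix. A minimal sketch of one way to handle this in base R follows; the data frame and its `id` column are hypothetical and exist only for illustration.

```r
# Hypothetical data frame with one non-numeric column
data <- data.frame(
  id = c("a", "b", "c", "d"),
  x  = c(1, 2, 3, 4),
  y  = c(2, 4, 6, 9)
)

# Converting everything at once gives a character matrix
is.character(as.matrix(data))   # TRUE

# Keep only the numeric columns before converting
numeric_cols <- vapply(data, is.numeric, logical(1))
mat <- as.matrix(data[, numeric_cols, drop = FALSE])

is.numeric(mat)   # TRUE
colnames(mat)     # "x" "y"
```

With lsa loaded, mat can then be passed to cosine() as in the example above.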

Summary

Congratulations on reading to the end of this tutorial!

To calculate the cosine similarity in Python, go to the article: How to Calculate Cosine Similarity in Python

Go to the online courses page on R to learn more about coding in R for data science and machine learning.

Have fun and happy researching!


Suf is a senior advisor in data science with deep expertise in Natural Language Processing, Complex Networks, and Anomaly Detection. Formerly a postdoctoral research fellow, he applied advanced physics techniques to tackle real-world, data-heavy industry challenges. Before that, he was a particle physicist at the ATLAS Experiment of the Large Hadron Collider. Now, he’s focused on bringing more fun and curiosity to the world of science and research online.
