In this guide, we’ll explore how to implement cosine similarity in C++, a fundamental metric used in various applications from text mining to recommendation systems. We’ll cover everything from basic implementation to optimized solutions and practical use cases.
Introduction
Cosine similarity is a measure of similarity between two non-zero vectors of an inner product space. It is widely used in text mining, recommendation systems, and various machine learning applications. Unlike Euclidean distance, cosine similarity is particularly well-suited for measuring similarity in high-dimensional spaces, where the magnitude of vectors can vary significantly, but their orientation matters more.
The value of cosine similarity ranges between -1 and 1:
- 1: Vectors are perfectly aligned (identical direction).
- 0: Vectors are orthogonal (no similarity).
- -1: Vectors are diametrically opposed (opposite direction).
In this guide, we’ll implement cosine similarity in C++ using three approaches:
- Basic Implementation: A straightforward and easy-to-follow implementation ideal for learning and small datasets.
- Optimized Implementation: An advanced implementation leveraging SIMD (Single Instruction Multiple Data) instructions for better performance on large datasets.
- Eigen Implementation: A concise implementation built on the Eigen linear algebra library, which we also benchmark against the other two.
Tip: If you are new to vector mathematics, consider reviewing topics like dot product and vector magnitude before diving into the implementation.
By the end of this guide, you’ll have a clear understanding of cosine similarity and how to implement it efficiently in C++.
Mathematical Background
Cosine similarity is derived from the dot product of two vectors and their magnitudes. It measures the cosine of the angle between two non-zero vectors in an n-dimensional space. For two vectors \( \mathbf{A} = [A_1, A_2, \dots, A_n] \) and \( \mathbf{B} = [B_1, B_2, \dots, B_n] \), the cosine similarity is calculated as:
\[ \text{similarity} = \cos(\theta) = \frac{\mathbf{A} \cdot \mathbf{B}}{\|\mathbf{A}\| \, \|\mathbf{B}\|} = \frac{\sum_{i=1}^n A_i B_i}{\sqrt{\sum_{i=1}^n A_i^2} \, \sqrt{\sum_{i=1}^n B_i^2}} \]
The formula can be broken down into three key components:
- Dot Product (\( \mathbf{A} \cdot \mathbf{B} \)): This is the numerator, calculated as the sum of the element-wise multiplication of the two vectors. \[ \mathbf{A} \cdot \mathbf{B} = \sum_{i=1}^n A_i B_i \]
- Magnitude (\( \|\mathbf{A}\| \)): The magnitude of a vector is its length in Euclidean space, calculated as: \[ \|\mathbf{A}\| = \sqrt{\sum_{i=1}^n A_i^2} \]
- Normalization: The cosine similarity divides the dot product by the product of the magnitudes of the two vectors, ensuring the result is scaled between -1 and 1.
The result of cosine similarity is interpreted as follows:
- 1: The vectors have the same orientation, meaning they are identical in direction.
- 0: The vectors are orthogonal (at a 90-degree angle), indicating no similarity.
- -1: The vectors point in opposite directions, meaning they are completely dissimilar.
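As a quick worked example, take \( \mathbf{A} = [1, 2, 3] \) and \( \mathbf{B} = [2, 4, 6] \), where \( \mathbf{B} \) is simply a scaled copy of \( \mathbf{A} \):
\[ \cos(\theta) = \frac{1 \cdot 2 + 2 \cdot 4 + 3 \cdot 6}{\sqrt{1^2 + 2^2 + 3^2} \, \sqrt{2^2 + 4^2 + 6^2}} = \frac{28}{\sqrt{14} \, \sqrt{56}} = \frac{28}{28} = 1 \]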
Tip: Cosine similarity focuses only on the orientation of vectors, making it insensitive to the magnitude. This property is especially useful in applications like text mining, where vector magnitudes often vary widely.
Note: Both vectors must be non-zero for the cosine similarity to be defined. If one or both vectors have a magnitude of zero, the calculation is invalid and should be handled as an exception.
With a solid understanding of the mathematical foundation, we are ready to translate this into code. In the next section, we’ll start with a basic implementation of cosine similarity in C++.
Basic Implementation
Let’s start with a straightforward implementation of cosine similarity in C++. This implementation focuses on clarity and correctness, making it an excellent starting point for understanding the concept. Below is the complete code with detailed commentary:
#include <iostream> // For standard I/O operations
#include <vector> // For working with vectors
#include <cmath> // For mathematical functions like sqrt
#include <stdexcept> // For throwing exceptions
class CosineSimilarity {
public:
// Calculate cosine similarity between two vectors
static double calculate(const std::vector<double>& A,
const std::vector<double>& B) {
// Ensure both vectors are of equal length and non-empty
if (A.size() != B.size() || A.empty()) {
throw std::invalid_argument("Vectors must be non-empty and of equal length");
}
double dot_product = 0.0; // Stores the result of the dot product
double norm_A = 0.0; // Sum of squares for vector A
double norm_B = 0.0; // Sum of squares for vector B
// Compute dot product and norms in a single loop for efficiency
for (size_t i = 0; i < A.size(); ++i) {
dot_product += A[i] * B[i]; // Element-wise multiplication
norm_A += A[i] * A[i]; // Squaring elements of A
norm_B += B[i] * B[i]; // Squaring elements of B
}
// Ensure neither vector has a zero magnitude to avoid division by zero
if (norm_A == 0.0 || norm_B == 0.0) {
throw std::invalid_argument("Vectors must be non-zero");
}
// Calculate cosine similarity
return dot_product / (std::sqrt(norm_A) * std::sqrt(norm_B));
}
};
int main() {
// Example vectors representing two documents or data points
std::vector<double> doc1 = {1.0, 2.0, 3.0}; // First document vector
std::vector<double> doc2 = {3.0, 4.0, 6.0}; // Second document vector
try {
// Calculate and print the cosine similarity
double similarity = CosineSimilarity::calculate(doc1, doc2);
std::cout << "Cosine similarity: " << similarity << std::endl;
} catch (const std::exception& e) {
// Handle errors gracefully
std::cerr << "Error: " << e.what() << std::endl;
}
return 0;
}
Step-by-Step Explanation
Let’s break down the key parts of the code:
- Input Validation: The method checks that both vectors are non-empty and have the same size. This ensures that element-wise operations can proceed without runtime errors.
- Single Pass Calculation: The dot product and norms are computed in a single loop to minimize overhead. This approach is both efficient and easy to understand.
- Edge Case Handling: If either vector has zero magnitude, an exception is thrown. This avoids dividing by zero during the normalization step.
- Exception Handling in Main: The main function includes a try-catch block to gracefully handle errors, such as invalid input vectors.
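Assuming the code above is saved as main.cpp, it can be compiled and run with:
g++ -std=c++20 main.cpp -o cosine_similarity
./cosine_similarity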
Example Output
Cosine similarity: 0.99236
Tip: The two example vectors point in nearly the same direction, so the cosine similarity is close to 1 (about 0.992). Experiment with different vector values to observe how the similarity changes.
Warning: For large datasets or high-dimensional vectors, this basic implementation may not be efficient. Consider optimizing using SIMD operations, which we’ll cover in the next section.
Optimized Implementation
For large datasets, the basic implementation can be inefficient. By leveraging SIMD (Single Instruction Multiple Data) operations using AVX (Advanced Vector Extensions), we can process multiple data points in parallel, significantly improving performance. Below is the optimized implementation:
#include <iostream>
#include <vector>
#include <cmath>
#include <immintrin.h> // For AVX instructions
#include <stdexcept>
class OptimizedCosineSimilarity {
public:
static double calculate(const std::vector<double>& A,
const std::vector<double>& B) {
if (A.size() != B.size() || A.empty()) {
throw std::invalid_argument("Vectors must be non-empty and of equal length");
}
__m256d dot_product = _mm256_setzero_pd(); // AVX accumulator for dot product
__m256d norm_A = _mm256_setzero_pd(); // AVX accumulator for vector A norm
__m256d norm_B = _mm256_setzero_pd(); // AVX accumulator for vector B norm
size_t i = 0;
// Process 4 elements at a time using AVX instructions
for (; i + 3 < A.size(); i += 4) {
__m256d vecA = _mm256_loadu_pd(&A[i]);
__m256d vecB = _mm256_loadu_pd(&B[i]);
dot_product = _mm256_add_pd(dot_product, _mm256_mul_pd(vecA, vecB));
norm_A = _mm256_add_pd(norm_A, _mm256_mul_pd(vecA, vecA));
norm_B = _mm256_add_pd(norm_B, _mm256_mul_pd(vecB, vecB));
}
// Handle remaining elements that don't fit in SIMD processing
double scalar_dot = 0.0, scalar_norm_A = 0.0, scalar_norm_B = 0.0;
for (; i < A.size(); ++i) {
scalar_dot += A[i] * B[i];
scalar_norm_A += A[i] * A[i];
scalar_norm_B += B[i] * B[i];
}
// Horizontal sum of SIMD registers
double dot_sum = horizontal_sum(dot_product) + scalar_dot;
double norm_A_sum = horizontal_sum(norm_A) + scalar_norm_A;
double norm_B_sum = horizontal_sum(norm_B) + scalar_norm_B;
if (norm_A_sum == 0.0 || norm_B_sum == 0.0) {
throw std::invalid_argument("Vectors must be non-zero");
}
return dot_sum / (std::sqrt(norm_A_sum) * std::sqrt(norm_B_sum));
}
private:
// Helper function to horizontally sum AVX registers
static double horizontal_sum(__m256d vec) {
__m128d low = _mm256_castpd256_pd128(vec);
__m128d high = _mm256_extractf128_pd(vec, 1);
low = _mm_add_pd(low, high);
__m128d high64 = _mm_unpackhi_pd(low, low);
return _mm_cvtsd_f64(_mm_add_sd(low, high64));
}
};
int main() {
std::vector<double> doc1 = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0};
std::vector<double> doc2 = {2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0};
try {
double similarity = OptimizedCosineSimilarity::calculate(doc1, doc2);
std::cout << "Optimized cosine similarity: " << similarity << std::endl;
} catch (const std::exception& e) {
std::cerr << "Error: " << e.what() << std::endl;
}
return 0;
}
Example Output
Optimized cosine similarity: 1
Step-by-Step Explanation
- AVX Instructions: The implementation processes 4 elements at a time using AVX intrinsics like _mm256_loadu_pd (load), _mm256_mul_pd (multiply), and _mm256_add_pd (add), improving performance through parallelism.
- Scalar Fallback: A scalar loop processes any remaining elements if the vector size is not a multiple of 4.
- Horizontal Sum: The horizontal_sum function reduces the SIMD accumulators to scalar values for the final calculation.
Subsection: Understanding AVX
Advanced Vector Extensions (AVX) is a set of CPU instructions that enables efficient parallel processing for numerical computations. Modern CPUs typically support AVX, but it must be explicitly enabled in your compiler.
Checking for AVX Support
On macOS, you can check whether your CPU supports AVX by running the following command in a terminal:
sysctl -a | grep machdep.cpu.features
On Linux, the equivalent check is:
grep avx /proc/cpuinfo
Look for AVX in the output. If your CPU supports AVX, you can compile and run AVX-based code.
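You can also check from inside a program at runtime. The sketch below assumes GCC or Clang, which provide the __builtin_cpu_supports builtin:
#include <iostream>

int main() {
    // __builtin_cpu_supports queries the CPU the program is running on (GCC/Clang only)
    if (__builtin_cpu_supports("avx")) {
        std::cout << "AVX is supported on this CPU" << std::endl;
    } else {
        std::cout << "AVX is not supported on this CPU" << std::endl;
    }
    return 0;
}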
Compiling with AVX
To compile C++ programs with AVX support, add the -mavx flag to your compiler options. For example:
g++ -std=c++20 -mavx main.cpp -o cosine_similarity
If you are using CMake, add the following line to your CMakeLists.txt:
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -mavx")
Tip: Always test your program on systems without AVX to ensure it handles non-AVX environments gracefully.
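One simple way to keep a portable fallback is to dispatch at compile time on the __AVX__ macro, which the compiler defines only when AVX code generation is enabled (for example via -mavx). Below is a minimal sketch that assumes the CosineSimilarity and OptimizedCosineSimilarity classes from this guide are available through the header files shown later in the benchmarking section:
#include <vector>
#include "BaseCosineSimilarity.h"       // scalar CosineSimilarity
#ifdef __AVX__
#include "OptimizedCosineSimilarity.h"  // AVX-based OptimizedCosineSimilarity
#endif

// Uses the AVX implementation only when this translation unit was compiled
// with AVX enabled; otherwise falls back to the portable scalar version.
double cosine_similarity(const std::vector<double>& A,
                         const std::vector<double>& B) {
#ifdef __AVX__
    return OptimizedCosineSimilarity::calculate(A, B);
#else
    return CosineSimilarity::calculate(A, B);
#endif
}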
Calculating Cosine Similarity with Eigen
Eigen is a powerful C++ library for linear algebra that makes calculating cosine similarity simple and efficient. In this section, we'll demonstrate how to compute cosine similarity using Eigen's vector operations.
Implementation
The following code calculates cosine similarity between two vectors using Eigen:
#include <iostream>
#include <Eigen/Dense>
// Function to calculate cosine similarity using Eigen
double calculateCosineSimilarity(const Eigen::VectorXd& A, const Eigen::VectorXd& B) {
// Ensure the vectors have the same size
if (A.size() != B.size() || A.size() == 0) {
throw std::invalid_argument("Vectors must be non-empty and of the same size");
}
// Calculate the dot product and norms
double dotProduct = A.dot(B);
double normA = A.norm();
double normB = B.norm();
// Ensure norms are not zero
if (normA == 0.0 || normB == 0.0) {
throw std::invalid_argument("Vectors must have non-zero norms");
}
// Return cosine similarity
return dotProduct / (normA * normB);
}
int main() {
// Define two vectors using Eigen
Eigen::VectorXd vecA(3);
Eigen::VectorXd vecB(3);
// Initialize vectors
vecA << 1.0, 2.0, 3.0;
vecB << 2.0, 4.0, 6.0;
try {
// Calculate and print cosine similarity
double similarity = calculateCosineSimilarity(vecA, vecB);
std::cout << "Cosine similarity: " << similarity << std::endl;
} catch (const std::exception& e) {
std::cerr << "Error: " << e.what() << std::endl;
}
return 0;
}
Explanation
- Vectors in Eigen: Eigen uses the Eigen::VectorXd type for dynamically sized vectors.
- Dot Product: The dot() method computes the dot product of two vectors.
- Norm: The norm() method calculates the magnitude (Euclidean norm) of a vector.
- Input Validation: The implementation checks that vectors are non-empty and have the same size, and handles edge cases like zero vectors to prevent division by zero.
Example Output
Cosine similarity: 1
Tip: Use Eigen's highly optimized vector operations to improve performance when working with large datasets.
Warning: Ensure that the vectors you provide are non-zero to avoid runtime errors.
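One practical detail related to the tip above: if your data already lives in a std::vector, Eigen::Map can wrap that storage as an Eigen vector without copying. A minimal sketch:
#include <iostream>
#include <vector>
#include <Eigen/Dense>

int main() {
    std::vector<double> a = {1.0, 2.0, 3.0};
    std::vector<double> b = {2.0, 4.0, 6.0};
    // Wrap the existing std::vector storage without copying
    Eigen::Map<const Eigen::VectorXd> mapA(a.data(), static_cast<Eigen::Index>(a.size()));
    Eigen::Map<const Eigen::VectorXd> mapB(b.data(), static_cast<Eigen::Index>(b.size()));
    double similarity = mapA.dot(mapB) / (mapA.norm() * mapB.norm());
    std::cout << "Cosine similarity: " << similarity << std::endl; // prints 1
    return 0;
}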
Advantages of Using Eigen
Eigen offers a powerful and user-friendly framework for linear algebra operations, making it an excellent choice for calculating cosine similarity. Its key advantages include:
- High Performance: Eigen is designed for speed, employing advanced techniques like lazy evaluation and vectorization to optimize matrix and vector operations.
- Clean Syntax: Eigen's intuitive syntax for vectors and matrices makes code easier to write, read, and debug, especially for complex mathematical operations.
- Versatility: Eigen supports various matrix types (dense, sparse), sizes (fixed or dynamic), and precisions (e.g., float, double), catering to diverse computational needs.
- Header-Only Library: Eigen does not require linking against a compiled library. Simply include its headers in your project, and you're ready to go.
Tip: Download Eigen from the official website, and include the extracted directory in your project's include path. Eigen's header-only design simplifies integration into your build process.
Performance Test: Base vs Optimized vs Eigen Cosine Similarity
To evaluate the efficiency of the implementations, we will compare the performance of the base C++ implementation, the optimized implementation with AVX intrinsics, and the Eigen-based implementation. Below are the full contents of the required header files and the benchmarking code.
Header File: BaseCosineSimilarity.h
#ifndef BASE_COSINE_SIMILARITY_H
#define BASE_COSINE_SIMILARITY_H
#include <vector>
#include <cmath>
#include <stdexcept>
class CosineSimilarity {
public:
static double calculate(const std::vector<double>& A, const std::vector<double>& B) {
if (A.size() != B.size() || A.empty()) {
throw std::invalid_argument("Vectors must be non-empty and of equal length");
}
double dot_product = 0.0;
double norm_A = 0.0;
double norm_B = 0.0;
for (size_t i = 0; i < A.size(); ++i) {
dot_product += A[i] * B[i];
norm_A += A[i] * A[i];
norm_B += B[i] * B[i];
}
if (norm_A == 0.0 || norm_B == 0.0) {
throw std::invalid_argument("Vectors must be non-zero");
}
return dot_product / (std::sqrt(norm_A) * std::sqrt(norm_B));
}
};
#endif // BASE_COSINE_SIMILARITY_H
Header File: BaseEigenCosineSimilarity.h
#ifndef BASE_EIGEN_COSINE_SIMILARITY_H
#define BASE_EIGEN_COSINE_SIMILARITY_H
#include <Eigen/Dense>
#include <stdexcept>
class BaseEigenCosineSimilarity {
public:
static double calculate(const Eigen::VectorXd& A, const Eigen::VectorXd& B) {
if (A.size() != B.size() || A.size() == 0) {
throw std::invalid_argument("Vectors must be non-empty and of the same size");
}
double dotProduct = A.dot(B);
double normA = A.norm();
double normB = B.norm();
if (normA == 0.0 || normB == 0.0) {
throw std::invalid_argument("Vectors must have non-zero norms");
}
return dotProduct / (normA * normB);
}
};
#endif // BASE_EIGEN_COSINE_SIMILARITY_H
Header File: OptimizedCosineSimilarity.h
#ifndef OPTIMIZED_COSINE_SIMILARITY_H
#define OPTIMIZED_COSINE_SIMILARITY_H
#include <vector>
#include <cmath>
#include <immintrin.h>
#include <stdexcept>
class OptimizedCosineSimilarity {
public:
static double calculate(const std::vector<double>& A, const std::vector<double>& B) {
if (A.size() != B.size() || A.empty()) {
throw std::invalid_argument("Vectors must be non-empty and of equal length");
}
__m256d dot_product = _mm256_setzero_pd();
__m256d norm_A = _mm256_setzero_pd();
__m256d norm_B = _mm256_setzero_pd();
size_t i = 0;
for (; i + 3 < A.size(); i += 4) {
__m256d vecA = _mm256_loadu_pd(&A[i]);
__m256d vecB = _mm256_loadu_pd(&B[i]);
dot_product = _mm256_add_pd(dot_product, _mm256_mul_pd(vecA, vecB));
norm_A = _mm256_add_pd(norm_A, _mm256_mul_pd(vecA, vecA));
norm_B = _mm256_add_pd(norm_B, _mm256_mul_pd(vecB, vecB));
}
double scalar_dot = 0.0, scalar_norm_A = 0.0, scalar_norm_B = 0.0;
for (; i < A.size(); ++i) {
scalar_dot += A[i] * B[i];
scalar_norm_A += A[i] * A[i];
scalar_norm_B += B[i] * B[i];
}
double dot_sum = horizontal_sum(dot_product) + scalar_dot;
double norm_A_sum = horizontal_sum(norm_A) + scalar_norm_A;
double norm_B_sum = horizontal_sum(norm_B) + scalar_norm_B;
if (norm_A_sum == 0.0 || norm_B_sum == 0.0) {
throw std::invalid_argument("Vectors must be non-zero");
}
return dot_sum / (std::sqrt(norm_A_sum) * std::sqrt(norm_B_sum));
}
private:
static double horizontal_sum(__m256d vec) {
__m128d low = _mm256_castpd256_pd128(vec);
__m128d high = _mm256_extractf128_pd(vec, 1);
low = _mm_add_pd(low, high);
__m128d high64 = _mm_unpackhi_pd(low, low);
return _mm_cvtsd_f64(_mm_add_sd(low, high64));
}
};
#endif // OPTIMIZED_COSINE_SIMILARITY_H
Benchmark Code
#include <iostream>
#include <vector>
#include <chrono>
#include <random>
#include "BaseCosineSimilarity.h"
#include "BaseEigenCosineSimilarity.h"
#include "OptimizedCosineSimilarity.h"
std::vector<double> generate_random_vector(size_t size) {
std::vector<double> vec(size);
std::mt19937 rng(42);
std::uniform_real_distribution<double> dist(-1.0, 1.0);
for (size_t i = 0; i < size; ++i) {
vec[i] = dist(rng);
}
return vec;
}
Eigen::VectorXd generate_random_eigen_vector(size_t size) {
Eigen::VectorXd vec(size);
std::mt19937 rng(42);
std::uniform_real_distribution<double> dist(-1.0, 1.0);
for (Eigen::Index i = 0; i < vec.size(); ++i) {
vec[i] = dist(rng);
}
return vec;
}
int main() {
std::vector<size_t> sizes = {1000, 10000, 100000, 1000000};
for (size_t size : sizes) {
// Standard C++ Vectors
std::vector<double> vecA = generate_random_vector(size);
std::vector<double> vecB = generate_random_vector(size);
// Eigen Vectors
Eigen::VectorXd eigenVecA = generate_random_eigen_vector(size);
Eigen::VectorXd eigenVecB = generate_random_eigen_vector(size);
auto start = std::chrono::high_resolution_clock::now();
double similarity_base = CosineSimilarity::calculate(vecA, vecB);
auto end = std::chrono::high_resolution_clock::now();
auto base_duration = std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
start = std::chrono::high_resolution_clock::now();
double similarity_optimized = OptimizedCosineSimilarity::calculate(vecA, vecB);
end = std::chrono::high_resolution_clock::now();
auto optimized_duration = std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
start = std::chrono::high_resolution_clock::now();
double similarity_eigen_base = BaseEigenCosineSimilarity::calculate(eigenVecA, eigenVecB);
end = std::chrono::high_resolution_clock::now();
auto eigen_base_duration = std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
std::cout << "Vector Size: " << size << std::endl;
std::cout << "Base (C++): " << base_duration << " µs" << std::endl;
std::cout << "Optimized (C++): " << optimized_duration << " µs" << std::endl;
std::cout << "Base (Eigen): " << eigen_base_duration << " µs" << std::endl;
std::cout << std::endl;
}
return 0;
}
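To reproduce the benchmark, one plausible compile command (assuming the benchmark code above is saved as benchmark.cpp and the Eigen headers are unpacked at /path/to/eigen; adjust both to your setup) is:
g++ -std=c++20 -O3 -mavx -I /path/to/eigen benchmark.cpp -o benchmark
./benchmark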
Results and Analysis
To evaluate the efficiency of the implementations, we compared the performance of the base C++ implementation, the optimized implementation with AVX intrinsics, and the Eigen-based implementation. The following results provide insights into the execution times for each implementation across varying vector sizes.
Results
Vector Size    Base (C++)    Optimized (C++)    Base (Eigen)
1,000          15 µs         3 µs               21 µs
10,000         152 µs        40 µs              159 µs
100,000        1,293 µs      301 µs             1,343 µs
1,000,000      13,225 µs     3,074 µs           14,481 µs
Analysis
The results show that the **Optimized C++ implementation** consistently outperforms both the **Base C++** and **Base Eigen** implementations. The performance gains are particularly significant for larger vector sizes due to the following factors:
- Optimized C++: The use of AVX intrinsics allows the optimized implementation to process four elements at a time, drastically reducing the number of iterations required and taking advantage of SIMD capabilities.
- Base C++: While less efficient than the optimized version, the base implementation performs well due to its simplicity and minimal overhead.
- Base Eigen: Despite Eigen's high-level abstraction and clean syntax, the Base Eigen implementation introduces additional overhead due to its general-purpose design and lack of vectorized optimizations for this specific task.
The **Optimized C++ implementation** is approximately:
- 4.3 times faster than the Base C++ implementation for 1,000,000 elements
- 4.7 times faster than the Base Eigen implementation for 1,000,000 elements
For smaller datasets (e.g., 1,000 elements), the difference between the implementations is less pronounced due to the relative insignificance of loop optimizations at smaller scales.
Tip: Use the Optimized C++ implementation for large datasets (10,000 elements or more) to maximize performance gains. For smaller datasets, the base implementations might be sufficient and simpler to implement.
Your Mileage May Vary: Performance results may differ based on your CPU architecture, compiler optimizations, and other system configurations. Always benchmark on your specific hardware to determine the best approach.
Considering Parallel Eigen Implementation
A parallel Eigen implementation can be leveraged for parallel processing on multi-core systems, particularly with large datasets. Eigen provides several ways to control threading behavior, including Eigen::setNbThreads() and Eigen::nbThreads() for programmatic control, as well as the OMP_NUM_THREADS environment variable. Since Eigen uses OpenMP internally when built with OpenMP support, these interfaces effectively serve as high-level wrappers around OpenMP threading.
- Key Features:
- Automatic parallelization of specific operations like large matrix products and certain matrix decompositions (LU, QR)
- Thread-safe memory allocation control via Eigen::internal::set_is_malloc_allowed(false)
- Size-threshold-based parallelization for vector operations
- Advantages:
- Effective utilization of multicore CPUs for large-scale computations
- High-level threading control through Eigen's API
- Automatic optimization of parallel execution for supported operations
- Limitations:
- Parallel overhead may not justify use with smaller datasets
- Memory bandwidth often becomes the primary bottleneck rather than CPU cores
- Parallelization is only available for matrix-level operations, not element-wise computations
- Increased complexity in debugging and profiling multi-threaded code
The effectiveness of Eigen's parallel implementation depends heavily on the specific operation being performed and the dataset size. While parallel processing can significantly improve performance for large-scale computations, the threshold for effective parallelization varies by operation. For smaller datasets, single-threaded implementations often provide better performance by avoiding parallel overhead.
Example: To set the number of threads in Eigen, you can use: Eigen::setNbThreads(4);
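Expanding that into a minimal runnable snippet (assuming Eigen is built and compiled with OpenMP support, e.g. with -fopenmp; otherwise nbThreads() simply reports 1):
#include <iostream>
#include <Eigen/Dense>

int main() {
    Eigen::setNbThreads(4); // request up to 4 threads for Eigen's parallel kernels
    std::cout << "Eigen is using " << Eigen::nbThreads() << " thread(s)" << std::endl;
    return 0;
}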
For an example of multithreading matrix-level operations, check out our guide: How To Do Matrix Multiplication in C++.
Is Parallelization Suitable for Cosine Similarity?
The suitability of parallel implementation for cosine similarity calculations depends primarily on your data characteristics and computation patterns. Understanding these factors is crucial for achieving optimal performance.
- Favorable Scenarios:
- Processing high-dimensional vectors (1000+ dimensions) where vectorized operations and multi-threading provide significant benefits
- Batch processing multiple similarity computations simultaneously, such as comparing one vector against a large dataset
- Integration within larger parallel processing pipelines where threading overhead is already amortized
- Less Suitable Scenarios:
- Low-dimensional vectors (hundreds of elements or fewer) where threading overhead outweighs performance gains
- Single pairwise comparisons where computation times are negligible
- Memory-bandwidth limited systems where adding threads won't improve performance due to bottlenecks
When implementing parallel cosine similarity, consider that while Eigen can parallelize dot product and norm calculations using Eigen::setNbThreads(), the final division step remains scalar. This makes parallelization most effective for processing large batches of high-dimensional vectors, where vectorized operations can maximize core utilization without excessive overhead.
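To illustrate the batch scenario, the sketch below compares a single query vector against every row of a data matrix using one matrix-vector product plus row-wise norms, which is the kind of matrix-level operation Eigen can vectorize (and, when built with OpenMP, parallelize). Zero-norm rows are not guarded here and would need the same check used earlier:
#include <iostream>
#include <Eigen/Dense>

// Cosine similarity between one query vector and every row of a data matrix.
Eigen::VectorXd batchCosineSimilarity(const Eigen::MatrixXd& data,
                                      const Eigen::VectorXd& query) {
    Eigen::VectorXd dots = data * query;              // one dot product per row
    Eigen::VectorXd rowNorms = data.rowwise().norm(); // magnitude of each row
    return (dots.array() / (rowNorms.array() * query.norm())).matrix();
}

int main() {
    Eigen::MatrixXd data(3, 4);
    data << 1, 2, 3, 4,
            2, 4, 6, 8,
            -1, -2, -3, -4;
    Eigen::VectorXd query(4);
    query << 1.0, 2.0, 3.0, 4.0;
    // Expected output: approximately 1 1 -1
    std::cout << batchCosineSimilarity(data, query).transpose() << std::endl;
    return 0;
}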
Use Cases
Cosine similarity is a versatile metric that finds critical applications in numerous fields. Its ability to measure similarity between high-dimensional vectors makes it indispensable for a variety of tasks:
- Text Mining and Document Similarity:
In text mining, cosine similarity is a cornerstone for comparing documents in a vector space model, such as tf-idf. It helps in identifying documents with similar content, which is essential for clustering, search engines, and plagiarism detection.
Example: Pseudocode for comparing two tf-idf vectors
tfidf_vec1 = [2, 1, 0, 1, 3]  # tf-idf values for document 1
tfidf_vec2 = [1, 1, 1, 0, 2]  # tf-idf values for document 2
similarity = cosine_similarity(tfidf_vec1, tfidf_vec2)
Importance: Cosine similarity underpins modern information retrieval systems, enabling tasks like document clustering and similarity ranking in search engines.
- Recommendation Systems:
Cosine similarity is widely used to compare user preference vectors or item feature vectors. It powers collaborative filtering techniques, helping recommend products, movies, or music based on user behavior.
Example: Pseudocode for finding similar users
user_pref1 = [5, 0, 3, 4, 0]  # ratings given by user 1
user_pref2 = [4, 0, 3, 5, 0]  # ratings given by user 2
similarity = cosine_similarity(user_pref1, user_pref2)
Importance: Central to e-commerce, streaming platforms, and social media, cosine similarity enables personalized recommendations that improve user experience and engagement.
- Image Recognition:
In image processing, cosine similarity is used to compare feature vectors extracted from images. It helps in identifying duplicate images, finding similar patterns, and aiding in object recognition tasks.
Example: Pseudocode for comparing image feature vectors
image_vec1 = extract_features(image1)  # feature vector of image 1
image_vec2 = extract_features(image2)  # feature vector of image 2
similarity = cosine_similarity(image_vec1, image_vec2)
Importance: A key component of computer vision applications, cosine similarity supports tasks like reverse image search, deduplication, and facial recognition.
- Natural Language Processing (NLP):
In NLP, cosine similarity measures semantic similarity between word embeddings, sentence vectors, or document vectors. It is used in tasks like sentiment analysis, chatbot responses, and question answering systems.
Example: Pseudocode for comparing word embeddings
embedding1 = get_word_embedding("king")
embedding2 = get_word_embedding("queen")
similarity = cosine_similarity(embedding1, embedding2)
Importance: Critical for understanding and generating human language, cosine similarity enhances search engines, recommendation systems, and language models like GPT.
- Genomics and Bioinformatics:
Cosine similarity is used to compare DNA sequences, protein structures, or gene expression profiles represented as feature vectors. It aids in discovering genetic similarities and understanding biological processes.
Example: Pseudocode for comparing gene expression profiles
gene_profile1 = [2.5, 1.0, 0.8, 3.2]  # expression levels for gene 1
gene_profile2 = [2.7, 0.9, 1.1, 3.0]  # expression levels for gene 2
similarity = cosine_similarity(gene_profile1, gene_profile2)
Importance: Enables breakthroughs in personalized medicine, drug discovery, and evolutionary biology through comparative analysis of biological data.
- Financial Analytics:
In finance, cosine similarity is used to compare time series data, stock performance vectors, or risk profiles. It helps in clustering similar stocks, detecting anomalies, and optimizing portfolios.
Example: Pseudocode for comparing stock performance
stock_vec1 = [1.02, 0.98, 1.05, 1.10]  # returns for stock 1
stock_vec2 = [1.01, 0.97, 1.06, 1.11]  # returns for stock 2
similarity = cosine_similarity(stock_vec1, stock_vec2)
Importance: Enhances decision-making in algorithmic trading, risk management, and market segmentation by identifying patterns and similarities.
💡 Cosine Similarity and Attention Mechanisms
The equation for scaled dot-product attention is given as: \[ \text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^\top}{\sqrt{d_k}}\right)V \] where:
- \(Q\): Query matrix
- \(K\): Key matrix
- \(V\): Value matrix
- \(d_k\): Dimensionality of the key vectors
While not explicitly calculated, cosine similarity shares a deep mathematical connection with the scaled dot-product attention mechanism used in Transformer models. Both methods fundamentally measure the alignment between vectors.
- Mathematical Relationship:
- Scaled dot-product attention, with its scaling factor (\( \frac{1}{\sqrt{d_k}} \)), serves a similar purpose to cosine similarity's normalization, helping to manage gradient flow in high-dimensional spaces.
- Both methods effectively capture the directional relationships between vectors, though they achieve this through different normalization approaches.
- Key Differences:
- Attention mechanisms utilize softmax normalization to convert alignment scores into a probability distribution over all keys, while cosine similarity uses L2-norm normalization to bound values between -1 and 1.
- The scaling factor in attention (\( \frac{1}{\sqrt{d_k}} \)) is specifically designed to prevent vanishing gradients in deep networks, whereas cosine similarity's normalization focuses on making magnitudes comparable.
- Attention mechanisms incorporate learnable parameters and value vectors, enabling them to learn complex patterns in sequence data beyond simple similarity measurement.
Understanding these relationships provides insight into how Transformer architectures adapt fundamental geometric concepts like vector similarity into learnable neural network operations.
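Written side by side, the two ways of normalizing the raw dot product \( \mathbf{q} \cdot \mathbf{k} \) are:
\[ \text{cosine similarity: } \frac{\mathbf{q} \cdot \mathbf{k}}{\|\mathbf{q}\| \, \|\mathbf{k}\|} \qquad \text{vs.} \qquad \text{attention score (pre-softmax): } \frac{\mathbf{q} \cdot \mathbf{k}}{\sqrt{d_k}} \]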
⚠️ Sparse Vector Considerations
Note: High-dimensional sparse vectors are common in domains like text analysis, where vectors often have thousands or even millions of dimensions, but most of their elements are zero. While cosine similarity works well with dense vectors, directly applying it to sparse vectors can lead to inefficiencies in both memory usage and computational performance.
- Memory Usage: Storing high-dimensional sparse vectors as dense arrays consumes a large amount of memory, most of which is wasted on zeros.
- Computational Overhead: Computing dot products and norms for all elements, including zeros, is unnecessary and adds significant overhead.
- Precision Issues: Sparse vectors can lead to numerical instability when norms are very small, especially in floating-point arithmetic.
To address these challenges, consider using libraries or data structures optimized for sparse vectors, such as:
- Sparse Matrices: Libraries like Eigen and SciPy offer sparse matrix representations that efficiently store non-zero elements.
- Compressed Storage Formats: Use formats like CSR (Compressed Sparse Row) or COO (Coordinate Format) to store sparse data compactly.
- Specialized Algorithms: Leverage algorithms designed for sparse data to compute cosine similarity without iterating over zero elements.
By leveraging sparse representations, you can significantly reduce memory usage and speed up computations, particularly in applications like document similarity, where sparse vectors are the norm.
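As a minimal sketch of such a specialized approach (using hand-rolled sorted (index, value) pairs rather than a library sparse type), the dot product below only visits indices where both vectors have non-zero entries:
#include <cmath>
#include <cstddef>
#include <iostream>
#include <stdexcept>
#include <utility>
#include <vector>

// Sparse vector stored as (index, value) pairs with indices sorted ascending.
using SparseVec = std::vector<std::pair<std::size_t, double>>;

double sparseCosineSimilarity(const SparseVec& A, const SparseVec& B) {
    double dot = 0.0, normA = 0.0, normB = 0.0;
    std::size_t i = 0, j = 0;
    // Two-pointer merge: the dot product only accumulates where both
    // vectors have a non-zero entry at the same index.
    while (i < A.size() && j < B.size()) {
        if (A[i].first == B[j].first) {
            dot += A[i].second * B[j].second;
            ++i;
            ++j;
        } else if (A[i].first < B[j].first) {
            ++i;
        } else {
            ++j;
        }
    }
    for (const auto& p : A) normA += p.second * p.second;
    for (const auto& p : B) normB += p.second * p.second;
    if (normA == 0.0 || normB == 0.0) {
        throw std::invalid_argument("Vectors must be non-zero");
    }
    return dot / (std::sqrt(normA) * std::sqrt(normB));
}

int main() {
    // Only non-zero entries are stored; the full dimensionality can be huge.
    SparseVec doc1 = {{0, 2.0}, {3, 1.0}, {7, 3.0}};
    SparseVec doc2 = {{0, 1.0}, {2, 4.0}, {7, 2.0}};
    std::cout << "Sparse cosine similarity: "
              << sparseCosineSimilarity(doc1, doc2) << std::endl;
    return 0;
}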
Conclusion
Throughout this guide, we've explored multiple approaches to implementing cosine similarity in C++, ranging from the basic implementation to an optimized SIMD version and an Eigen-based implementation. Each approach has its strengths and weaknesses, making them suitable for different use cases:
- Basic Implementation: Ideal for smaller datasets or applications where simplicity and readability are key. It provides an excellent foundation for understanding cosine similarity without introducing complex optimizations.
- Optimized SIMD Implementation: Best suited for large-scale applications where performance is critical. This version leverages AVX intrinsics to achieve significant speedups, especially on modern CPUs with SIMD support.
- Eigen Implementations: Offers clean syntax and high-level abstraction, making it a great choice for developers already using Eigen for linear algebra operations. The parallel Eigen implementation with OpenMP can further enhance performance for massive datasets, though care must be taken to tune parallel settings appropriately.
- Sparse Implementations: Crucial for domains like text mining and bioinformatics, where vectors are often high-dimensional and sparse. Specialized algorithms ensure efficient memory usage and computation by processing only the non-zero elements.
Additionally, the benchmarking section provided valuable insights into the relative performance of each implementation, helping you decide which approach aligns with your needs. Always remember to profile and benchmark on your target hardware to identify the most efficient solution.
Cosine similarity's versatility across diverse fields such as text mining, recommendation systems, and financial analytics underscores its importance in modern computational workflows. By understanding its nuances and choosing the right implementation, you can unlock significant benefits in your applications.
Congratulations on reading to the end of this tutorial! If you found it useful, please share the blog post by using the Attribution and Citation buttons below. Feel free to explore the Further Reading section for additional resources to deepen your understanding of C++.
Have fun and happy coding!
Further Reading
Mastering cosine similarity, numerical methods, and vectorized operations requires a strong understanding of both programming and mathematical concepts. Below is a curated list of resources to deepen your knowledge of C++ and its applications in numerical computing. These resources include official documentation, tutorials, and advanced tools to help you implement and optimize algorithms effectively.
- Try Our Online C++ Compiler: Test your C++ implementations directly in your browser using our professional-grade online compiler, designed for efficient debugging and experimentation.
- Eigen Documentation: Dive into Eigen, a robust library for linear algebra and numerical computations, offering vectorized and parallel capabilities.
- C++ Reference Documentation: Explore detailed documentation on C++ Standard Library features, including algorithms and containers relevant to numerical operations.
- C++ Solutions: Access tutorials and practical examples on advanced C++ topics, including numerical methods and performance optimization.
- STL Source Code (Microsoft): Study the implementation of the C++ Standard Template Library to better understand the design principles and performance considerations.
- Intel oneMKL: Explore Intel’s Math Kernel Library for highly optimized matrix operations, vector computations, and numerical solvers.
- Learn C++: From beginner to advanced topics, this comprehensive resource covers all aspects of modern C++ programming.
- How to Calculate Cosine Similarity in Python: Learn how to implement cosine similarity in Python with practical examples and detailed explanations.
- How to Calculate Cosine Similarity in R: Explore cosine similarity in R, complete with example code and insights into its applications.
Attribution and Citation
If you found this guide and tools helpful, feel free to link back to this page or cite it in your work!
Suf is a senior advisor in data science with deep expertise in Natural Language Processing, Complex Networks, and Anomaly Detection. Formerly a postdoctoral research fellow, he applied advanced physics techniques to tackle real-world, data-heavy industry challenges. Before that, he was a particle physicist at the ATLAS Experiment of the Large Hadron Collider. Now, he’s focused on bringing more fun and curiosity to the world of science and research online.