How to Use HashMap in C++


C++’s unordered_map (commonly known as HashMap in other languages) is a powerful container that stores key-value pairs using hash tables. It provides average constant-time complexity for insertions, deletions, and lookups, making it an essential tool for efficient data management in C++ programming.

Introduction

At its core, an unordered_map works by using a hash function to convert keys into array indices, allowing for rapid data access. This approach significantly improves performance compared to searching through data sequentially, especially when working with large datasets.
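
To make this concrete, here is a brief sketch, using only standard calls (std::hash, bucket(), and bucket_count()), that shows the relationship between a key's hash value and the bucket it lands in. The exact numbers are implementation-specific.

Hash to Bucket Sketch
#include <iostream>
#include <string>
#include <unordered_map>

int main() {
    std::unordered_map<std::string, int> ages{{"Alice", 25}};

    // The hash function turns the key into a large integer...
    std::size_t h = std::hash<std::string>{}("Alice");

    // ...which the container reduces to a bucket index (conceptually, hash % bucket_count)
    std::cout << "Hash value:   " << h << "\n";
    std::cout << "Bucket index: " << ages.bucket("Alice") << "\n";
    std::cout << "Bucket count: " << ages.bucket_count() << "\n";

    return 0;
}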

Key features of unordered_map include:

  • O(1) average time complexity for basic operations, making it ideal for performance-critical applications
  • Automatic memory management that handles dynamic resizing as elements are added or removed
  • Support for any data type as key or value, providing flexibility in data organization
  • Built-in hash function support for standard types, with the ability to customize for user-defined types

Understanding how to effectively use unordered_map is crucial for any C++ developer, as it provides solutions for many common programming challenges, from caching to counting unique elements to implementing fast lookup tables.
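
To make the counting use case concrete, here is a minimal sketch (the word list is invented for illustration) that tallies word frequencies. Because unordered_map does not order its elements, the printed order may vary.

Counting Word Frequencies
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

int main() {
    std::vector<std::string> words = {"apple", "banana", "apple", "cherry", "banana", "apple"};

    // operator[] value-initializes a missing count to 0, so ++ works on first use
    std::unordered_map<std::string, int> counts;
    for (const auto& word : words) {
        ++counts[word];
    }

    for (const auto& [word, count] : counts) {
        std::cout << word << ": " << count << "\n";
    }

    return 0;
}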

Basic Usage

Let’s begin with fundamental operations that form the foundation of working with unordered_map. These basic operations demonstrate the container’s intuitive interface and its similarity to regular associative arrays or dictionaries in other programming languages.

The following example shows the most common operations you’ll perform with an unordered_map. We’ll create a simple program that manages age records, demonstrating different ways to insert, access, and iterate through data:

Basic HashMap Operations
#include <iostream>
#include <unordered_map>
#include <string>

int main() {
    // Create an unordered_map
    std::unordered_map<std::string, int> ages;

    // Insert elements
    ages["Alice"] = 25;
    ages.insert({"Bob", 30});
    ages.emplace("Charlie", 35);

    // Access elements
    std::cout << "Alice's age: " << ages["Alice"] << std::endl;

    // Check if key exists
    if (ages.count("Bob") > 0) {
        std::cout << "Bob's age: " << ages["Bob"] << std::endl;
    }

    // Iterate through the map
    for (const auto& [name, age] : ages) {
        std::cout << name << " is " << age << " years old\n";
    }

    return 0;
}
Output:
Alice's age: 25
Bob's age: 30
Alice is 25 years old
Bob is 30 years old
Charlie is 35 years old

Let's break down what's happening in this example:

The code demonstrates three different ways to insert elements into an unordered_map. The square bracket operator (ages["Alice"] = 25) is the most concise, but if the key doesn't exist it first inserts a value-initialized element and then assigns to it. The insert() method is more explicit and adds the key-value pair as a single unit. Finally, emplace() constructs the element in place, which can be more efficient for complex key or value types.
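
One consequence worth keeping in mind: merely reading a missing key through operator[] inserts a value-initialized element. A minimal sketch (the key "Dave" is just an example):

operator[] Inserts Missing Keys
#include <iostream>
#include <string>
#include <unordered_map>

int main() {
    std::unordered_map<std::string, int> ages;

    // Reading a missing key with operator[] inserts {"Dave", 0} as a side effect
    std::cout << "Dave's age: " << ages["Dave"] << std::endl;
    std::cout << "Map size: " << ages.size() << std::endl;

    return 0;
}
Output:
Dave's age: 0
Map size: 1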

When it comes to accessing elements, we showcase both direct access using the square bracket operator and safe checking using the count() method. Checking first matters because operator[] silently inserts a missing key, so count() (or find()) lets you look for a key without modifying the map.
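
Besides count(), the standard library also provides at(), which throws std::out_of_range for a missing key instead of inserting one, and, if you can assume C++20, contains(). A brief sketch:

Checked Access with at() and contains()
#include <iostream>
#include <stdexcept>
#include <string>
#include <unordered_map>

int main() {
    std::unordered_map<std::string, int> ages{{"Alice", 25}, {"Bob", 30}};

    // at() gives checked access without inserting anything
    try {
        std::cout << "Bob's age: " << ages.at("Bob") << std::endl;
        std::cout << ages.at("Zoe") << std::endl;  // throws std::out_of_range
    } catch (const std::out_of_range&) {
        std::cout << "Zoe is not in the map" << std::endl;
    }

    // contains() (C++20) reads more clearly than count() > 0
    if (ages.contains("Alice")) {
        std::cout << "Alice is present" << std::endl;
    }

    return 0;
}
Output:
Bob's age: 30
Zoe is not in the map
Alice is present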

Advanced Operations

Once you're comfortable with basic operations, unordered_map offers several advanced features that provide more control and efficiency. These operations are essential for building robust and performant applications.

In this section, we'll explore more sophisticated techniques, including safe element access, performance monitoring, and proper element removal. These operations are particularly valuable when working with larger datasets or when performance is critical:

Advanced HashMap Features
#include <iostream>
#include <unordered_map>
#include <string>

int main() {
    std::unordered_map<std::string, int> scores;

    // Insert multiple elements
    scores = {
        {"Math", 95},
        {"Science", 88},
        {"History", 92}
    };

    // Using find() instead of operator[]
    auto it = scores.find("Math");
    if (it != scores.end()) {
        std::cout << "Found Math score: " << it->second << "\n";
    }

    // Insert only if the key is absent; 'inserted' reports whether it happened
    if (auto [iter, inserted] = scores.try_emplace("English", 90); inserted) {
        std::cout << "Added English score\n";
    }

    // Get or create with default value
    int art_score = scores.try_emplace("Art", 85).first->second;
    std::cout << "Art score: " << art_score << "\n";

    // Remove an element
    scores.erase("History");

    // Get some statistics
    std::cout << "Number of subjects: " << scores.size() << "\n";
    std::cout << "Bucket count: " << scores.bucket_count() << "\n";
    std::cout << "Load factor: " << scores.load_factor() << "\n";

    return 0;
}
Output:
Found Math score: 95
Added English score
Art score: 85
Number of subjects: 4
Bucket count: 7
Load factor: 0.571429

This example introduces several important concepts. The find() method provides a safer alternative to operator[] when you need to check for existence without modifying the map. The try_emplace() function inserts an element only if the key doesn't already exist, and reports whether the insertion happened, which is perfect for initialization patterns.
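
A closely related method is insert_or_assign(), which overwrites an existing value where try_emplace() leaves it untouched. A short sketch contrasting the two (reusing the Math subject from the example above):

try_emplace vs insert_or_assign
#include <iostream>
#include <string>
#include <unordered_map>

int main() {
    std::unordered_map<std::string, int> scores{{"Math", 95}};

    // try_emplace does nothing because "Math" already exists
    scores.try_emplace("Math", 0);
    std::cout << "After try_emplace: " << scores["Math"] << std::endl;

    // insert_or_assign overwrites the existing value
    scores.insert_or_assign("Math", 99);
    std::cout << "After insert_or_assign: " << scores["Math"] << std::endl;

    return 0;
}
Output:
After try_emplace: 95
After insert_or_assign: 99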

The code also demonstrates how to monitor the container's internal state through methods like bucket_count() and load_factor(). Understanding these metrics is crucial for optimizing performance, as they directly affect how the hash table behaves under different loads.

Understanding the Bucket System

At its core, unordered_map's performance depends heavily on its bucket system. Think of buckets as internal compartments where the container stores your data. The bucket_count() method reveals how many of these compartments exist, while load_factor() shows how efficiently they're being used.

The bucket system works through several key mechanisms:

  • When you insert a key-value pair, the hash function determines which bucket will store it
  • Multiple items may share the same bucket (called collisions), but too many collisions slow down lookups
  • The container automatically increases bucket count when the load factor exceeds its maximum (typically 1.0)
  • More buckets mean fewer collisions but higher memory usage—it's a classic space-time tradeoff

You can optimize bucket usage for your specific needs. If you know you'll need space for 1000 items, calling reserve(1000) preemptively allocates appropriate bucket space, preventing expensive rehashing operations as your container grows. Similarly, monitoring load_factor() helps you understand whether your container might benefit from manual adjustments to its bucket count.

Here's a practical example of monitoring bucket metrics:

Bucket Monitoring Example
#include <iostream>
#include <unordered_map>
#include <string>

int main() {
    std::unordered_map<std::string, int> map;

    // Check initial state
    std::cout << "Initial bucket count: " << map.bucket_count() << "\n";
    std::cout << "Initial load factor: " << map.load_factor() << "\n";

    // Reserve space for expected data
    map.reserve(1000);  // Adjusts bucket count appropriately

    std::cout << "After reserve bucket count: " << map.bucket_count() << "\n";
    std::cout << "After reserve load factor: " << map.load_factor() << "\n";

    // Add some items and monitor changes
    for(int i = 0; i < 100; i++) {
        map["key" + std::to_string(i)] = i;
    }

    // Check final metrics
    std::cout << "Final bucket count: " << map.bucket_count() << "\n";
    std::cout << "Final load factor: " << map.load_factor() << "\n";
    std::cout << "Max load factor: " << map.max_load_factor() << "\n";

    return 0;
}
Output:
Initial bucket count: 1
Initial load factor: 0
After reserve bucket count: 1031
After reserve load factor: 0
Final bucket count: 1031
Final load factor: 0.0969932
Max load factor: 1

The initial state shows us that unordered_map starts very small:

  • It begins with just one bucket to conserve memory
  • The load factor is 0 because we haven't added any elements yet

When we call reserve(1000), the container prepares space for 1000 elements:

  • The bucket count jumps to 1031, slightly more than the 1000 elements we requested
  • The implementation rounds the request up to a convenient size (here the prime 1031) rather than allocating exactly 1000 buckets, which helps spread keys evenly
  • The load factor remains 0 because we still haven't added any elements

After adding 100 elements, we can see the final metrics:

  • The bucket count stays at 1031 because we haven't exceeded the capacity we reserved
  • The load factor is now about 0.097 (100 elements ÷ 1031 buckets)
  • This low load factor (well below the maximum of 1.0) indicates we have plenty of space

The max load factor of 1.0 is the container's threshold for rehashing. If our load factor approached 1.0 (meaning nearly as many elements as buckets), the container would automatically increase its bucket count to maintain performance. In this case, with a load factor of only about 0.097, we're using less than 10% of our capacity, ensuring fast lookups but potentially using more memory than necessary if we don't expect to add many more elements.

Understanding Load Factor Behavior

When an unordered_map's load factor approaches or exceeds its maximum, the container automatically rehashes itself by increasing its bucket count. Let's observe this behavior in action:

Load Factor Exceeding Maximum Example
#include <iostream>
#include <unordered_map>
#include <iomanip>  // For std::fixed and std::setprecision

int main() {
    // Create a map with a small initial size
    std::unordered_map<int, int> map;

    // Set a lower max load factor to make rehashing more obvious
    map.max_load_factor(0.2);  // Default is 1.0

    std::cout << std::fixed << std::setprecision(3);

    // Add elements and monitor the changes
    for(int i = 1; i <= 20; i++) {
        map[i] = i;

        std::cout << "After inserting " << i << ":\n"
                  << "  Bucket count: " << map.bucket_count()
                  << "\n  Current load factor: " << map.load_factor()
                  << "\n  Max load factor: " << map.max_load_factor() << "\n\n";
    }

    return 0;
}
Output:
After inserting 1:
Bucket count: 59
Current load factor: 0.017
Max load factor: 0.200

After inserting 2:
Bucket count: 59
Current load factor: 0.034
Max load factor: 0.200

After inserting 3:
Bucket count: 59
Current load factor: 0.051
Max load factor: 0.200

...

After inserting 10:
Bucket count: 59
Current load factor: 0.169
Max load factor: 0.200

After inserting 11:
Bucket count: 59
Current load factor: 0.186
Max load factor: 0.200

After inserting 12:
Bucket count: 127
Current load factor: 0.094
Max load factor: 0.200

After inserting 13:
Bucket count: 127
Current load factor: 0.102
Max load factor: 0.200

After inserting 14:
Bucket count: 127
Current load factor: 0.110
Max load factor: 0.200

....

After inserting 19:
Bucket count: 127
Current load factor: 0.150
Max load factor: 0.200

After inserting 20:
Bucket count: 127
Current load factor: 0.157
Max load factor: 0.200

Our test with max_load_factor(0.2) reveals three distinct phases of unordered_map's bucket management:

Initial Bucket Management (Elements 1-11)

  • Starts with 59 buckets for efficient hash distribution
  • Load factor grows steadily from 0.017 to 0.186
  • Maintains same bucket count while under max_load_factor

Critical Rehashing Point (Element 12)

  • Inserting the 12th element (12 ÷ 59 ≈ 0.203) would push the load factor past the 0.2 maximum
  • The container grows the bucket count from 59 to 127, slightly more than double
  • The load factor drops sharply to 0.094 after the rehash

Stable Growth Phase (Elements 13-20)

  • Load factor climbs gradually from 0.102 to 0.157
  • Bucket count remains stable at 127
  • No further rehashing needed as load factor stays under 0.2

This pattern demonstrates unordered_map's strategy of maintaining performance through strategic bucket allocation and timely rehashing operations.

While rehashing maintains good lookup performance, it can be computationally expensive. That's why it's often beneficial to reserve space in advance if you know approximately how many elements you'll be storing. This prevents multiple rehashing operations as the container grows.
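
To see the effect in practice, the sketch below counts how many times the bucket count changes while inserting 1000 elements, once without and once with an up-front reserve(). The exact counts depend on your standard library, but the reserved map should need no rehashing at all.

Counting Rehashes With and Without reserve()
#include <iostream>
#include <unordered_map>

// Insert n elements and count how many times the bucket count changes (each change is a rehash)
int count_rehashes(std::unordered_map<int, int>& map, int n) {
    int rehashes = 0;
    std::size_t buckets = map.bucket_count();
    for (int i = 0; i < n; ++i) {
        map[i] = i;
        if (map.bucket_count() != buckets) {
            ++rehashes;
            buckets = map.bucket_count();
        }
    }
    return rehashes;
}

int main() {
    std::unordered_map<int, int> grown;      // grows (and rehashes) on demand
    std::unordered_map<int, int> reserved;
    reserved.reserve(1000);                  // bucket space allocated up front

    std::cout << "Rehashes without reserve: " << count_rehashes(grown, 1000) << "\n";
    std::cout << "Rehashes with reserve: " << count_rehashes(reserved, 1000) << "\n";

    return 0;
}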

Using Custom Objects as Keys

One of the most powerful features of unordered_map is its ability to work with user-defined types as keys. However, this flexibility comes with the responsibility of properly implementing hash and equality operations for your custom types.

To successfully use a custom type as a key, you need to provide two essential components: an equality comparison operator (operator==) and a hash function. The equality operator tells the container when two keys should be considered the same, while the hash function converts your key into a numerical value for efficient storage and lookup.

Custom Object as HashMap Key
#include <iostream>
#include <unordered_map>

struct Point {
    int x, y;

    // Required: equality operator
    bool operator==(const Point& other) const {
        return x == other.x && y == other.y;
    }
};

// Define hash function
namespace std {
    template <>
    struct hash<Point> {
        size_t operator()(const Point& p) const {
            return hash<int>()(p.x) ^ (hash<int>()(p.y) << 1);
        }
    };
}

int main() {
    std::unordered_map<Point, std::string> pointLabels;

    // Insert elements
    pointLabels[{1, 2}] = "Point A";
    pointLabels[{3, 4}] = "Point B";

    // Access elements
    Point p{1, 2};
    std::cout << "Label for (1,2): " << pointLabels[p] << std::endl;

    return 0;
}
Output:
Label for (1,2): Point A

This example introduces a Point structure as a key type. The critical components here are:

The equality operator (operator==) defines when two Points should be considered equal. In this case, we consider points equal when both their x and y coordinates match. The hash function, implemented as a specialization within the std namespace, converts each point into a numerical hash value; the value is not guaranteed to be unique, and the container resolves any collisions internally. Our implementation combines the hash values of the x and y coordinates using XOR and a bit shift, which helps distribute the values evenly across the hash table.

When implementing hash functions for your own types, it's important to ensure that equal objects produce equal hash values, and that the hash values are well-distributed to maintain good performance.
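
If you prefer not to open namespace std, an equivalent approach is to supply a hasher type as the third template argument of unordered_map. The sketch below reuses the same Point and hash logic as the example above.

Custom Hasher as a Template Argument
#include <iostream>
#include <string>
#include <unordered_map>

struct Point {
    int x, y;

    bool operator==(const Point& other) const {
        return x == other.x && y == other.y;
    }
};

// Hasher passed explicitly instead of specializing std::hash
struct PointHash {
    std::size_t operator()(const Point& p) const {
        return std::hash<int>()(p.x) ^ (std::hash<int>()(p.y) << 1);
    }
};

int main() {
    std::unordered_map<Point, std::string, PointHash> pointLabels;
    pointLabels[{1, 2}] = "Point A";

    std::cout << "Label for (1,2): " << pointLabels[{1, 2}] << std::endl;

    return 0;
}
Output:
Label for (1,2): Point A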

Best Practices

Success with unordered_map goes beyond just knowing its features. Following established best practices helps you avoid common pitfalls and write more efficient, maintainable code. These practices have emerged from real-world experience and performance analysis.

When working with unordered_map, keep these best practices in mind:

  • Use find() instead of operator[] when you only need to check existence
  • Consider using emplace() or try_emplace() instead of insert() for better performance
  • Reserve space in advance if you know the approximate number of elements
  • Be careful with custom hash functions - they should distribute values uniformly
  • Use structured bindings (C++17) for cleaner iteration syntax

Let's examine these practices in action with a practical example that demonstrates how to implement them effectively:

Best Practices Example
#include <iostream>
#include <unordered_map>
#include <string>

int main() {
    std::unordered_map<std::string, int> cache;

    // Reserve space for expected elements
    cache.reserve(1000);

    // Prefer try_emplace over operator[]
    cache.try_emplace("key1", 100);

    // Use structured bindings for iteration
    for (const auto& [key, value] : cache) {
        std::cout << key << ": " << value << '\n';
    }

    // Safe access pattern
    if (auto it = cache.find("key1"); it != cache.end()) {
        std::cout << "Found value: " << it->second << '\n';
    }

    return 0;
}
Output:
key1: 100
Found value: 100

This example demonstrates several key best practices in action. Notice how we reserve space upfront to avoid unnecessary rehashing operations, use try_emplace for efficient insertion, and implement safe key checking patterns. The structured bindings syntax (introduced in C++17) makes the code more readable and less prone to errors when working with key-value pairs.

Remember that these practices aren't just about writing correct code—they're about writing code that performs well and is maintainable in the long term. The choice between different methods like operator[] versus find() can have significant implications for both performance and code reliability.

Conclusion

Throughout this guide, we've explored how C++'s unordered_map combines powerful functionality with efficient performance through its hash table implementation. From basic key-value storage to advanced custom object mapping, this container provides essential tools for modern C++ development. The best practices we've covered—using try_emplace, preferring find() for lookups, and implementing proper hash functions—form the foundation for writing efficient and maintainable code.

As you implement unordered_map in your own projects, remember that its true power lies in the balance between its ease of use and performance capabilities. Whether building cache systems or managing complex data relationships, the principles covered here will help you make the most of this versatile container.

Further Reading

  • C++ Reference: unordered_map

    The official C++ reference documentation provides comprehensive details about unordered_map's interface, complexity guarantees, and implementation requirements. This is an essential resource for understanding the technical specifications and standard library guarantees.

  • Google Abseil: Container Library Guide

    Google's Abseil library documentation provides deep insights into hash table implementation and best practices derived from Google's extensive experience with high-performance C++ applications. Their Swiss table implementation offers interesting comparisons to standard unordered_map.
