String Splitting in Modern C++


Splitting strings is a common operation in many programming tasks. In this guide, we’ll explore various approaches to string splitting in Modern C++, from traditional methods to modern techniques introduced in C++17 and beyond.

📚 String Operations Quick Reference
  • string_view: A lightweight, non-owning reference to a string. Provides read-only access without copying, ideal for string operations like splitting.
  • stringstream: A stream class that operates on strings. Allows parsing strings with formatted input/output operations and common delimiter-based splitting.
  • substr(): A string method that creates a new string containing a portion of the original string, specified by position and length.
  • find(): A method that searches for a substring or character within a string, returning the position of the first occurrence or npos if not found.
  • getline(): A function that extracts characters from an input stream until a delimiter is found, commonly used for splitting strings by delimiter.
  • npos: A special constant (std::string::npos) equal to the largest value of the unsigned size_type, often written as -1 converted to that type. It means “no position” or “not found” and is returned by search operations that fail.
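As a small illustration of how find(), npos, and substr() fit together, here is a minimal sketch. The helper name first_field is hypothetical and exists only for this example.

```cpp
#include <string>

// Hypothetical helper: return the text before the first delimiter,
// or the whole string when the delimiter does not occur.
std::string first_field(const std::string& str, char delimiter) {
    const std::string::size_type pos = str.find(delimiter);
    if (pos == std::string::npos) {
        return str;                // find() reported "not found"
    }
    return str.substr(0, pos);     // Characters in [0, pos)
}
```

Calling first_field("key=value", '=') yields "key", while a string without the delimiter is returned unchanged.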

Basic String Splitting

Let’s start with traditional approaches to string splitting in C++. These methods work across all C++ versions and are still widely used. They rely on common utilities like std::stringstream and are easy to implement.

When to Use Traditional Approaches

  • Compatible with all versions of C++.
  • Good for small datasets or when performance is not a critical factor.
  • Useful for educational purposes and understanding basic concepts.

The following example demonstrates string splitting using std::stringstream, which tokenizes a string based on a delimiter:

Traditional String Splitting with std::stringstream
#include <iostream>
#include <string>
#include <vector>
#include <sstream>

// Function to split a string into tokens
std::vector<std::string> split_string(const std::string& str, char delimiter) {
    std::vector<std::string> tokens;
    std::stringstream ss(str);
    std::string token;

    // Tokenize the string based on the delimiter
    while (std::getline(ss, token, delimiter)) {
        if (!token.empty()) {  // Skip empty tokens
            tokens.push_back(token);
        }
    }

    return tokens;
}

int main() {
    // Example string
    std::string text = "apple,banana,cherry,date";

    // Split the string into tokens
    auto fruits = split_string(text, ',');

    // Print the results
    std::cout << "Fruits:\n";
    for (const auto& fruit : fruits) {
        std::cout << "- " << fruit << '\n';
    }

    return 0;
}
  • The split_string function uses std::stringstream to tokenize a string based on the specified delimiter.
  • It iterates through the input string, extracting substrings separated by the delimiter and appending them to a vector.
  • Empty tokens (e.g., between consecutive delimiters) are skipped to ensure the output only contains meaningful data.
Fruits:
- apple
- banana
- cherry
- date
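If empty fields carry meaning (for example, blank columns in delimiter-separated data), the empty-token check can simply be dropped. The sketch below is a hypothetical variant of split_string, named split_keep_empty for illustration only. Note one quirk of std::getline: an empty field after a trailing delimiter is still not reported, because the stream is already exhausted at that point.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical variant of split_string that keeps empty tokens.
std::vector<std::string> split_keep_empty(const std::string& str, char delimiter) {
    std::vector<std::string> tokens;
    std::stringstream ss(str);
    std::string token;

    // Every successful getline call yields one field, empty or not
    while (std::getline(ss, token, delimiter)) {
        tokens.push_back(token);
    }

    return tokens;
}
```

With this variant, "a,,b" produces three tokens (the middle one empty), while "a,b," still produces only two.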

Below is an alternative implementation using std::string::find and manual iteration for improved control:

String Splitting with Manual Iteration
#include <iostream>
#include <string>
#include <vector>

// Function to split a string using manual iteration
std::vector<std::string> split_string_manual(const std::string& str, char delimiter) {
    std::vector<std::string> tokens;
    size_t start = 0;
    size_t end;

    // Iterate through the string and find delimiters
    while ((end = str.find(delimiter, start)) != std::string::npos) {
        if (start != end) {  // Avoid empty tokens
            tokens.push_back(str.substr(start, end - start));
        }
        start = end + 1;
    }
    if (start < str.size()) {
        tokens.push_back(str.substr(start));  // Add the last token
    }

    return tokens;
}

int main() {
    // Example string
    std::string text = "dog,cat,bird,fish";

    // Split the string into tokens
    auto animals = split_string_manual(text, ',');

    // Print the results
    std::cout << "Animals:\n";
    for (const auto& animal : animals) {
        std::cout << "- " << animal << '\n';
    }

    return 0;
}
  • The split_string_manual function uses the std::string member functions find to locate delimiters and substr to extract tokens.
  • It provides more control over the tokenization process, such as deciding how to treat trailing delimiters or empty input strings.
  • This approach avoids constructing a std::stringstream, which usually makes it a lighter-weight choice for simple tokenization tasks.
Animals:
- dog
- cat
- bird
- fish

Advantages of Manual Iteration

  • Provides more control over token processing.
  • Can handle edge cases like trailing delimiters effectively.
  • May offer better performance for simple tokenization tasks.
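One example of that extra control is splitting on a delimiter longer than one character, which std::getline cannot do directly. The following is a minimal sketch under stated assumptions: the name split_on is hypothetical, empty tokens are kept, and an empty delimiter returns the input as a single token.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split on a multi-character delimiter, keeping empty tokens.
std::vector<std::string> split_on(const std::string& str, const std::string& delimiter) {
    std::vector<std::string> tokens;
    if (delimiter.empty()) {   // An empty delimiter would never advance the search
        tokens.push_back(str);
        return tokens;
    }

    std::size_t start = 0;
    std::size_t end;
    while ((end = str.find(delimiter, start)) != std::string::npos) {
        tokens.push_back(str.substr(start, end - start));
        start = end + delimiter.size();   // Skip past the whole delimiter
    }
    tokens.push_back(str.substr(start));  // Remainder after the last delimiter

    return tokens;
}
```

For instance, split_on("key => value", " => ") yields "key" and "value", and a trailing delimiter produces a final empty token.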

Modern Approaches (C++17)

C++17 introduced std::string_view, a lightweight, non-owning reference to a sequence of characters. This feature allows efficient string manipulation without additional memory allocations, making it ideal for string splitting and other operations.

When to Use string_view

  • When you need efficient, zero-copy operations.
  • When the original string remains in scope and won't be modified.
  • For performance-critical applications where memory allocation needs to be minimized.

Here's a modern implementation of string splitting using std::string_view:

Modern String Splitting with std::string_view

#include <iostream>        // For input and output
#include <string>          // For std::string used in main
#include <string_view>     // For std::string_view
#include <vector>          // For std::vector to store the results

// Function to split a string_view into tokens based on a delimiter
std::vector<std::string_view> split_view(std::string_view str, char delimiter) {
    std::vector<std::string_view> tokens;  // Vector to hold the split tokens
    size_t start = 0;                      // Starting index of the current token

    // Iterate through the string to find delimiters
    while (start < str.size()) {
        const auto end = str.find(delimiter, start); // Find the next delimiter
        if (start != end) {
            // Add the substring (view) from start to the delimiter
            tokens.emplace_back(str.substr(start, end - start));
        }
        if (end == std::string_view::npos) break;   // Exit loop if no more delimiters are found
        start = end + 1;                            // Move the start to the character after the delimiter
    }

    return tokens;  // Return the vector of tokens
}

int main() {
    std::string text = "red:green:blue:yellow"; // Input string to be split
    auto colors = split_view(text, ':');       // Split the string using ':' as the delimiter

    std::cout << "Colors:\n";                  // Print the header
    for (const auto& color : colors) {
        std::cout << "- " << color << '\n';    // Print each token
    }

    return 0; // Indicate successful execution
}
    
  • The split_view function uses std::string_view to tokenize a string without creating new string objects, so the only allocations are those of the vector that holds the views.
  • It iterates through the string using find to locate delimiters and creates a view of each token using substr.
  • This approach is highly efficient for read-only string operations.
Colors:
- red
- green
- blue
- yellow

To further optimize this approach, consider reusing pre-allocated buffers for tokens:

Optimized String Splitting with Pre-Allocated Buffers
#include <iostream>         // For input and output
#include <string>           // For std::string used in main
#include <string_view>      // For std::string_view
#include <vector>           // For std::vector to store the results
#include <algorithm>        // For std::count to estimate the number of tokens

// Optimized function for splitting strings into pre-allocated buffers
void split_view_into(std::string_view str, char delimiter, std::vector<std::string_view>& output) {
    output.clear(); // Clear the output buffer to reuse it
    size_t start = 0; // Starting index of the current token

    // Reserve approximate space to minimize reallocations
    output.reserve(std::count(str.begin(), str.end(), delimiter) + 1);

    // Iterate through the string to find delimiters
    while (start < str.size()) {
        const auto end = str.find(delimiter, start); // Find the next delimiter
        if (start != end) {
            // Add the substring (view) from start to the delimiter
            output.emplace_back(str.substr(start, end - start));
        }
        if (end == std::string_view::npos) break; // Exit loop if no more delimiters are found
        start = end + 1; // Move the start to the character after the delimiter
    }
}

int main() {
    std::string text = "apple:banana:cherry:date"; // Input string to be split
    std::vector<std::string_view> fruits;    // Reusable output buffer (capacity is reserved inside split_view_into)

    // Split string into pre-allocated buffer
    split_view_into(text, ':', fruits);

    std::cout << "Fruits:\n"; // Print the header
    for (const auto& fruit : fruits) {
        std::cout << "- " << fruit << '\n'; // Print each token
    }

    return 0; // Indicate successful execution
}
    
  • The split_view_into function takes a pre-allocated buffer as an argument, clearing and reusing it for each split operation.
  • Reserving capacity from the delimiter count (at the cost of one extra pass over the input) means the vector allocates at most once per call, and not at all once its capacity is already large enough.
  • Ideal for scenarios involving frequent or large-scale string-splitting tasks.
Fruits:
- apple
- banana
- cherry
- date
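The benefit of a caller-supplied buffer shows up when the same vector is reused across many inputs. The sketch below repeats the core loop of split_view_into (without the reserve step) so that it compiles on its own, and adds a hypothetical helper, count_tokens, that processes several lines with one buffer.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Same splitting loop as split_view_into above, repeated so this sketch is self-contained.
void split_view_into(std::string_view str, char delimiter, std::vector<std::string_view>& output) {
    output.clear();  // clear() removes the elements but keeps the capacity
    std::size_t start = 0;
    while (start < str.size()) {
        const auto end = str.find(delimiter, start);
        if (start != end) {
            output.emplace_back(str.substr(start, end - start));
        }
        if (end == std::string_view::npos) break;
        start = end + 1;
    }
}

// Hypothetical helper: count tokens across many lines using a single buffer.
std::size_t count_tokens(const std::vector<std::string>& lines, char delimiter) {
    std::vector<std::string_view> buffer;  // Created once, reused for every line
    std::size_t total = 0;
    for (const auto& line : lines) {
        split_view_into(line, delimiter, buffer);
        total += buffer.size();
    }
    return total;
}
```

After the first few lines, the buffer has usually grown to fit the widest line seen so far, so later iterations tend to run without touching the allocator.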

Benefits of Using Pre-Allocated Buffers

  • Zero Copy: Using std::string_view avoids creating new strings for substrings, reducing memory overhead.
  • Pre-reserved Capacity: Pre-reserving the vector's capacity minimizes the cost of dynamic resizing.
  • Reduced Fragmentation: clear() keeps the vector's capacity, so subsequent calls reuse the same storage and allocate less often, which limits fragmentation.

Important Considerations

  • Ensure the original string remains in scope as std::string_view does not own the data.
  • Handle edge cases like trailing delimiters and empty strings.
  • Be cautious of modifying the original string while using std::string_view.
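To make the edge cases concrete, here is a sketch of a view-based splitter that keeps every field, including empty ones produced by leading, consecutive, or trailing delimiters. The name split_view_keep_empty is hypothetical, and treating an empty input as a single empty token is a design choice of this example, not a rule.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical variant of split_view that keeps empty tokens.
std::vector<std::string_view> split_view_keep_empty(std::string_view str, char delimiter) {
    std::vector<std::string_view> tokens;
    std::size_t start = 0;

    while (true) {
        const auto end = str.find(delimiter, start);
        if (end == std::string_view::npos) {
            tokens.emplace_back(str.substr(start));  // Last field, possibly empty
            break;
        }
        tokens.emplace_back(str.substr(start, end - start));
        start = end + 1;  // At most str.size(), which substr and find accept
    }

    return tokens;
}
```

With this variant, "a:b:" yields three tokens (the last one empty) and an empty input yields one empty token. As with split_view, the returned views are only valid while the source string is alive and unmodified.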

Key Performance Tips

  • Reuse pre-allocated buffers to minimize memory allocations.
  • Use std::string_view for zero-copy string operations to avoid unnecessary allocations.
  • Pre-reserve the capacity of the vector to reduce dynamic resizing.
  • Consider custom memory allocators for applications with stringent memory requirements.
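As one possible reading of the allocator tip, C++17's <memory_resource> header lets a vector draw its storage from a caller-provided memory resource. The sketch below is an assumption-laden illustration and not a recommendation for every project: split_view_pmr is a hypothetical name, and <memory_resource> support depends on your standard library version.

```cpp
#include <cstddef>
#include <memory_resource>
#include <string_view>
#include <vector>

// Hypothetical helper: same loop as split_view, but the token vector
// obtains its storage from the memory resource passed by the caller.
std::pmr::vector<std::string_view> split_view_pmr(std::string_view str, char delimiter,
                                                  std::pmr::memory_resource* resource) {
    std::pmr::vector<std::string_view> tokens(resource);
    std::size_t start = 0;

    while (start < str.size()) {
        const auto end = str.find(delimiter, start);
        if (start != end) {
            tokens.emplace_back(str.substr(start, end - start));
        }
        if (end == std::string_view::npos) break;
        start = end + 1;
    }

    return tokens;
}
```

A caller could, for example, construct a std::pmr::monotonic_buffer_resource over a stack array and pass its address as the third argument, so that small splits need no heap allocation. Whether this pays off depends on the workload and is worth measuring.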

Best Practices

Guidelines for String Splitting

  • Use std::string_view for modern C++ projects: Whenever possible, prefer std::string_view for efficient, non-owning string manipulation. This reduces memory allocations and improves performance.
  • Consider the lifetime of split strings: Ensure that the source string remains in scope and unmodified while using std::string_view, as it does not own the string data.
  • Handle empty tokens appropriately: Decide whether to include or exclude empty tokens resulting from consecutive delimiters. This can depend on the specific requirements of your application.
  • Choose between copying and views based on needs: For short-lived operations, std::string_view is ideal. For long-term storage, copying tokens to a new std::string may be safer.
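When tokens must outlive the source string, the views can be copied into owning strings once splitting is done. A minimal sketch, with the hypothetical helper name to_owned:

```cpp
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: copy each view into an owning std::string so the
// result stays valid after the source string is destroyed or modified.
std::vector<std::string> to_owned(const std::vector<std::string_view>& views) {
    std::vector<std::string> owned;
    owned.reserve(views.size());
    for (std::string_view view : views) {
        owned.emplace_back(view);  // Constructs a std::string from the view
    }
    return owned;
}
```

This keeps the cheap view-based split for parsing and pays for copies only for the tokens that actually need to be stored.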

Conclusion

Modern C++ provides versatile tools for string splitting, from traditional std::stringstream methods to the efficient std::string_view. Choose the approach that suits your needs based on performance, string lifetime management, and compatibility.

Congratulations on reading to the end of this tutorial! For further exploration of C++ programming concepts and advanced techniques, check out the resources below.

Have fun and happy coding!

Further Reading

  • Online C++ Compiler

    Practice the code examples from this guide in a free, interactive environment. This compiler allows you to experiment with string splitting techniques and other C++ features without installing any software locally. Try modifying our examples to see how different approaches affect performance and readability.

  • C++ String Library Reference

    Comprehensive documentation about the C++ string library, including detailed information about string manipulation functions, iterators, and modern string features like string_view.

  • C++ String View Reference

    Deep dive into string_view, the modern C++ feature for efficient string operations. Learn about its API, performance characteristics, and best practices for usage.

  • Abseil's String View Guide

    Google's detailed guide on string_view usage, including real-world examples and performance considerations from their extensive experience with large-scale C++ codebases.



Suf is a senior advisor in data science with deep expertise in Natural Language Processing, Complex Networks, and Anomaly Detection. Formerly a postdoctoral research fellow, he applied advanced physics techniques to tackle real-world, data-heavy industry challenges. Before that, he was a particle physicist at the ATLAS Experiment of the Large Hadron Collider. Now, he’s focused on bringing more fun and curiosity to the world of science and research online.
