TBB (Threading Building Blocks)

History

Who: Developed by Intel. Now open-source as oneTBB under the oneAPI umbrella.
Why: To provide high-level parallel programming abstractions for C++ that automatically scale to available CPU cores — without manual thread management.
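When: First released in 2006. Open-sourced in 2007 under GPLv2 with the runtime exception, then relicensed under Apache 2.0 in 2016 (TBB 2017 release). Rebranded as oneTBB in 2020.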

Introduction

What is TBB?

A C++ template library for task-based parallelism that abstracts thread management.
Automatically distributes work across available CPU cores using a work-stealing scheduler.
GitHub: oneapi-src/oneTBB

Advantages

High-level API — no manual thread creation or synchronization for most use cases.
Work-stealing scheduler maximizes CPU utilization.
Concurrent containers (queue, hash map, vector) for thread-safe data structures.
Composable — parallel algorithms can be nested.
Scales automatically to available cores.

Disadvantages

Scheduling overhead can outweigh the gain for very small tasks; use TBB when each chunk of work is substantial.
Learning curve for flow graphs and advanced features.
Requires linking against TBB library.

Installation & Setup

apt (Ubuntu)

sudo apt install libtbb-dev

vcpkg

vcpkg install tbb

CMake

find_package(TBB REQUIRED)
target_link_libraries(MyApp TBB::tbb)

Include

#include <tbb/tbb.h>
// or specific headers:
#include <tbb/parallel_for.h>
#include <tbb/parallel_reduce.h>
#include <tbb/concurrent_queue.h>

Core Concepts

parallel_for — Parallel Loop

#include <tbb/parallel_for.h>
#include <tbb/blocked_range.h>
#include <vector>
 
std::vector<double> data(1000000, 1.0);
 
// Parallel loop over range [0, data.size())
tbb::parallel_for(
    tbb::blocked_range<size_t>(0, data.size()),
    [&](const tbb::blocked_range<size_t>& r) {
        for (size_t i = r.begin(); i < r.end(); ++i) {
            data[i] = data[i] * 2.0 + 1.0; // stand-in for a heavy computation
        }
    }
);
 
// Compact index-based overload: the body receives one index at a time
tbb::parallel_for(size_t(0), data.size(), [&](size_t i) {
    data[i] *= 2.0;
});

parallel_reduce — Parallel Reduction

#include <tbb/parallel_reduce.h>
#include <tbb/blocked_range.h>
#include <functional>  // std::plus
#include <iostream>
#include <vector>
 
std::vector<double> v(1000000);
// fill v...
 
// Sum all elements in parallel
double total = tbb::parallel_reduce(
    tbb::blocked_range<size_t>(0, v.size()),
    0.0,  // identity value
    [&](const tbb::blocked_range<size_t>& r, double init) {
        for (size_t i = r.begin(); i < r.end(); ++i)
            init += v[i];
        return init;
    },
    std::plus<double>()  // combine partial results
);
 
std::cout << "Sum: " << total << '\n';

parallel_invoke — Run Tasks Concurrently

#include <tbb/parallel_invoke.h>
 
void taskA() { /* heavy work */ }
void taskB() { /* heavy work */ }
void taskC() { /* heavy work */ }
 
// Run all three concurrently
tbb::parallel_invoke(taskA, taskB, taskC);
// All three complete before continuing

parallel_sort

#include <tbb/parallel_sort.h>
#include <functional>  // std::greater
#include <vector>
 
std::vector<int> v = {5, 3, 1, 4, 2, 8, 7, 6};
 
tbb::parallel_sort(v.begin(), v.end());
// v = {1, 2, 3, 4, 5, 6, 7, 8}
 
// Custom comparator
tbb::parallel_sort(v.begin(), v.end(), std::greater<int>());
// v = {8, 7, 6, 5, 4, 3, 2, 1}

Concurrent Containers

concurrent_queue

#include <tbb/concurrent_queue.h>
#include <iostream>
 
tbb::concurrent_queue<int> queue;
 
// Producer (thread-safe push)
queue.push(42);
queue.push(100);
 
// Consumer (thread-safe pop)
int val;
if (queue.try_pop(val)) {
    std::cout << "Got: " << val;
}
 
// unsafe_size() is only reliable while no other thread is pushing or popping
std::cout << "Size: " << queue.unsafe_size();

concurrent_hash_map

#include <tbb/concurrent_hash_map.h>
#include <iostream>
#include <string>
 
tbb::concurrent_hash_map<std::string, int> map;
 
// Insert
{
    tbb::concurrent_hash_map<std::string, int>::accessor acc;
    map.insert(acc, "key");
    acc->second = 42;
}
 
// Read
{
    tbb::concurrent_hash_map<std::string, int>::const_accessor acc;
    if (map.find(acc, "key")) {
        std::cout << acc->second; // 42
    }
}

concurrent_vector

#include <tbb/concurrent_vector.h>
#include <tbb/parallel_for.h>
#include <iostream>
 
tbb::concurrent_vector<int> cv;
 
// Thread-safe push_back; the resulting element order depends on scheduling
tbb::parallel_for(0, 1000, [&](int i) {
    cv.push_back(i);
});
 
std::cout << "Size: " << cv.size(); // 1000

Task Groups

#include <tbb/task_group.h>
#include <iostream>
 
tbb::task_group tg;
 
tg.run([] { std::cout << "Task 1\n"; });
tg.run([] { std::cout << "Task 2\n"; });
tg.run([] { std::cout << "Task 3\n"; });
 
tg.wait(); // blocks until all three tasks finish; their output order is unspecified
