Who: Developed by Intel. Now open-source as oneTBB under the oneAPI umbrella.
Why: To provide high-level parallel programming abstractions for C++ that automatically scale to available CPU cores — without manual thread management.
When: First released in 2006. Open-sourced in 2007 (GPLv2 with runtime exception) and relicensed under Apache 2.0 in 2016. Rebranded as oneTBB in 2020.
Introduction
What is TBB?
TBB (Threading Building Blocks) is a C++ template library for task-based parallelism that abstracts away thread management.
Its work-stealing scheduler automatically distributes tasks across the available CPU cores.
    #include <tbb/tbb.h>
    // or specific headers:
    #include <tbb/parallel_for.h>
    #include <tbb/parallel_reduce.h>
    #include <tbb/concurrent_queue.h>
Core Concepts
parallel_for — Parallel Loop
    #include <tbb/parallel_for.h>
    #include <tbb/blocked_range.h>
    #include <cstddef>
    #include <vector>

    int main() {
        std::vector<double> data(1000000, 1.0);

        // Range-based form: each task receives a sub-range of [0, data.size())
        tbb::parallel_for(
            tbb::blocked_range<size_t>(0, data.size()),
            [&](const tbb::blocked_range<size_t>& r) {
                for (size_t i = r.begin(); i < r.end(); ++i) {
                    data[i] = data[i] * 2.0 + 1.0;  // stand-in for heavier per-element work
                }
            });

        // Compact index-based overload: the body is called once per index
        tbb::parallel_for(size_t(0), data.size(), [&](size_t i) {
            data[i] *= 2.0;
        });
    }
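How a blocked_range is divided into chunks can be modeled without TBB. The sketch below is plain C++ and is not TBB's implementation. It assumes the documented rule for blocked_range under simple_partitioner: a range is divisible while its size exceeds the grainsize, and it is split at the midpoint. Range, split_all, and chunks_for are hypothetical names introduced for illustration. The default auto_partitioner usually stops splitting earlier, so real chunks may be larger.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for tbb::blocked_range<size_t>: half-open [begin, end).
struct Range {
    std::size_t begin;
    std::size_t end;
    std::size_t size() const { return end - begin; }
};

// Recursively halve `r` while it is larger than `grainsize`,
// appending the leaf chunks to `out` in left-to-right order.
void split_all(Range r, std::size_t grainsize, std::vector<Range>& out) {
    assert(grainsize > 0);            // assumed precondition: grainsize must be positive
    if (r.size() <= grainsize) {      // not divisible: this chunk would go to the loop body
        out.push_back(r);
        return;
    }
    const std::size_t mid = r.begin + r.size() / 2;
    split_all(Range{r.begin, mid}, grainsize, out);
    split_all(Range{mid, r.end}, grainsize, out);
}

// Convenience wrapper returning the leaf chunks for [0, n).
std::vector<Range> chunks_for(std::size_t n, std::size_t grainsize) {
    std::vector<Range> out;
    split_all(Range{0, n}, grainsize, out);
    return out;
}
```

Under these assumptions, chunks_for(1000, 100) yields 16 chunks of 62 or 63 elements each: a 125-element range still exceeds the grainsize, so it is split once more. A larger grainsize therefore means fewer, bigger tasks and less scheduling overhead, at the cost of coarser load balancing.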
parallel_reduce — Parallel Reduction
    #include <tbb/parallel_reduce.h>
    #include <tbb/blocked_range.h>
    #include <cstddef>
    #include <functional>
    #include <iostream>
    #include <vector>

    int main() {
        std::vector<double> v(1000000);
        // fill v...

        // Sum all elements in parallel
        double total = tbb::parallel_reduce(
            tbb::blocked_range<size_t>(0, v.size()),
            0.0,  // identity value
            [&](const tbb::blocked_range<size_t>& r, double init) {
                for (size_t i = r.begin(); i < r.end(); ++i)
                    init += v[i];
                return init;
            },
            std::plus<double>()  // combine partial results
        );
        std::cout << "Sum: " << total << '\n';
    }
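The role of the identity value is easiest to see in a sequential model of the reduction. The sketch below is plain C++, not TBB code: chunked_reduce is a hypothetical helper that folds each fixed-size chunk starting from the identity and then merges the partial results with the combine function, roughly as parallel_reduce does for subranges. In TBB itself the chunk boundaries are chosen by the scheduler, and the body may receive an already accumulated value as init, which is why the body should fold into init instead of starting from zero.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Sequential model of a chunked reduction (hypothetical helper, not a TBB API).
template <typename T, typename Combine>
T chunked_reduce(const std::vector<T>& v, std::size_t chunk, T identity, Combine combine) {
    assert(chunk > 0);
    T total = identity;
    for (std::size_t begin = 0; begin < v.size(); begin += chunk) {
        const std::size_t end = std::min(begin + chunk, v.size());
        T partial = identity;                  // every chunk starts from the identity
        for (std::size_t i = begin; i < end; ++i)
            partial = combine(partial, v[i]);  // fold the chunk's elements
        total = combine(total, partial);       // join step: merge the partial result
    }
    return total;
}
```

With ten elements equal to 1.0 and a chunk size of 3, an identity of 0.0 gives 10, while an incorrect identity of 1.0 gives 15, because the extra 1 is added once per chunk and once for the running total. The result then depends on how the range happens to be split, which is exactly what a true identity value prevents.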
parallel_invoke — Run Tasks Concurrently
    #include <tbb/parallel_invoke.h>

    void taskA() { /* heavy work */ }
    void taskB() { /* heavy work */ }
    void taskC() { /* heavy work */ }

    int main() {
        // Run all three concurrently
        tbb::parallel_invoke(taskA, taskB, taskC);
        // All three complete before continuing
    }