DirectX uses COM (Component Object Model) interfaces. Every D3D12 object is exposed through a reference-counted COM interface (ID3D12Something). Prefer Microsoft::WRL::ComPtr<T> over raw pointers: it calls Release() automatically when it goes out of scope, much like shared_ptr, except that the reference count lives inside the COM object itself.
    #include <d3d12.h>
    #include <dxgi1_6.h>
    #include <d3dcompiler.h>
    #include <wrl/client.h>   // ComPtr<>
    #include <DirectXMath.h>  // XMMATRIX, XMFLOAT3, etc.

    using namespace Microsoft::WRL;
    using namespace DirectX;

    // Link libraries (in Visual Studio project settings or CMake):
    // d3d12.lib dxgi.lib d3dcompiler.lib dxguid.lib

    // ComPtr usage: auto-releases the COM object when destroyed
    ComPtr<ID3D12Device> device;
    // device.Get()                    → raw pointer (for API calls)
    // device.GetAddressOf()           → address of the stored pointer, without releasing it
    // &device                         → releases any held object first (for creation functions)
    // device.Reset()                  → explicit release
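To make the ownership rules concrete, here is a minimal portable sketch of what a ComPtr-style smart pointer does. It is not the WRL implementation: `FakeUnknown`, `MiniComPtr`, `CreateFake`, and `g_liveObjects` are hypothetical names invented for illustration, and the sketch deliberately avoids the Windows headers so it can be compiled anywhere.

```cpp
#include <cassert>
#include <utility>

int g_liveObjects = 0;  // Demonstration-only counter of live FakeUnknown objects

// Hypothetical stand-in for a COM object: the reference count lives inside the object.
struct FakeUnknown {
    FakeUnknown() { ++g_liveObjects; }
    unsigned long AddRef() { return ++refs_; }
    unsigned long Release() {
        const unsigned long remaining = --refs_;
        if (remaining == 0) { --g_liveObjects; delete this; }
        return remaining;
    }
private:
    ~FakeUnknown() = default;   // Only Release() may destroy the object
    unsigned long refs_ = 1;    // Objects are created holding one reference
};

// Simplified model of a ComPtr-style owner (assumption: T has AddRef/Release).
template <typename T>
class MiniComPtr {
public:
    MiniComPtr() = default;
    MiniComPtr(const MiniComPtr& other) : ptr_(other.ptr_) { if (ptr_) ptr_->AddRef(); }
    MiniComPtr(MiniComPtr&& other) noexcept : ptr_(other.ptr_) { other.ptr_ = nullptr; }
    MiniComPtr& operator=(MiniComPtr other) noexcept { std::swap(ptr_, other.ptr_); return *this; }
    ~MiniComPtr() { Reset(); }

    T* Get() const { return ptr_; }                       // Borrow the raw pointer
    T** GetAddressOf() { return &ptr_; }                  // Does NOT release what is held
    T** ReleaseAndGetAddressOf() { Reset(); return &ptr_; }
    void Reset() { if (ptr_) { ptr_->Release(); ptr_ = nullptr; } }
private:
    T* ptr_ = nullptr;
};

// Mimics a creation function that hands back an object through an out-parameter.
inline void CreateFake(FakeUnknown** out) { *out = new FakeUnknown(); }
```

Copying a `MiniComPtr` adds a reference and destroying one releases it, so the object dies exactly when its last owner goes away. Passing `ReleaseAndGetAddressOf()` to a creation function avoids leaking whatever the pointer held before.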
Enabling the Debug Layer
The D3D12 Debug Layer validates every API call and catches mistakes. Always enable it in debug builds.
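A minimal sketch of enabling the debug layer might look like the following. `TryEnableDebugLayer` is a hypothetical helper name; the D3D12 calls inside it (`D3D12GetDebugInterface`, `ID3D12Debug::EnableDebugLayer`) are the real API. The platform guards are an assumption added so that the snippet still compiles outside Windows, where it simply reports that nothing was enabled.

```cpp
#if defined(_WIN32)
#include <d3d12.h>
#include <wrl/client.h>
#if defined(_MSC_VER)
#pragma comment(lib, "d3d12.lib")
#endif
#endif

// Returns true if the debug layer was enabled.
// Call this BEFORE D3D12CreateDevice: the layer only applies to devices created afterwards.
bool TryEnableDebugLayer() {
#if defined(_WIN32) && defined(_DEBUG)
    Microsoft::WRL::ComPtr<ID3D12Debug> debugController;
    // Fails if the optional "Graphics Tools" Windows feature is not installed.
    if (SUCCEEDED(D3D12GetDebugInterface(IID_PPV_ARGS(&debugController)))) {
        debugController->EnableDebugLayer();
        return true;
    }
#endif
    return false;  // Release build, non-Windows platform, or debug layer unavailable
}
```

Treating failure as non-fatal is a design choice here: the application still runs on machines without the Graphics Tools feature, just without validation.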
DXGI (DirectX Graphics Infrastructure) is the layer between Direct3D and the display driver. It handles adapter enumeration, swap chain creation, and display management. DXGI is separate from D3D12: the same interfaces serve D3D10, D3D11, and D3D12.
    // Create DXGI Factory (required for everything DXGI)
    UINT dxgiFactoryFlags = 0;
    #if defined(_DEBUG)
    dxgiFactoryFlags |= DXGI_CREATE_FACTORY_DEBUG;
    #endif

    ComPtr<IDXGIFactory6> factory;
    CreateDXGIFactory2(dxgiFactoryFlags, IID_PPV_ARGS(&factory));

    // Enumerate adapters by performance preference (discrete GPU first)
    ComPtr<IDXGIAdapter4> adapter;
    for (UINT adapterIndex = 0;
         SUCCEEDED(factory->EnumAdapterByGpuPreference(
             adapterIndex, DXGI_GPU_PREFERENCE_HIGH_PERFORMANCE, IID_PPV_ARGS(&adapter)));
         ++adapterIndex)
    {
        DXGI_ADAPTER_DESC3 desc;
        adapter->GetDesc3(&desc);

        // Skip the software rasterizer (WARP)
        if (desc.Flags & DXGI_ADAPTER_FLAG3_SOFTWARE) continue;

        // Check if it supports D3D12 (a null output pointer only tests for support)
        if (SUCCEEDED(D3D12CreateDevice(adapter.Get(), D3D_FEATURE_LEVEL_12_0,
                                        __uuidof(ID3D12Device), nullptr)))
        {
            break;  // Found our GPU
        }
    }
    // If the loop ran out of adapters, `adapter` is null here: no hardware D3D12 GPU was found
3 — Creating the D3D12 Device
Feature Levels
| Feature Level | GPU Requirement | Features |
| --- | --- | --- |
| D3D_FEATURE_LEVEL_11_0 | Very old GPUs | SM 5.0, basic compute |
| D3D_FEATURE_LEVEL_12_0 | Modern discrete GPUs | Tier 2 resource binding, Tier 2 tiled resources, typed UAV loads; DXR optional |
| D3D_FEATURE_LEVEL_12_1 | NVIDIA Maxwell (2nd gen)+ / AMD Vega+ | Conservative rasterization, rasterizer-ordered views |
| D3D_FEATURE_LEVEL_12_2 | NVIDIA Turing+ / AMD RDNA2+ | DXR Tier 1.1, Mesh Shaders, VRS |
Creating the Device
    ComPtr<ID3D12Device8> device;
    HRESULT hr = D3D12CreateDevice(
        adapter.Get(),            // Specific adapter to use
        D3D_FEATURE_LEVEL_12_0,   // Minimum feature level
        IID_PPV_ARGS(&device));
    if (FAILED(hr))
        throw std::runtime_error("Failed to create D3D12 device!");  // Needs <stdexcept>

    // Configure debug breaks (in debug mode)
    #if defined(_DEBUG)
    ComPtr<ID3D12InfoQueue> infoQueue;
    // The info queue only exists when the debug layer is enabled, so check the result
    if (SUCCEEDED(device.As(&infoQueue))) {
        infoQueue->SetBreakOnSeverity(D3D12_MESSAGE_SEVERITY_CORRUPTION, TRUE);
        infoQueue->SetBreakOnSeverity(D3D12_MESSAGE_SEVERITY_ERROR, TRUE);
    }
    #endif

    // ---- Check what optional features are available ----
    D3D12_FEATURE_DATA_D3D12_OPTIONS5 options5{};
    device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS5, &options5, sizeof(options5));
    bool hasRayTracing = (options5.RaytracingTier >= D3D12_RAYTRACING_TIER_1_0);

    // In: the highest shader model the app knows about. Out: the highest the device supports.
    D3D12_FEATURE_DATA_SHADER_MODEL shaderModel{ D3D_SHADER_MODEL_6_6 };
    device->CheckFeatureSupport(D3D12_FEATURE_SHADER_MODEL, &shaderModel, sizeof(shaderModel));
4 — Command Queue, Allocator, and List
The Three-Part Command System
graph TD
CA["ID3D12CommandAllocator\nAllocates raw memory for command storage\nOne per frame-in-flight per thread"]
CL["ID3D12GraphicsCommandList\nYou record commands here (draw, barrier, copy)\nReused every frame (reset before recording)"]
CQ["ID3D12CommandQueue\nYou submit closed command lists here\nGPU executes from here asynchronously"]
CA -->|"commandList->Reset(allocator)"| CL
CL -->|"commandList->Close()"| CQ
CQ -->|"commandQueue->ExecuteCommandLists()"| GPU["GPU: executes async"]
Creating Each Component
    // ---- Create the Command QUEUE ----
    D3D12_COMMAND_QUEUE_DESC queueDesc{};
    queueDesc.Type     = D3D12_COMMAND_LIST_TYPE_DIRECT;  // Graphics + Compute + Copy
    queueDesc.Priority = D3D12_COMMAND_QUEUE_PRIORITY_NORMAL;
    queueDesc.Flags    = D3D12_COMMAND_QUEUE_FLAG_NONE;
    queueDesc.NodeMask = 0;  // Single-GPU: always 0

    ComPtr<ID3D12CommandQueue> commandQueue;
    device->CreateCommandQueue(&queueDesc, IID_PPV_ARGS(&commandQueue));

    // ---- Create Command ALLOCATORS (one per frame-in-flight) ----
    const UINT numFrames = 2;
    ComPtr<ID3D12CommandAllocator> commandAllocators[numFrames];
    for (UINT i = 0; i < numFrames; i++) {
        device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_DIRECT,
                                       IID_PPV_ARGS(&commandAllocators[i]));
    }

    // ---- Create the Command LIST ----
    ComPtr<ID3D12GraphicsCommandList6> commandList;
    device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_DIRECT,
                              commandAllocators[0].Get(),
                              nullptr,  // No initial PSO
                              IID_PPV_ARGS(&commandList));
    // Lists are created in the recording state; close it so the frame loop can Reset() it
    commandList->Close();
Resource barriers are D3D12’s way of telling the GPU: “The resource’s usage is changing.”
Without a barrier, the GPU doesn’t know to flush its caches or wait for dependent passes to finish.
| Resource State | How it’s Used |
| --- | --- |
| D3D12_RESOURCE_STATE_PRESENT | Back buffer ready to be presented on screen |
| D3D12_RESOURCE_STATE_RENDER_TARGET | Being drawn to (color output) |
| D3D12_RESOURCE_STATE_DEPTH_WRITE | Depth buffer is being written |
| D3D12_RESOURCE_STATE_DEPTH_READ | Depth buffer read-only (e.g., depth testing with depth writes disabled) |
| D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE | Being sampled in pixel shader |
| D3D12_RESOURCE_STATE_NON_PIXEL_SHADER_RESOURCE | Read in vertex / compute shader |
| D3D12_RESOURCE_STATE_UNORDERED_ACCESS | Read + write in compute shader (UAV) |
| D3D12_RESOURCE_STATE_COPY_SOURCE | Source for a GPU copy |
| D3D12_RESOURCE_STATE_COPY_DEST | Destination for a GPU copy |
| D3D12_RESOURCE_STATE_GENERIC_READ | Combination of the read states; required initial state for upload-heap resources |
    // Helper: create a transition barrier
    D3D12_RESOURCE_BARRIER TransitionBarrier(ID3D12Resource* resource,
                                             D3D12_RESOURCE_STATES before,
                                             D3D12_RESOURCE_STATES after) {
        D3D12_RESOURCE_BARRIER barrier{};
        barrier.Type  = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
        barrier.Flags = D3D12_RESOURCE_BARRIER_FLAG_NONE;
        barrier.Transition.pResource   = resource;
        barrier.Transition.StateBefore = before;
        barrier.Transition.StateAfter  = after;
        barrier.Transition.Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES;
        return barrier;
    }

    // Example: transition back buffer from Present to Render Target at start of frame
    auto barrier = TransitionBarrier(renderTargets[frameIndex].Get(),
                                     D3D12_RESOURCE_STATE_PRESENT,
                                     D3D12_RESOURCE_STATE_RENDER_TARGET);
    commandList->ResourceBarrier(1, &barrier);

    // ... drawing ...

    // Transition back from Render Target to Present at end of frame
    auto barrierBack = TransitionBarrier(renderTargets[frameIndex].Get(),
                                         D3D12_RESOURCE_STATE_RENDER_TARGET,
                                         D3D12_RESOURCE_STATE_PRESENT);
    commandList->ResourceBarrier(1, &barrierBack);
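Because every transition must name the correct "before" state, many engines track the current state of each resource on the CPU and only emit a barrier when the state actually changes. The following is a portable sketch of that idea, not a D3D12 API: `ResState`, `ResourceId`, `Transition`, and `StateTracker` are hypothetical stand-ins for D3D12_RESOURCE_STATES, ID3D12Resource*, and D3D12_RESOURCE_BARRIER.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-ins for a few D3D12_RESOURCE_STATES values and a resource handle
enum class ResState : std::uint32_t { Present, RenderTarget, PixelShaderResource, CopyDest };
using ResourceId = int;

// What a real implementation would turn into a D3D12_RESOURCE_BARRIER
struct Transition {
    ResourceId resource;
    ResState   before;
    ResState   after;
};

class StateTracker {
public:
    // Record the state the resource was created in
    void Register(ResourceId id, ResState initial) { current_[id] = initial; }

    // Returns true and fills `out` if a barrier is needed; returns false if the
    // resource is already in the requested state (a redundant barrier is skipped).
    bool Require(ResourceId id, ResState wanted, Transition* out) {
        ResState& state = current_.at(id);  // Throws if the resource was never registered
        if (state == wanted) return false;
        *out = Transition{id, state, wanted};
        state = wanted;
        return true;
    }

private:
    std::unordered_map<ResourceId, ResState> current_;
};
```

In a real renderer, a `true` result would feed the before/after pair into a helper like TransitionBarrier() and then into ResourceBarrier(); this sketch also ignores per-subresource state for brevity.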
8 — Root Signatures
What Is a Root Signature?
The Root Signature is the contract between your C++ code and your HLSL shaders. It defines exactly what types of data are available to the shaders and how they access it. Think of it as the function signature of your shader’s “API”.
Every SetGraphicsRoot*(...) call you make maps to an entry defined here.
graph TD
RS["Root Signature\n(The Contract / Function Signature)"]
RC["Root Constants\n1 DWORD per 32-bit value, no indirection\nUse for: object index, time, flags"]
RD["Root Descriptors\nCBV/SRV/UAV GPU address directly in root (2 DWORDs)\nOne indirection, fast: use for per-draw data"]
DT["Descriptor Tables\nPointer to a range in a descriptor heap (1 DWORD)\nSlower but allows huge arrays of textures"]
RS --> RC
RS --> RD
RS --> DT
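The three parameter kinds share one budget: a root signature may occupy at most 64 DWORDs, where each 32-bit root constant costs 1 DWORD, each root descriptor costs 2, and each descriptor table costs 1. The sketch below adds up that cost for a planned layout. The types and function names (`RootParam`, `RootSignatureCost`, and so on) are hypothetical helpers, not part of D3D12.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class RootParamKind { Constants, Descriptor, DescriptorTable };

// Hypothetical description of one root parameter
struct RootParam {
    RootParamKind kind;
    std::uint32_t num32BitValues;  // Only meaningful for RootParamKind::Constants
};

constexpr std::uint32_t kMaxRootSignatureDwords = 64;  // Documented D3D12 limit

inline std::uint32_t RootParamCost(const RootParam& p) {
    switch (p.kind) {
        case RootParamKind::Constants:       return p.num32BitValues;  // 1 DWORD per value
        case RootParamKind::Descriptor:      return 2;  // 64-bit GPU virtual address
        case RootParamKind::DescriptorTable: return 1;  // Offset into a descriptor heap
    }
    return 0;
}

inline std::uint32_t RootSignatureCost(const std::vector<RootParam>& params) {
    std::uint32_t total = 0;
    for (const RootParam& p : params) total += RootParamCost(p);
    return total;
}

inline bool FitsRootSignature(const std::vector<RootParam>& params) {
    return RootSignatureCost(params) <= kMaxRootSignatureDwords;
}
```

For example, a 4x4 matrix as root constants (16 DWORDs), two root CBVs (4 DWORDs), and one descriptor table (1 DWORD) cost 21 DWORDs in total, comfortably within the limit.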
    // Old way: FXC via d3dcompiler.lib (still works, ships with the Windows SDK)
    ComPtr<ID3DBlob> vertexShader, pixelShader, error;
    UINT compileFlags = D3DCOMPILE_DEBUG | D3DCOMPILE_SKIP_OPTIMIZATION;  // Debug only
    D3DCompileFromFile(
        L"vertex_shader.hlsl",
        nullptr,    // Macros
        nullptr,    // Include handler
        "VSMain",   // Entry point
        "vs_5_1",   // Target profile: FXC tops out at Shader Model 5.1
        compileFlags, 0,
        &vertexShader, &error);
    if (error) OutputDebugStringA((char*)error->GetBufferPointer());

    // Modern way: use the DXC compiler (required for SM 6.x targets such as vs_6_0)
    // IDxcCompiler3 from dxcompiler.dll
    // Supports: WaveIntrinsics, Bindless, Raytracing, Mesh Shaders, SPIR-V output
10 — Pipeline State Object (PSO)
The Immutable Pipeline
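Like Vulkan’s VkPipeline, the DX12 ID3D12PipelineState bakes shader code and most render state into one immutable object. Instead of changing individual states between draws, you switch whole PSOs, so the driver can validate and compile everything up front. Only a few values (viewport, scissor rect, blend factor, stencil reference) remain dynamic.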
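    ComPtr<ID3D12Fence> fence;
    device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence));
    UINT64 fenceValues[numFrames] = {};  // Value signaled when each frame slot was last submitted
    UINT64 nextFenceValue = 1;           // The fence starts at 0, so the first signal must be 1
    HANDLE fenceEvent = CreateEvent(nullptr, FALSE, FALSE, nullptr);

    // Call at the END of each frame, after ExecuteCommandLists() and Present().
    // Signals for the frame just submitted, then waits until the next frame slot is free.
    void WaitForFrame(UINT frameIndex) {
        // Mark the end of this frame's GPU work with a new, strictly increasing value
        fenceValues[frameIndex] = nextFenceValue++;
        commandQueue->Signal(fence.Get(), fenceValues[frameIndex]);

        // Block only if the GPU is still using the slot we are about to record into.
        // (With a swap chain, take this index from GetCurrentBackBufferIndex() instead.)
        const UINT nextFrame = (frameIndex + 1) % numFrames;
        if (fence->GetCompletedValue() < fenceValues[nextFrame]) {
            fence->SetEventOnCompletion(fenceValues[nextFrame], fenceEvent);
            WaitForSingleObject(fenceEvent, INFINITE);
        }
    }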
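The fence bookkeeping is easier to reason about without the GPU in the loop. The sketch below models one common scheme, assumed here for illustration: a single counter hands out strictly increasing fence values, each frame slot remembers the value signaled when it was last submitted, and the CPU must wait only if the fence's completed value is still below the value of the slot it wants to reuse. `FramePacer` is a hypothetical name, and the completed value is passed in by the caller rather than read from a real ID3D12Fence.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class FramePacer {
public:
    explicit FramePacer(std::size_t framesInFlight) : slotValues_(framesInFlight, 0) {}

    // Call after submitting the frame recorded in `slot`.
    // Returns the value that would be passed to ID3D12CommandQueue::Signal().
    std::uint64_t SignalFor(std::size_t slot) {
        slotValues_[slot] = next_++;
        return slotValues_[slot];
    }

    // True if the CPU must block before reusing `slot`, given the value the
    // fence has reached so far (ID3D12Fence::GetCompletedValue() in real code).
    bool MustWait(std::size_t slot, std::uint64_t completedValue) const {
        return completedValue < slotValues_[slot];
    }

private:
    std::vector<std::uint64_t> slotValues_;  // 0 means "never submitted"
    std::uint64_t next_ = 1;                 // A fence created with 0 has already "reached" 0
};
```

With two frames in flight, submitting slot 0 and then slot 1 lets the CPU run one full frame ahead; it stalls only when it comes back to slot 0 before the GPU has reached that slot's fence value.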
DXR integrates ray tracing directly into the D3D12 API. It uses the existing command list / queue system but adds new state objects, acceleration structures, and shader types.
graph TD
Mesh["ID3D12Resource\n(vertex + index buffers)"]
BLAS["BottomLevelAS\nOne per unique mesh\nBuildRaytracingAccelerationStructure (bottom level)"]
Instance["Instance Desc\n{transform, BLAS address, hitGroup}"]
TLAS["TopLevelAS\nAll instances in the scene\n(rebuilt each frame if objects move)"]
RGen["Ray Generation Shader\nOne thread per pixel. Calls TraceRay()."]
RHit["Closest Hit Shader\nCalled when ray hits nearest surface. Do shading."]
RMiss["Miss Shader\nCalled when ray misses all geometry. Sky color."]
SBT["Shader Binding Table\nMaps geometry instance → hit group shader"]
Mesh --> BLAS --> Instance --> TLAS
TLAS --> RGen
SBT --> RHit
SBT --> RMiss
RGen -->|"TraceRay()"| RHit & RMiss
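How the Shader Binding Table picks a hit group for a given ray and geometry comes down to one indexing formula from the DXR specification. The sketch below reproduces that arithmetic on the CPU so a table layout can be checked before dispatch. `HitGroupTable` and `HitGroupRecordAddress` are hypothetical helper names; the parameter comments name the real DXR fields each input corresponds to.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the two fields of D3D12_DISPATCH_RAYS_DESC::HitGroupTable used for indexing
struct HitGroupTable {
    std::uint64_t startAddress;   // GPU virtual address of the first record
    std::uint64_t strideInBytes;  // Size of one shader record
};

inline std::uint64_t HitGroupRecordAddress(
    const HitGroupTable& table,
    std::uint32_t rayContribution,       // TraceRay(): RayContributionToHitGroupIndex
    std::uint32_t geometryMultiplier,    // TraceRay(): MultiplierForGeometryContributionToHitGroupIndex
    std::uint32_t geometryIndex,         // Index of the geometry inside its bottom-level AS
    std::uint32_t instanceContribution)  // Instance desc: InstanceContributionToHitGroupIndex
{
    const std::uint64_t recordIndex =
        std::uint64_t(rayContribution) +
        std::uint64_t(geometryMultiplier) * geometryIndex +
        std::uint64_t(instanceContribution);
    return table.startAddress + table.strideInBytes * recordIndex;
}
```

A common layout uses the ray type (for example 0 = primary, 1 = shadow) as the ray contribution and the number of ray types as the geometry multiplier, so each geometry owns one consecutive record per ray type.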
Mesh Shaders replace the entire Vertex → Tessellation → Geometry pipeline with a compute-like two-stage process. They were designed to solve GPU vertex processing inefficiencies.
| Stage | Role | Analogy |
| --- | --- | --- |
| Amplification Shader (AS) | Optional; runs first. For each meshlet, decides: render or cull? Launches mesh shader thread groups for the survivors with DispatchMesh(). | The manager who checks: “which chunks are visible?” |
| Mesh Shader (MS) | Processes one meshlet. Outputs vertices and primitives. | The worker who actually converts a chunk to triangles. |
Meshlet: A cluster of ~64-128 triangles from a mesh. Each meshlet typically carries a precomputed bounding sphere and normal cone for fast culling.
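As an illustration of cone culling, the sketch below uses the apex-based backface test popularized by the meshoptimizer library: a meshlet can be skipped when the direction from the camera to the cone apex lies within the cone, because every triangle in it then faces away from the viewer. The `Vec3` helpers and the `MeshletCone` struct are assumptions made for this example; in a real pipeline the same test would run in the amplification shader.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

inline Vec3  Sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline float Dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Hypothetical per-meshlet culling data, precomputed offline
struct MeshletCone {
    Vec3  apex;    // Point all triangle planes are "behind", in world space
    Vec3  axis;    // Unit-length average normal direction
    float cutoff;  // Precomputed threshold for the dot product below
};

// True if every triangle in the meshlet faces away from the camera
inline bool IsMeshletBackfacing(const MeshletCone& cone, Vec3 cameraPos) {
    const Vec3  toApex = Sub(cone.apex, cameraPos);
    const float length = std::sqrt(Dot(toApex, toApex));
    if (length == 0.0f) return false;  // Camera sits on the apex: keep the meshlet
    // dot(normalize(apex - camera), axis) >= cutoff  →  backfacing
    return Dot(toApex, cone.axis) >= cone.cutoff * length;
}
```

Multiplying the cutoff by the vector length instead of normalizing avoids a division; the comparison is otherwise identical.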
    // mesh_shader.hlsl
    struct MeshletOut {
        float4 position : SV_POSITION;
        float3 normal   : NORMAL;
        float2 uv       : TEXCOORD0;
    };

    // Assumed layouts: these must match what the CPU uploads and what the amplification shader emits
    struct Meshlet { uint VertCount; uint VertOffset; uint PrimCount; uint PrimOffset; };
    struct MeshletPayload { uint meshletCount; };
    cbuffer Globals : register(b0) { float4x4 g_MVP; };

    // Meshlet data uploaded by CPU
    StructuredBuffer<Meshlet> g_Meshlets  : register(t0);
    StructuredBuffer<float4>  g_Positions : register(t1);
    StructuredBuffer<uint>    g_Indices   : register(t2);  // Meshlet-local vertex indices

    // Each mesh shader thread group processes ONE meshlet
    [NumThreads(128, 1, 1)]
    [OutputTopology("triangle")]
    void MSMain(
        uint groupThreadID : SV_GroupThreadID,
        uint groupID       : SV_GroupID,
        in payload MeshletPayload payload,      // Only valid when an amplification shader is bound
        out vertices MeshletOut outVerts[128],
        out indices uint3 outPrims[128])        // One thread writes one primitive, so at most 128
    {
        Meshlet m = g_Meshlets[groupID];
        SetMeshOutputCounts(m.VertCount, m.PrimCount);

        if (groupThreadID < m.VertCount) {
            uint vi = m.VertOffset + groupThreadID;
            outVerts[groupThreadID].position = mul(g_MVP, g_Positions[vi]);
            // ... fill other attributes
        }
        if (groupThreadID < m.PrimCount) {
            uint pi = m.PrimOffset + groupThreadID;
            outPrims[groupThreadID] = uint3(g_Indices[pi*3], g_Indices[pi*3+1], g_Indices[pi*3+2]);
        }
    }
15 — Performance and Debugging
Performance Best Practices
| Practice | Why it Matters |
| --- | --- |
| Cache PSOs to disk (ID3D12PipelineLibrary) | Loading a game with 10,000 PSOs? Cache them to disk so compilation happens only once. |
| Use Root Constants for hot data | Values are written directly into the command stream, with no buffer upload or descriptor; each 32-bit value costs 1 DWORD of the root signature. |