Concept: Async Compute
What Is Async Compute?
- Running compute work in parallel with graphics work on the GPU
- Modern GPUs have separate compute queues that can run alongside the graphics queue
- Enables better GPU utilization by filling idle shader units
GPU Queue Types
- Graphics queue: in practice supports all operations (graphics, compute, transfer)
- Compute queue: compute + transfer only (no rasterization)
- Transfer queue: DMA transfers only
- Multiple queues can run simultaneously on different hardware units
Why It Matters for Path Tracing
- BLAS builds are compute-heavy — can overlap with rendering
- Denoising passes can overlap with next frame’s ray tracing
- TLAS rebuild can overlap with shadow ray tracing
- Typical frame timeline without async
- With async compute
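The difference between the two timelines can be approximated with a small model. This is a minimal sketch, not a measurement: the `FrameCosts` struct, the function names, and all millisecond values are hypothetical. It assumes steady-state pipelining where BLAS builds and denoising run on the compute queue while ray tracing runs on the graphics queue, with perfect overlap and no contention for shader units or bandwidth.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical per-frame GPU costs in milliseconds (illustrative only).
struct FrameCosts {
    double blasBuildMs;  // acceleration structure builds (compute-heavy)
    double rayTraceMs;   // main ray tracing pass
    double denoiseMs;    // denoising pass
};

// Without async: every pass runs back to back on a single queue.
double serialFrameMs(const FrameCosts& c) {
    assert(c.blasBuildMs >= 0.0 && c.rayTraceMs >= 0.0 && c.denoiseMs >= 0.0);
    return c.blasBuildMs + c.rayTraceMs + c.denoiseMs;
}

// With async (idealized): the graphics queue carries ray tracing while the
// compute queue carries BLAS builds plus the previous frame's denoise.
// The steady-state frame time is bounded by the longer of the two queues.
double overlappedFrameMs(const FrameCosts& c) {
    assert(c.blasBuildMs >= 0.0 && c.rayTraceMs >= 0.0 && c.denoiseMs >= 0.0);
    const double graphicsQueueMs = c.rayTraceMs;
    const double computeQueueMs = c.blasBuildMs + c.denoiseMs;
    return std::max(graphicsQueueMs, computeQueueMs);
}
```

The overlapped figure is a best case. Real overlap is usually smaller because both queues compete for the same shader units, caches, and memory bandwidth.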
Vulkan Async Compute Setup
- Find a compute-only queue family
    // Pick the first family that supports compute but not graphics.
    uint32_t index = 0;
    for (auto& queueFamily : queueFamilies) {
        if ((queueFamily.queueFlags & VK_QUEUE_COMPUTE_BIT) &&
            !(queueFamily.queueFlags & VK_QUEUE_GRAPHICS_BIT)) {
            computeQueueFamily = index;
            break;
        }
        ++index;
    }
- Create separate command pools and queues for compute
- Submit compute work to compute queue, graphics to graphics queue
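Not every device exposes a compute-only family, so the selection step may need a fallback. The sketch below is one possible approach, written against stand-in types so it compiles without the Vulkan headers: `QueueFamily`, `kNoFamily`, `pickComputeFamily`, and `isDedicatedCompute` are hypothetical names, and the `k*Bit` constants only mirror the values of `VK_QUEUE_GRAPHICS_BIT`, `VK_QUEUE_COMPUTE_BIT`, and `VK_QUEUE_TRANSFER_BIT`. Real code would read `VkQueueFamilyProperties` instead.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Stand-ins mirroring the values of Vulkan's VkQueueFlagBits.
constexpr std::uint32_t kGraphicsBit = 0x1;  // VK_QUEUE_GRAPHICS_BIT
constexpr std::uint32_t kComputeBit  = 0x2;  // VK_QUEUE_COMPUTE_BIT
constexpr std::uint32_t kTransferBit = 0x4;  // VK_QUEUE_TRANSFER_BIT

// Hypothetical sentinel meaning "no compute-capable family found".
constexpr std::uint32_t kNoFamily = std::numeric_limits<std::uint32_t>::max();

// Hypothetical reduced form of VkQueueFamilyProperties.
struct QueueFamily {
    std::uint32_t queueFlags;
    std::uint32_t queueCount;
};

// Prefer a family with compute but no graphics; otherwise fall back to the
// first family that supports compute at all.
std::uint32_t pickComputeFamily(const std::vector<QueueFamily>& families) {
    std::uint32_t fallback = kNoFamily;
    for (std::uint32_t i = 0; i < families.size(); ++i) {
        const QueueFamily& f = families[i];
        if (f.queueCount == 0 || !(f.queueFlags & kComputeBit)) continue;
        if (!(f.queueFlags & kGraphicsBit)) return i;  // dedicated compute
        if (fallback == kNoFamily) fallback = i;
    }
    return fallback;
}

// True when the chosen family is separate from graphics, which is the case
// where async compute can actually run alongside the graphics queue.
bool isDedicatedCompute(const std::vector<QueueFamily>& families,
                        std::uint32_t index) {
    assert(index < families.size());
    const std::uint32_t flags = families[index].queueFlags;
    return (flags & kComputeBit) && !(flags & kGraphicsBit);
}
```

When `isDedicatedCompute` returns false, compute work ends up in the same family as graphics, and a renderer would likely fall back to a single-queue path.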
Synchronization
- Async compute requires careful synchronization
- Timeline semaphores (Vulkan 1.2) — preferred
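    VkSemaphoreTypeCreateInfo typeInfo{};
    typeInfo.sType = VK_STRUCTURE_TYPE_SEMAPHORE_TYPE_CREATE_INFO;
    typeInfo.semaphoreType = VK_SEMAPHORE_TYPE_TIMELINE;
    typeInfo.initialValue = 0;
    // Chain typeInfo into VkSemaphoreCreateInfo::pNext before vkCreateSemaphore.
- Signal from compute queue, wait on graphics queue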
- Pipeline barriers within a queue
- Queue ownership transfers for shared resources
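The signal-and-wait pattern between the two queues can be illustrated with a CPU-side model. This is not Vulkan code: `TimelineModel`, `Submission`, and `runQueues` are hypothetical names, and the single-threaded loop only imitates how a timeline semaphore (a counter that moves forward only) orders work across queues. The sketch assumes each queue executes its own submissions strictly in order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// CPU-side model of a timeline semaphore: a counter that only increases.
struct TimelineModel {
    std::uint64_t value = 0;
    void signal(std::uint64_t v) { assert(v > value); value = v; }
    bool reached(std::uint64_t v) const { return value >= v; }
};

// Hypothetical submission: runs once the timeline reaches waitValue, then
// signals signalValue (0 means "signal nothing").
struct Submission {
    std::string name;
    std::uint64_t waitValue;
    std::uint64_t signalValue;
};

// The two queues interact only through the shared timeline.
// Returns the names of the submissions in the order they ran.
std::vector<std::string> runQueues(const std::vector<Submission>& compute,
                                   const std::vector<Submission>& graphics) {
    TimelineModel timeline;
    std::vector<std::string> order;
    std::size_t c = 0, g = 0;

    // Try to run the next submission of one queue; false if it is blocked.
    auto step = [&](const std::vector<Submission>& q, std::size_t& next) -> bool {
        if (next >= q.size() || !timeline.reached(q[next].waitValue)) return false;
        order.push_back(q[next].name);
        if (q[next].signalValue != 0) timeline.signal(q[next].signalValue);
        ++next;
        return true;
    };

    bool progressed = true;
    while (progressed) {
        // Graphics is polled first to show that it really waits on compute.
        const bool graphicsRan = step(graphics, g);
        const bool computeRan = step(compute, c);
        progressed = graphicsRan || computeRan;
    }
    return order;  // shorter than the input if a wait is never satisfied
}
```

For example, a compute queue holding `{"blas_build", 0, 1}` and `{"denoise", 2, 3}` with a graphics queue holding `{"ray_trace", 1, 2}` and `{"composite", 3, 0}` runs in the order blas_build, ray_trace, denoise, composite. A wait value that nothing ever signals leaves that queue stalled, which is the model's version of a cross-queue deadlock.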
Practical Considerations
- Not all GPUs benefit equally
- Integrated GPUs: often a single queue, so little or no benefit
- Discrete GPUs: multiple compute units, so potentially significant benefit
- Overhead: synchronization adds complexity and some latency
- Profile first: measure actual GPU utilization before optimizing
- NVIDIA Nsight, AMD Radeon GPU Profiler (RGP): tools for visualizing queue utilization
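One way to check whether compute work overlaps graphics work at all is to compare their time ranges from a profiler capture. The sketch below assumes both intervals are already expressed on a common clock, for example as exported from a capture; raw timestamp queries written on different queues may not be directly comparable. `Interval`, `overlapNs`, and `overlapFraction` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical time range of one queue's work, in nanoseconds on a shared clock.
struct Interval {
    std::uint64_t beginNs;
    std::uint64_t endNs;
};

// Length of the time span during which both intervals are active.
std::uint64_t overlapNs(const Interval& a, const Interval& b) {
    assert(a.beginNs <= a.endNs && b.beginNs <= b.endNs);
    const std::uint64_t begin = std::max(a.beginNs, b.beginNs);
    const std::uint64_t end = std::min(a.endNs, b.endNs);
    return end > begin ? end - begin : 0;
}

// Fraction of the compute interval that ran concurrently with graphics.
// 0.0 means fully serialized; 1.0 means compute was entirely hidden.
double overlapFraction(const Interval& graphics, const Interval& compute) {
    assert(compute.beginNs <= compute.endNs);
    const std::uint64_t computeNs = compute.endNs - compute.beginNs;
    if (computeNs == 0) return 0.0;
    return static_cast<double>(overlapNs(graphics, compute)) /
           static_cast<double>(computeNs);
}
```

A high overlap fraction only shows that the work ran concurrently, not that the frame got faster. Total frame time before and after enabling async compute remains the number to compare.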
In Godot Context
- Godot’s RenderingDevice exposes compute queues
- BLAS builds for skinned meshes are good candidates for async
- Denoising (OIDN compute) can run async with next frame’s RT