Concept: BVH Traversal


Basic Traversal Algorithm

  • Recursive version (conceptual)
    float traverse(Ray ray, BVHNode node):
        if not intersect(ray, node.aabb): return infinity
        if node.is_leaf:
            return intersect_primitives(ray, node.primitives)
        t_left  = traverse(ray, node.left)
        t_right = traverse(ray, node.right)
        return min(t_left, t_right)
    
  • Iterative version (GPU-friendly — no recursion)
    • Use an explicit stack (array of node indices)
    float traverseBVH(Ray ray) {
        int stack[64];
        int stackPtr = 0;
        stack[stackPtr++] = 0;  // root node
        float tMin = infinity;
        
        while (stackPtr > 0) {
            int nodeIdx = stack[--stackPtr];
            BVHNode node = bvhNodes[nodeIdx];
            
            float t;
            if (!intersectAABB(ray, node.aabb_min, node.aabb_max, t)) continue;
            if (t > tMin) continue;  // already found closer hit
            
            if (node.prim_count > 0) {  // leaf
                for (int i = 0; i < node.prim_count; i++) {
                    float tHit = intersectTriangle(ray, node.prim_offset + i);
                    tMin = min(tMin, tHit);
                }
            } else {  // internal
                stack[stackPtr++] = node.left_child;
                stack[stackPtr++] = node.left_child + 1;  // right child
            }
        }
        return tMin;
    }

Ordered Traversal (Optimization)

  • Visit the closer child first
  • If closer child has a hit, the farther child may be skipped entirely
  • How to determine closer child
    • Compare AABB intersection distances: t_left vs t_right
    • Push farther child first (so closer child is on top of stack)
  • Significant speedup for primary rays (coherent rays)
  • Less benefit for secondary rays (incoherent)

Shadow Ray Optimization

  • Shadow rays only need to know IF there’s a hit, not WHICH hit
  • Use gl_RayFlagsTerminateOnFirstHitEXT in Vulkan
  • In software BVH: return immediately on first primitive hit
  • Avoids traversing the entire BVH for occlusion queries

Stack Overflow

  • BVH depth can exceed stack size for degenerate trees
  • Typical max depth: 30-40 for well-built BVH
  • Stack size of 64 is usually sufficient
  • Degenerate case: all primitives in a line → depth = N (use SAH to avoid)

GPU Traversal Considerations

  • Warp divergence
    • Different threads in a warp may take different BVH paths
    • Reduces GPU utilization (some threads idle while others traverse)
    • Hardware RT (Vulkan) handles this internally — major advantage
  • Memory access patterns
    • BVH nodes should be in a contiguous buffer (good cache behavior)
    • Leaf primitives should be sorted to match BVH order
  • Persistent threads
    • Keep threads alive across multiple rays (avoid kernel launch overhead)
    • Used in production GPU path tracers