About This Page

This page covers game performance profiling — how to measure, diagnose, and fix frame rate, memory, and load time issues. Parent: Game Testing & QA. See also: Console Development, Game Testing & QA - Automated Testing.

Profile on Target Hardware

Always profile on the lowest-spec machine in your target range. A game running at 120 FPS on your dev PC may drop to 20 FPS on an entry-level laptop or Nintendo Switch. Dev machines are almost always faster than the hardware most players own.

Performance Fundamentals

Key Metrics

Metric       | Definition                     | Target (60 FPS)
Frame Time   | Time to render one frame (ms)  | ≤16.67ms
FPS          | Frames per second              | ≥60
Frame Pacing | Consistency of frame times     | Smooth (no spikes)
GPU Time     | Time GPU spends on one frame   | ≤14ms
CPU Time     | Time CPU spends on one frame   | ≤14ms
Memory Usage | RAM / VRAM consumed            | Within platform budget
Load Time    | Scene/level load duration      | ≤5s (player tolerance)
Draw Calls   | Number of GPU render commands  | ≤1000–2000 per frame

Frame Budget

FPS → Frame Budget:
  30 FPS = 33.33ms per frame
  60 FPS = 16.67ms per frame
  90 FPS = 11.11ms per frame (VR minimum)
 120 FPS =  8.33ms per frame

Typical 60 FPS budget breakdown:
  CPU: Game logic           ~2ms
  CPU: Culling + sorting    ~1ms
  CPU: Draw call submission ~2ms
  GPU: Geometry pass        ~4ms
  GPU: Lighting & shadows   ~4ms
  GPU: Post-processing      ~2ms
  Vsync + buffer overhead   ~1ms
  Headroom                  ~0.67ms

Rule: Any single pass > 5ms on target hardware → investigate it.
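The conversion above is simple arithmetic — the budget in milliseconds is 1000 divided by the FPS target. A minimal sketch:

```cpp
#include <cassert>

// Frame budget in milliseconds for a given FPS target: 1000 / fps.
double frame_budget_ms(double fps) {
    return 1000.0 / fps;
}
```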

CPU-Bound vs GPU-Bound

CPU-Bound: CPU finishes after GPU (GPU is waiting)
  Causes: Too many game objects, complex AI, physics, scripting
  Fix: Reduce update frequency, use job system, batch AI ticks

GPU-Bound: GPU finishes after CPU (CPU is waiting)
  Causes: High pixel fill rate, too many draw calls, heavy shaders
  Fix: Reduce resolution, simplify shaders, batch draws, LOD

How to tell:
  CPU-Bound: GPU utilisation < 90%, main thread near 100%
  GPU-Bound: GPU utilisation ~100%, CPU has idle headroom

Most games alternate between CPU- and GPU-bound in different scenes.
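The heuristic above can be sketched as a tiny classifier. The thresholds are the ones quoted in this section, not a vendor-defined rule:

```cpp
#include <string>

// Classify the bottleneck from utilisation percentages, per the
// heuristic above. Thresholds (90/100) are the text's rule of thumb.
std::string classify_bottleneck(double cpu_util, double gpu_util) {
    if (cpu_util >= 100.0 && gpu_util < 90.0) return "CPU-bound";
    if (gpu_util >= 100.0 && cpu_util < 90.0) return "GPU-bound";
    return "balanced / inconclusive";
}
```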

Memory Budget

Platform        | Total RAM     | Available | VRAM
PS5             | 16 GB GDDR6   | ~14 GB    | Shared
Xbox Series X   | 16 GB GDDR6   | ~13.5 GB  | Shared
Nintendo Switch | 4 GB LPDDR4X  | ~3.2 GB   | Shared (no VRAM)
Mid-range PC    | 16 GB DDR4    | ~12 GB    | 6–8 GB VRAM
Mobile (low)    | 2–3 GB        | ~1.5 GB   | Shared

Profiling Tools

Built-In Engine Profilers

Unity Profiler:
  Window → Analysis → Profiler
  Shows: CPU timeline, GPU, Memory, Audio, Physics, Network
  Deep Profile: Shows every method call (significant overhead)
  Frame Debugger: Inspect every draw call of a frame
  Memory Profiler (separate package): Snapshot heap, find leaks

Unreal Insights:
  Run with: -trace=cpu,gpu,memory,frame
  Or: stat fps / stat unit in console
  Shows: CPU/GPU flame charts, memory, network replication
  GPU Visualizer: Per-pass GPU timing breakdown
  Unreal Frontend: Connect to running game session

Godot Profiler:
  Debugger → Profiler tab
  Shows: Frame time, physics, idle, per-node costs
  Limited GPU info — use RenderDoc for GPU analysis in Godot

RenderDoc (All Platforms, Open Source)

What: GPU frame capture and analysis
Platforms: PC (DX11/12, Vulkan, OpenGL), Android, Switch (via vendor)
Cost: Free, open source

Workflow:
  1. Launch RenderDoc → Attach to game process
  2. Press F12 to capture a frame
  3. Inspect: Event list, Texture viewer, Mesh viewer, Pipeline state
  4. Find: Which draw call is expensive? Which texture is oversampled?

Best for:
  - Visual bug investigation (wrong texture, missing mesh)
  - Understanding which draw call dominates a frame
  - Debugging shader outputs

PIX for Windows / Xbox

What: Microsoft GPU performance + debugging tool
Platforms: PC (DX11/12), Xbox Series X/S, Xbox One
Cost: Free (developer.microsoft.com/pix)

Features:
  GPU Captures    — Frame-by-frame draw call analysis
  Timing Captures — CPU + GPU timeline over multiple frames
  Memory Captures — GPU memory allocation over time
  Counters        — Hardware performance counters (cache hits etc.)

Key metrics in PIX:
  GPU Duration    — How long each pass takes on GPU
  VS/PS Invocations — Vertex/Pixel shader call counts
  Primitive Count — Triangles submitted per pass
// PIX programmatic marker (C++ / DX12)
#include <pix3.h>

// Mark a GPU event range for PIX capture
PIXBeginEvent(commandList, PIX_COLOR(255, 128, 0), "Shadow Pass");
// ... shadow rendering commands ...
PIXEndEvent(commandList);

// CPU event
PIXBeginEvent(PIX_COLOR(0, 128, 255), "AI Update");
UpdateAllEnemyAI();
PIXEndEvent();
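Manually pairing Begin/End calls is error-prone — an early return skips the End. A generic RAII wrapper (plain C++, not a PIX API) guarantees the pair fires; pix3.h also ships a PIXScopedEvent macro built on the same idea:

```cpp
#include <cassert>
#include <chrono>

// RAII scope timer — the destructor guarantees the "end" half runs
// even on early return, the same pairing problem that
// PIXBeginEvent/PIXEndEvent has. Generic sketch, not a PIX API.
struct ScopedTimer {
    double& out_ms;  // where to record the elapsed time
    std::chrono::steady_clock::time_point start;

    explicit ScopedTimer(double& out)
        : out_ms(out), start(std::chrono::steady_clock::now()) {}

    ~ScopedTimer() {
        auto end = std::chrono::steady_clock::now();
        out_ms = std::chrono::duration<double, std::milli>(end - start).count();
    }
};
```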

NVIDIA Nsight Graphics

What: NVIDIA GPU profiler and debugger
Platforms: PC (DX11/12, Vulkan, OpenGL, OpenCL)
Cost: Free (developer.nvidia.com)

Features:
  Frame Debugger   — Full draw call inspection
  GPU Trace        — Timeline of GPU work
  Shader Profiler  — Per-instruction shader performance
  Ray Tracing      — BVH traversal, RT pass breakdown
  DLSS Analysis    — Inspect DLSS upscaling quality

Best for: PC games on NVIDIA hardware, ray tracing investigation

Xcode Instruments (iOS / macOS)

Templates useful for games:
  Time Profiler  — CPU call stacks, find hotspots
  Metal System Trace — GPU frame analysis for Metal API
  Allocations    — Memory allocations, find leaks
  Leaks          — Detect retain cycles and memory leaks
  Energy Log     — Battery drain (critical for mobile!)

Metal Debugger (Xcode built-in):
  Capture GPU frame → inspect all draw calls
  Shader debugger: step through Metal shaders line by line

Android GPU Inspector

What: Google's GPU profiler for Android
Platforms: Android (ARM Mali, Qualcomm Adreno, PowerVR)
Cost: Free (gpuinspector.dev)

Features:
  Frame profiling   — GPU timeline, overdraw heatmap
  System profiling  — CPU + GPU + memory over time
  Counter analysis  — Hardware GPU counters

Alternatives:
  ARM Mobile Studio  — Mali GPU profiling (free)
  Snapdragon Profiler — Qualcomm Adreno profiling (free)

CPU Profiling Workflow

Finding CPU Hotspots

Step 1: Run the game with profiler attached at target scene
Step 2: Capture 100–200 frames of typical gameplay
Step 3: Sort by "Self Time" (time in function only, not callees)
Step 4: Identify top 5 most expensive functions
Step 5: Investigate each:
  - Is it called too many times per frame?
  - Is it doing unnecessary work?
  - Can it be cached, batched, or moved off the main thread?

Common CPU Performance Fixes

Expensive Per-Frame Calculations:
  Problem: Computing sqrt() or trig per-entity every frame
  Fix:     Cache results, compute only when data changes
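The cache-on-change fix above can be sketched with a dirty flag; `DistanceCache` and its fields are illustrative names:

```cpp
#include <cassert>
#include <cmath>

// Recompute the expensive value (here a distance via sqrt) only
// when the inputs actually change.
struct DistanceCache {
    float x = 0, y = 0;    // last inputs seen
    float cached = 0;
    bool  dirty = true;
    int   recomputes = 0;  // for illustration only

    float distance(float nx, float ny) {
        if (dirty || nx != x || ny != y) {
            x = nx; y = ny;
            cached = std::sqrt(nx * nx + ny * ny);
            dirty = false;
            ++recomputes;
        }
        return cached;  // cheap path: no sqrt
    }
};
```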

Too Many Active GameObjects (Unity):
  Problem: 2000 enemies all updating every frame
  Fix:     LOD for scripts (only update visible/nearby enemies)
           Disable Update() when out of range

Garbage Collection (C# / GC languages):
  Problem: Frequent allocation → GC pause spikes
  Fix:     Object pooling, avoid LINQ in hot paths, preallocate

Main Thread Blocking:
  Problem: File I/O, heavy physics, pathfinding on main thread
  Fix:     Move to async/thread/job system
// Unity Job System — move expensive work off the main thread
using Unity.Jobs;
using Unity.Collections;
using Unity.Mathematics; // float3

public struct EnemyMoveJob : IJobParallelFor {
    public NativeArray<float3> positions;
    [ReadOnly] public NativeArray<float3> velocities;
    public float deltaTime;

    public void Execute(int i) {
        positions[i] += velocities[i] * deltaTime;
    }
}

// Schedule on worker threads — main thread stays free
var job = new EnemyMoveJob {
    positions  = positionArray,
    velocities = velocityArray,
    deltaTime  = Time.deltaTime
};
JobHandle handle = job.Schedule(enemyCount, 64); // batches of 64
handle.Complete(); // wait only when the result is needed

GPU Profiling Workflow

Finding GPU Bottlenecks

Step 1: Capture a GPU frame in PIX / RenderDoc / Nsight
Step 2: View the GPU timeline — which pass dominates?
Step 3: Common suspects:
  - Shadow map pass (many lights = many shadow draws)
  - Transparent/particle pass (overdraw intensive)
  - Post-processing stack (bloom, DOF, TAA all add up)
  - Sky/reflection captures (recalculate too often?)
Step 4: Look at draw call count — too many small draws?
Step 5: Check overdraw heatmap — are pixels shaded many times?

Common GPU Performance Fixes

Too Many Draw Calls:
  Problem: 5000 draw calls per frame (each has CPU overhead)
  Fix:     GPU instancing, static batching, dynamic batching
           Combine meshes at bake time (static scene)

Overdraw (Transparent Objects):
  Problem: Particles shading same pixel 20 times
  Fix:     Reduce particle count, use depth prepass
           Alpha test (discard) instead of alpha blend where possible

Heavy Shaders:
  Problem: Expensive PBR shader on every object in scene
  Fix:     Simplified shaders for background/distant objects
           Shader LOD (swap to cheaper shader at distance)

Shadow Map Cost:
  Problem: 5 shadow-casting lights = 5× shadow render passes
  Fix:     Limit shadow-casting lights (max 1–2 dynamic)
           Increase shadow cascade distances
           Use baked lighting for static geometry

Texture Sampling:
  Problem: Uncompressed 4K textures on every object
  Fix:     BC7/DXT5 (PC), ASTC (mobile/Switch) compression
           Mip maps enabled on all textures
           Reduce texture resolution on non-hero assets
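The savings from compression are easy to estimate: RGBA8 costs 4 bytes per texel, while BC7 and ASTC 4×4 pack 16 texels into 16 bytes (1 byte per texel), and a full mip chain adds roughly a third on top. A rough calculator:

```cpp
#include <cassert>
#include <cstdint>

// Rough texture memory estimate. bytes_per_texel: 4.0 for RGBA8,
// 1.0 for BC7 / ASTC 4x4. A full mip chain adds ~1/3 (factor 4/3).
std::uint64_t texture_bytes(std::uint32_t w, std::uint32_t h,
                            double bytes_per_texel, bool mips) {
    double base = double(w) * double(h) * bytes_per_texel;
    return std::uint64_t(mips ? base * 4.0 / 3.0 : base);
}
```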

Dynamic Resolution Scaling

Technique: Render at lower resolution when GPU is overloaded,
           upscale to target resolution.

Implementations:
  Dynamic Resolution (Unity/Unreal built-in)
  FSR 2 (AMD FidelityFX Super Resolution) — open source, vendor-agnostic
  DLSS (NVIDIA) — AI upscaling, RTX GPUs only
  PSSR (PlayStation Spectral Super Resolution) — PS5 Pro only
  XeSS (Intel) — runs on Intel Arc and other vendors' GPUs

Usage:
  Set min resolution (e.g. 1440p min, target 4K)
  GPU usage > 90% → drop resolution
  GPU usage < 70% → raise resolution
  Cap FPS first, then set resolution scaling min/max
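The usage rules above amount to a threshold controller. The step size and the 0.67 minimum scale (≈1440p when targeting 4K) are assumptions for illustration:

```cpp
#include <algorithm>
#include <cassert>

// Threshold controller per the rules above: drop the render scale
// when GPU usage exceeds 90%, raise it when usage falls below 70%.
// Step size and clamp range are illustrative assumptions.
double adjust_scale(double scale, double gpu_util,
                    double min_scale = 0.67, double max_scale = 1.0) {
    const double step = 0.05;
    if (gpu_util > 90.0)      scale -= step;  // overloaded: render fewer pixels
    else if (gpu_util < 70.0) scale += step;  // headroom: sharpen back up
    return std::clamp(scale, min_scale, max_scale);
}
```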

Memory Profiling

Memory Categories

Texture Memory     — Often 40–60% of total memory budget
Mesh Memory        — Geometry, vertex/index buffers
Audio Memory       — Compressed + decompressed audio streams
Code Memory        — Compiled scripts, shader binaries
Engine Overhead    — Unity/Unreal runtime, OS overhead
Network Buffers    — For multiplayer games

Track all categories. Any unexpected spike → investigate.

Finding Memory Leaks

Signs of memory leak:
  - Memory usage grows continuously during play
  - Memory spikes on scene transitions that never drop
  - Game eventually crashes after long sessions (OOM)

How to find:
  1. Take memory snapshot at session start
  2. Play for 10–20 minutes (load/unload several scenes)
  3. Take memory snapshot again
  4. Compare: what grew and never shrank?
  5. Trace culprit objects: who holds references?
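Step 4 — comparing snapshots — is essentially a diff over per-category byte counts. A sketch (category names are illustrative):

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Diff two category -> bytes snapshots; report what grew and never
// shrank between the start-of-session and end-of-session captures.
std::vector<std::string> grown_categories(
        const std::map<std::string, long long>& before,
        const std::map<std::string, long long>& after) {
    std::vector<std::string> grown;
    for (const auto& [name, bytes] : after) {
        auto it = before.find(name);
        long long prev = (it == before.end()) ? 0 : it->second;
        if (bytes > prev) grown.push_back(name);
    }
    return grown;
}
```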

Tools:
  Unity Memory Profiler → Heap snapshot comparison
  Unreal: memreport command → allocations report
  Valgrind (Linux builds) → line-level leak detection
  Xcode Instruments → Leaks template (iOS)
// Unity — reuse objects with pooling instead of Instantiate/Destroy
using System.Collections.Generic;
using UnityEngine;

public class BulletPool : MonoBehaviour {
    [SerializeField] private GameObject bulletPrefab;
    private Queue<GameObject> pool = new Queue<GameObject>();

    public GameObject Get() {
        if (pool.Count > 0) {
            var b = pool.Dequeue();
            b.SetActive(true);
            return b;
        }
        return Instantiate(bulletPrefab);
    }

    public void Return(GameObject bullet) {
        bullet.SetActive(false);
        pool.Enqueue(bullet);
        // Never Destroy — reuse instead
    }
}

Frame Pacing

What is Frame Pacing?

Frame pacing = consistency of frame delivery time.

BAD (irregular / stuttery):
  Frame 1:  8ms
  Frame 2:  8ms
  Frame 3: 32ms  ← spike!
  Frame 4:  8ms
  Average: 14ms (~71 FPS on paper) but FEELS jerky

GOOD (smooth):
  Frame 1: 16.6ms
  Frame 2: 16.6ms
  Frame 3: 16.6ms
  Average: 16.6ms = true 60 FPS

Cause of bad pacing: garbage collection, async operations,
streaming hiccups, shader compilation, physics spikes
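The "BAD" trace above shows why averages mislead: 8+8+32+8 = 56ms over four frames averages 14ms, yet the 32ms spike is what the player feels. Track the worst frame (or a high percentile), not just the mean:

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Mean frame time — hides spikes.
double average_ms(const std::vector<double>& frames) {
    return std::accumulate(frames.begin(), frames.end(), 0.0) / frames.size();
}

// Worst frame time — what the player actually feels.
double worst_ms(const std::vector<double>& frames) {
    return *std::max_element(frames.begin(), frames.end());
}
```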

Frame Pacing Fixes

Shader Compilation Stutters:
  Problem: Shader compiled on first use → stutter
  Fix:     Precompile shaders during loading screen
           Shader warmup: draw invisible objects to prime cache

Garbage Collection (Unity/C#):
  Problem: GC runs, pauses main thread (even 1–2ms shows)
  Fix:     Object pooling, preallocate, avoid per-frame alloc
           Unity's incremental GC spreads collection across frames

Asset Streaming Hiccups:
  Problem: Texture/audio streamed in, causing I/O spike
  Fix:     Async loading with buffer, preload assets by proximity
           PS5: use PS5 I/O decompression pipeline

VSync + Triple Buffering:
  Use VSync to eliminate tearing. Triple buffer for better pacing.
  Consoles: always use platform VSync API.

Performance Testing Checklist

Per-Build Checklist

□ Check FPS in worst-case scene (most demanding area)
□ Check FPS in typical gameplay areas
□ Verify no frame spikes > 2× target frame time
□ Memory: within budget on lowest-spec target hardware
□ Load times: main menu < 5s, level loads < 10s
□ No shader compilation stutters in gameplay
□ GPU time < 14ms in typical scenes (60 FPS target)
□ CPU time < 14ms in typical scenes
□ No memory growth over 30-minute session
□ Soak test: run 4 hours continuously, no crash, no OOM
□ Console: test both docked + handheld modes (Switch)
□ Mobile: test on lowest-spec target device (not your phone)
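The spike check in the list above ("no frame spikes > 2× target frame time") is a one-line scan over captured frame times:

```cpp
#include <cassert>
#include <vector>

// Count frames exceeding 2x the target frame time (all in ms),
// per the checklist's spike rule.
int count_spikes(const std::vector<double>& frames, double target_ms) {
    int spikes = 0;
    for (double f : frames)
        if (f > 2.0 * target_ms) ++spikes;
    return spikes;
}
```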

Useful Links & Resources