About This Page
This page covers game performance profiling — how to measure, diagnose, and fix frame rate, memory, and load time issues. Parent: Game Testing & QA. See also: Console Development, Game Testing & QA - Automated Testing.
Profile on Target Hardware
Always profile on the lowest-spec machine in your target range. A game running at 120 FPS on your dev PC may drop to 20 FPS on an entry-level laptop or Nintendo Switch. Dev machines are almost always faster than the hardware most of your players own.
Performance Fundamentals
Key Metrics
| Metric | Definition | Target (60 FPS) |
|---|---|---|
| Frame Time | Time to render one frame (ms) | ≤16.67ms |
| FPS | Frames per second | ≥60 |
| Frame Pacing | Consistency of frame times | Smooth (no spikes) |
| GPU Time | Time GPU spends on one frame | ≤14ms |
| CPU Time | Time CPU spends on one frame | ≤14ms |
| Memory Usage | RAM / VRAM consumed | Within platform budget |
| Load Time | Scene/level load duration | ≤5s (player tolerance) |
| Draw Calls | Number of GPU render commands | ≤1000–2000 per frame |
Frame Budget
FPS → Frame Budget:
30 FPS = 33.33ms per frame
60 FPS = 16.67ms per frame
90 FPS = 11.11ms per frame (VR minimum)
120 FPS = 8.33ms per frame
Typical 60 FPS budget breakdown:
CPU: Game logic ~2ms
CPU: Culling + sorting ~1ms
CPU: Draw call submission ~2ms
GPU: Geometry pass ~4ms
GPU: Lighting & shadows ~4ms
GPU: Post-processing ~2ms
Vsync + buffer overhead ~1ms
Headroom ~0.67ms
Rule: Any single pass > 5ms on target hardware → investigate it.
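The arithmetic above is simply 1000 / FPS. A minimal sketch of the budget and the 5 ms rule (hypothetical helper names, not an engine API):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Frame budget in milliseconds for a target FPS: 1000 / fps.
double frameBudgetMs(double targetFps) {
    return 1000.0 / targetFps;
}

// The 5 ms rule: flag any single pass over the threshold on target hardware.
struct PassTiming { std::string name; double ms; };

std::vector<std::string> passesToInvestigate(const std::vector<PassTiming>& passes,
                                             double thresholdMs = 5.0) {
    std::vector<std::string> hot;
    for (const auto& p : passes)
        if (p.ms > thresholdMs) hot.push_back(p.name);
    return hot;
}
```

For example, a 6.2 ms shadow pass gets flagged while a 2 ms post pass does not.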
CPU-Bound vs GPU-Bound
CPU-Bound: CPU finishes after GPU (GPU is waiting)
Causes: Too many game objects, complex AI, physics, scripting
Fix: Reduce update frequency, use job system, batch AI ticks
GPU-Bound: GPU finishes after CPU (CPU is waiting)
Causes: High pixel fill rate, too many draw calls, heavy shaders
Fix: Reduce resolution, simplify shaders, batch draws, LOD
How to tell:
CPU-Bound: GPU utilisation < 90%, CPU at 100%
GPU-Bound: GPU utilisation at 100%, CPU < 90%
Most games alternate between the two bottlenecks from scene to scene.
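The utilisation heuristic can be written as a small classifier — a sketch using the 90% figures from above, which are rules of thumb rather than hard limits:

```cpp
#include <cassert>
#include <string>

// Rough bottleneck heuristic: whichever processor is pegged while the
// other has headroom is the limiter. Utilisation is given as 0.0–1.0.
std::string classifyBottleneck(double cpuUtil, double gpuUtil) {
    if (cpuUtil >= 0.90 && gpuUtil < 0.90) return "CPU-bound";
    if (gpuUtil >= 0.90 && cpuUtil < 0.90) return "GPU-bound";
    return "balanced/unclear";
}
```

When both sit near 100%, the frame is balanced and neither fix list alone will help.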
Memory Budget
| Platform | Total RAM | Available | VRAM |
|---|---|---|---|
| PS5 | 16 GB GDDR6 | ~12.5 GB | Shared |
| Xbox Series X | 16 GB GDDR6 | ~13.5 GB | Shared |
| Nintendo Switch | 4 GB LPDDR4X | ~3.2 GB | Shared (no VRAM) |
| Mid-range PC | 16 GB DDR4 | ~12 GB | 6–8 GB VRAM |
| Mobile (low) | 2–3 GB | ~1.5 GB | Shared |
Profiling Tools
Built-In Engine Profilers
Unity Profiler:
Window → Analysis → Profiler
Shows: CPU timeline, GPU, Memory, Audio, Physics, Network
Deep Profile: Shows every method call (significant overhead)
Frame Debugger: Inspect every draw call of a frame
Memory Profiler (separate package): Snapshot heap, find leaks
Unreal Insights:
Run with: -trace=cpu,gpu,memory,frame
Or: stat fps / stat unit in console
Shows: CPU/GPU flame charts, memory, network replication
GPU Visualizer: Per-pass GPU timing breakdown
Unreal Frontend: Connect to running game session
Godot Profiler:
Debugger → Profiler tab
Shows: Frame time, physics, idle, per-node costs
Limited GPU info — use RenderDoc for GPU analysis in Godot
RenderDoc (All Platforms, Open Source)
What: GPU frame capture and analysis
Platforms: PC (DX11/12, Vulkan, OpenGL), Android, Switch (via vendor)
Cost: Free, open source
Workflow:
1. Launch RenderDoc → Attach to game process
2. Press F12 to capture a frame
3. Inspect: Event list, Texture viewer, Mesh viewer, Pipeline state
4. Find: Which draw call is expensive? Which texture is oversampled?
Best for:
- Visual bug investigation (wrong texture, missing mesh)
- Understanding which draw call dominates a frame
- Debugging shader outputs
PIX for Windows / Xbox
What: Microsoft GPU performance + debugging tool
Platforms: PC (DX11/12), Xbox Series X/S, Xbox One
Cost: Free (developer.microsoft.com/pix)
Features:
GPU Captures — Frame-by-frame draw call analysis
Timing Captures — CPU + GPU timeline over multiple frames
Memory Captures — GPU memory allocation over time
Counters — Hardware performance counters (cache hits etc.)
Key metrics in PIX:
GPU Duration — How long each pass takes on GPU
VS/PS Invocations — Vertex/Pixel shader call counts
Primitive Count — Triangles submitted per pass
// PIX programmatic marker (C++ / DX12)
#include <pix3.h>
// Mark a GPU event range for PIX capture
PIXBeginEvent(commandList, PIX_COLOR(255, 128, 0), "Shadow Pass");
// ... shadow rendering commands ...
PIXEndEvent(commandList);
// CPU event
PIXBeginEvent(PIX_COLOR(0, 128, 255), "AI Update");
UpdateAllEnemyAI();
PIXEndEvent();
NVIDIA Nsight Graphics
What: NVIDIA GPU profiler and debugger
Platforms: PC (DX11/12, Vulkan, OpenGL, OpenCL)
Cost: Free (developer.nvidia.com)
Features:
Frame Debugger — Full draw call inspection
GPU Trace — Timeline of GPU work
Shader Profiler — Per-instruction shader performance
Ray Tracing — BVH traversal, RT pass breakdown
DLSS Analysis — Inspect DLSS upscaling quality
Best for: PC games on NVIDIA hardware, ray tracing investigation
Xcode Instruments (iOS / macOS)
Templates useful for games:
Time Profiler — CPU call stacks, find hotspots
Metal System Trace — GPU frame analysis for Metal API
Allocations — Memory allocations, find leaks
Leaks — Detect retain cycles and memory leaks
Energy Log — Battery drain (critical for mobile!)
Metal Debugger (Xcode built-in):
Capture GPU frame → inspect all draw calls
Shader debugger: step through Metal shaders line by line
Android GPU Inspector
What: Google's GPU profiler for Android
Platforms: Android (ARM Mali, Qualcomm Adreno, PowerVR)
Cost: Free (gpuinspector.dev)
Features:
Frame profiling — GPU timeline, overdraw heatmap
System profiling — CPU + GPU + memory over time
Counter analysis — Hardware GPU counters
Alternatives:
ARM Mobile Studio — Mali GPU profiling (free)
Snapdragon Profiler — Qualcomm Adreno profiling (free)
CPU Profiling Workflow
Finding CPU Hotspots
Step 1: Run the game with profiler attached at target scene
Step 2: Capture 100–200 frames of typical gameplay
Step 3: Sort by "Self Time" (time in function only, not callees)
Step 4: Identify top 5 most expensive functions
Step 5: Investigate each:
- Is it called too many times per frame?
- Is it doing unnecessary work?
- Can it be cached, batched, or moved off the main thread?
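Steps 3–4 above amount to a sort on self time followed by a truncation; a sketch with a hypothetical sample type (real profilers expose equivalent data):

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// One captured entry: function name, total time, and self time
// (time inside the function itself, excluding callees).
struct Sample { std::string function; double totalMs; double selfMs; };

// Sort by self time, descending, and keep the top N hotspots.
std::vector<Sample> topHotspots(std::vector<Sample> samples, std::size_t n = 5) {
    std::sort(samples.begin(), samples.end(),
              [](const Sample& a, const Sample& b) { return a.selfMs > b.selfMs; });
    if (samples.size() > n) samples.resize(n);
    return samples;
}
```

Sorting by self time rather than total time matters: a parent function with a huge total but tiny self time is just a caller, not the hotspot.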
Common CPU Performance Fixes
Expensive Per-Frame Calculations:
Problem: Computing sqrt() or trig per-entity every frame
Fix: Cache results, compute only when data changes
Too Many Active GameObjects (Unity):
Problem: 2000 enemies all updating every frame
Fix: LOD for scripts (only update visible/nearby enemies)
Disable Update() when out of range
Garbage Collection (C# / GC languages):
Problem: Frequent allocation → GC pause spikes
Fix: Object pooling, avoid LINQ in hot paths, preallocate
Main Thread Blocking:
Problem: File I/O, heavy physics, pathfinding on main thread
Fix: Move to async/thread/job system
// Unity Job System — move expensive work off main thread
using Unity.Jobs;
using Unity.Collections;
using Unity.Mathematics; // float3

public struct EnemyMoveJob : IJobParallelFor {
    public NativeArray<float3> positions;
    [ReadOnly] public NativeArray<float3> velocities;
    public float deltaTime;

    public void Execute(int i) {
        positions[i] += velocities[i] * deltaTime;
    }
}

// Schedule on worker threads — main thread stays free
var job = new EnemyMoveJob {
    positions = positionArray,
    velocities = velocityArray,
    deltaTime = Time.deltaTime
};
JobHandle handle = job.Schedule(enemyCount, 64);
handle.Complete(); // wait only when the result is needed

GPU Profiling Workflow
Finding GPU Bottlenecks
Step 1: Capture a GPU frame in PIX / RenderDoc / Nsight
Step 2: View the GPU timeline — which pass dominates?
Step 3: Common suspects:
- Shadow map pass (many lights = many shadow draws)
- Transparent/particle pass (overdraw intensive)
- Post-processing stack (bloom, DOF, TAA all add up)
- Sky/reflection captures (recalculate too often?)
Step 4: Look at draw call count — too many small draws?
Step 5: Check overdraw heatmap — are pixels shaded many times?
Common GPU Performance Fixes
Too Many Draw Calls:
Problem: 5000 draw calls per frame (each has CPU overhead)
Fix: GPU instancing, static batching, dynamic batching
Combine meshes at bake time (static scene)
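Why instancing cuts the count: objects sharing the same mesh and material collapse into one draw call. A sketch of the batch estimate, with hypothetical types rather than any engine's API:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// One renderable as a (mesh, material) pair. Objects sharing both
// can be collapsed into a single instanced draw.
using DrawKey = std::pair<std::string, std::string>;

// Estimated draw calls after GPU instancing: one per unique
// mesh+material combination instead of one per object.
std::size_t instancedDrawCount(const std::vector<DrawKey>& objects) {
    std::map<DrawKey, std::size_t> batches;
    for (const auto& o : objects) batches[o]++;
    return batches.size();
}
```

This is also why material variants hurt batching: each unique material splits a batch even when the mesh is shared.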
Overdraw (Transparent Objects):
Problem: Particles shading same pixel 20 times
Fix: Reduce particle count, use depth prepass
Alpha test (discard) instead of alpha blend where possible
Heavy Shaders:
Problem: Expensive PBR shader on every object in scene
Fix: Simplified shaders for background/distant objects
Shader LOD (swap to cheaper shader at distance)
Shadow Map Cost:
Problem: 5 shadow-casting lights = 5× shadow render passes
Fix: Limit shadow-casting lights (max 1–2 dynamic)
Increase shadow cascade distances
Use baked lighting for static geometry
Texture Sampling:
Problem: Uncompressed 4K textures on every object
Fix: BC7/DXT5 (PC), ASTC (mobile/Switch) compression
Mip maps enabled on all textures
Reduce texture resolution on non-hero assets
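The texture fixes are easy to sanity-check with arithmetic: an uncompressed 4096×4096 RGBA8 texture costs 4096 × 4096 × 4 B = 64 MiB before mips, BC7 at 1 byte per texel costs 16 MiB, and a full mip chain adds roughly a third either way. A sketch:

```cpp
#include <cassert>
#include <cstddef>

// Memory for one 2D texture. bytesPerTexel: 4.0 for uncompressed
// RGBA8, 1.0 for BC7 (8 bits per texel). A full mip chain adds
// roughly one third on top of mip 0 (geometric series 1 + 1/4 + ...).
double textureBytes(std::size_t width, std::size_t height,
                    double bytesPerTexel, bool mips) {
    double base = static_cast<double>(width) * height * bytesPerTexel;
    return mips ? base * 4.0 / 3.0 : base;
}
```

The mip overhead is cheap insurance: without mips, distant objects sample full-resolution texels, which thrashes the texture cache and causes shimmering.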
Dynamic Resolution Scaling
Technique: Render at lower resolution when GPU is overloaded,
upscale to target resolution.
Implementations:
Dynamic Resolution (Unity/Unreal built-in)
FSR 2 (AMD FidelityFX Super Resolution) — open source
DLSS 3 (NVIDIA) — AI upscaling, RTX cards only
PSSR (PlayStation Spectral Super Resolution) — PS5 Pro only
XeSS (Intel) — cross-vendor upscaling, best on Intel Arc
Usage:
Set min resolution (e.g. 1440p min, target 4K)
GPU usage > 90% → drop resolution
GPU usage < 70% → raise resolution
Cap FPS first, then set resolution scaling min/max
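The rules above reduce to a trivial controller. A sketch where the step size and the 0.66 floor (roughly 1440p when targeting 4K) are illustrative values, not engine defaults:

```cpp
#include <algorithm>
#include <cassert>

// Minimal dynamic-resolution controller: GPU > 90% busy steps the
// resolution scale down, < 70% steps it back up, clamped to a floor.
struct DynResController {
    double scale = 1.0;     // 1.0 = full target resolution
    double minScale = 0.66; // illustrative floor (~1440p at a 4K target)
    double step = 0.05;

    void update(double gpuUtil) {
        if (gpuUtil > 0.90)      scale -= step;
        else if (gpuUtil < 0.70) scale += step;
        scale = std::clamp(scale, minScale, 1.0);
    }
};
```

Real implementations also smooth GPU utilisation over several frames so the scale does not oscillate every frame.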
Memory Profiling
Memory Categories
Texture Memory — Often 40–60% of total memory budget
Mesh Memory — Geometry, vertex/index buffers
Audio Memory — Compressed + decompressed audio streams
Code Memory — Compiled scripts, shader binaries
Engine Overhead — Unity/Unreal runtime, OS overhead
Network Buffers — For multiplayer games
Track all categories. Any unexpected spike → investigate.
Finding Memory Leaks
Signs of memory leak:
- Memory usage grows continuously during play
- Memory spikes on scene transitions that never drop
- Game eventually crashes after long sessions (OOM)
How to find:
1. Take memory snapshot at session start
2. Play for 10–20 minutes (load/unload several scenes)
3. Take memory snapshot again
4. Compare: what grew and never shrank?
5. Trace culprit objects: who holds references?
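Step 4 is just a diff between the two snapshots. A sketch at category granularity (real memory profilers diff at object level, but the logic is the same):

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// A snapshot as category -> bytes in use.
using Snapshot = std::map<std::string, long long>;

// Categories that grew by more than thresholdBytes between snapshots:
// leak candidates worth tracing.
std::vector<std::string> grownCategories(const Snapshot& before,
                                         const Snapshot& after,
                                         long long thresholdBytes) {
    std::vector<std::string> grown;
    for (const auto& [category, bytes] : after) {
        auto it = before.find(category);
        long long previous = (it == before.end()) ? 0 : it->second;
        if (bytes - previous > thresholdBytes) grown.push_back(category);
    }
    return grown;
}
```

The threshold matters: some growth (caches warming up, pools filling) is expected, so only flag categories that keep climbing well past it.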
Tools:
Unity Memory Profiler → Heap snapshot comparison
Unreal: memreport command → allocations report
Valgrind (Linux builds) → line-level leak detection
Xcode Instruments → Leaks template (iOS)
// Unity — avoid per-frame allocations with object pooling
using System.Collections.Generic;
using UnityEngine;

public class BulletPool : MonoBehaviour {
    [SerializeField] private GameObject bulletPrefab;
    private Queue<GameObject> pool = new Queue<GameObject>();

    public GameObject Get() {
        if (pool.Count > 0) {
            var b = pool.Dequeue();
            b.SetActive(true);
            return b;
        }
        return Instantiate(bulletPrefab);
    }

    public void Return(GameObject bullet) {
        bullet.SetActive(false);
        pool.Enqueue(bullet);
        // Never Destroy — reuse instead
    }
}

Frame Pacing
What is Frame Pacing?
Frame pacing = consistency of frame delivery time.
BAD (irregular / stuttery):
Frame 1: 8ms
Frame 2: 8ms
Frame 3: 32ms ← spike!
Frame 4: 8ms
Average: 14ms — comfortably "60+ FPS" on paper, but it FEELS jerky
GOOD (smooth):
Frame 1: 16.6ms
Frame 2: 16.6ms
Frame 3: 16.6ms
Average: 16.6ms = true 60 FPS
Cause of bad pacing: garbage collection, async operations,
streaming hiccups, shader compilation, physics spikes
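A simple check makes the "average lies" point concrete: the BAD trace above averages 14 ms, yet one frame is 4× the others. A sketch that counts frames worse than 2× the expected pace (the same 2× threshold the per-build checklist uses):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Mean frame time: what an FPS counter reports.
double averageMs(const std::vector<double>& frames) {
    double sum = 0.0;
    for (double f : frames) sum += f;
    return frames.empty() ? 0.0 : sum / frames.size();
}

// Spikes the average hides: frames more than 2x the target pace.
int spikeCount(const std::vector<double>& frames, double targetMs) {
    return static_cast<int>(std::count_if(frames.begin(), frames.end(),
        [targetMs](double f) { return f > 2.0 * targetMs; }));
}
```

The BAD trace (8, 8, 32, 8 ms) averages 14 ms but contains one spike against its 8.33 ms pace; always track worst-frame or percentile times, not just the mean.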
Frame Pacing Fixes
Shader Compilation Stutters:
Problem: Shader compiled on first use → stutter
Fix: Precompile shaders during loading screen
Shader warmup: draw invisible objects to prime cache
Garbage Collection (Unity/C#):
Problem: GC runs, pauses main thread (even 1–2ms shows)
Fix: Object pooling, preallocate, avoid per-frame alloc
Enable Unity's incremental GC to spread collections across frames
Asset Streaming Hiccups:
Problem: Texture/audio streamed in, causing I/O spike
Fix: Async loading with buffer, preload assets by proximity
PS5: use PS5 I/O decompression pipeline
VSync + Triple Buffering:
Use VSync to eliminate tearing. Triple buffer for better pacing.
Consoles: always use platform VSync API.
Performance Testing Checklist
Per-Build Checklist
□ Check FPS in worst-case scene (most demanding area)
□ Check FPS in typical gameplay areas
□ Verify no frame spikes > 2× target frame time
□ Memory: within budget on lowest-spec target hardware
□ Load times: main menu < 5s, level loads < 10s
□ No shader compilation stutters in gameplay
□ GPU time < 14ms in typical scenes (60 FPS target)
□ CPU time < 14ms in typical scenes
□ No memory growth over 30-minute session
□ Soak test: run 4 hours continuously, no crash, no OOM
□ Console: test both docked + handheld modes (Switch)
□ Mobile: test on lowest-spec target device (not your phone)
Useful Links & Resources
- RenderDoc — Free open-source GPU frame capture
- PIX for Windows — Microsoft GPU profiler (Xbox/PC)
- NVIDIA Nsight — NVIDIA GPU profiler
- AMD RGP (Radeon GPU Profiler) — AMD GPU profiling
- Unity Memory Profiler — Unity memory snapshots
- Unreal Insights — UE performance analysis
- GPU Inspector (Android) — Google’s Android GPU tool
- ARM Mobile Studio — Mali GPU profiling
- GDC: Optimisation talks — Free GDC performance sessions