seoTitle: Apple Metal API Complete Guide – Zero to Ray Tracing
description: “The definitive Metal API reference. Covers MTLDevice, MSL shaders, Argument Buffers, Apple Silicon memory, Metal Performance Shaders, and Metal Ray Tracing from beginner to advanced.”
keywords: “apple metal, metal api, msl, metal shading language, argument buffers, metal ray tracing, ios game, macos graphics, swift metal, objective-c metal, apple silicon, vr-rathod”
displayTitle: Metal
Apple Metal — The Roadmap
What is Metal?
Metal is Apple’s explicit, low-overhead graphics and compute API for all Apple platforms: macOS, iOS, iPadOS, tvOS, and visionOS. It was introduced in 2014 — before Vulkan or DX12 — making it the first truly “modern” explicit GPU API.
Metal is REQUIRED for any serious graphics application on Apple platforms. OpenGL is deprecated on macOS (since 10.14 Mojave) and removed from iOS.
This is the biggest difference between Metal and Vulkan/DX12. On Apple Silicon (M1, M2, M3, M4):
graph TD
DGB["Discrete GPU (NVIDIA/AMD)\nCPU RAM and GPU VRAM are SEPARATE\nYou must copy data across PCIe bus"]
UMA["Apple Silicon Unified Memory\nCPU and GPU SHARE THE SAME physical memory\nNo copy needed — just change permissions"]
DGB -.->|"memcpy via PCIe (slow)"| DGB
UMA -->|"Zero copy! Same RAM"| UMA
Unified Memory means textures and buffers created with Shared mode are instantly accessible by both CPU and GPU — no staging buffers needed on Apple Silicon for most use cases!
MTLDevice (The GPU)
Getting the Device
// Swift: Get the default system GPUimport Metalimport MetalKitguard let device = MTLCreateSystemDefaultDevice() else { fatalError("Metal not supported on this device!")}// Query GPU capabilitiesprint("GPU: \(device.name)")print("Is Apple Silicon: \(device.hasUnifiedMemory)")print("Max threads per threadgroup: \(device.maxThreadsPerThreadgroup)")print("Supports ray tracing: \(device.supportsRaytracing)")print("Max shader buffer size: \(device.maxBufferLength / (1024*1024)) MB")print("Supports mesh shaders: \(device.supportsMeshShaders)")
// C++: Using Metal-cpp (Apple's official C++ wrapper)#include <Metal/Metal.hpp>MTL::Device* device = MTL::CreateSystemDefaultDevice();
Enumerating All GPUs (macOS)
// On macOS, a machine may have multiple GPUs (integrated + discrete)let allDevices = MTLCopyAllDevices()for device in allDevices { print("Device: \(device.name)") print(" - Is Low Power (Integrated): \(device.isLowPower)") print(" - Is Headless: \(device.isHeadless)") print(" - Is Removable: \(device.isRemovable)") print(" - Recommended Max Working Set: \(device.recommendedMaxWorkingSetSize / (1024*1024)) MB")}
MTLCommandQueue and the Command System
The Metal Command Flow
graph TD
Q["MTLCommandQueue\nSubmission queue for the GPU\nCreate once, keep alive forever"]
CB["MTLCommandBuffer\nCreated from queue each frame\nHolds a sequence of encoded commands"]
subgraph Encoders["Encoders (pick one per scope)"]
RE["MTLRenderCommandEncoder\nEncode draw calls, set pipeline, bind buffers"]
CE["MTLComputeCommandEncoder\nEncode compute dispatches"]
BE["MTLBlitCommandEncoder\nEncode memory copies"]
AE["MTLAccelerationStructureCommandEncoder\nBuild BVH for ray tracing"]
end
Q -->|"makeCommandBuffer()"| CB
CB --> Encoders
CB -->|"commit()"| GPU["🎮 GPU executes"]
Creating and Using the Command Queue
// Create the queue — one queue per "stream of work"guard let commandQueue = device.makeCommandQueue() else { fatalError("Could not create command queue")}// Optional: label for debugging in Xcode GPU DebuggercommandQueue.label = "Main Render Queue"// ---- Per Frame ----guard let commandBuffer = commandQueue.makeCommandBuffer() else { return }commandBuffer.label = "Frame \(frameNumber) Command Buffer"// Add completion handler (callback when GPU finishes)commandBuffer.addCompletedHandler { [weak self] buffer in if buffer.status == .error { print("GPU Error: \(String(describing: buffer.error))") } self?.frameFinished() // Unblock CPU for next frame}
Metal Shading Language (MSL)
MSL is C++14
Unlike GLSL or HLSL, MSL is properly based on C++14 extended with GPU-specific keywords. This means:
✔ Real C++ structs, templates, functions, namespaces
✔ Pointers in shaders (unique to Metal!)
✔ References (const T& params)
✔ Single header shared between CPU and GPU code
✔ No need to re-declare structs — one definition for both
Shared CPU/GPU Header Pattern
// SharedTypes.h — included in both C++ and .metal files!struct VertexData { simd_float3 position; simd_float3 normal; simd_float2 texCoord;};struct PerFrameUniforms { simd_float4x4 modelMatrix; simd_float4x4 viewProjectionMatrix; simd_float3 cameraPosition; float time;};
Vertex Shader (MSL)
// Shaders.metal#include <metal_stdlib>#include "SharedTypes.h" // Shared with CPU!using namespace metal;// Vertex shader output → fragment shader inputstruct RasterizerData { float4 position [[position]]; // [[position]] = built-in clip-space position float3 worldPos; float3 normal; float2 texCoord;};// Metal attributes explain how data is bound:// [[stage_in]] = vertex data from vertex buffer (assembled by Metal)// [[buffer(N)]] = bound via setVertexBuffer(_, offset:, index:)// [[texture(N)]] = bound via setVertexTexture(_, index:)vertex RasterizerData vertexShader( const device VertexData* vertices [[buffer(0)]], // Vertex buffer constant PerFrameUniforms& uniforms [[buffer(1)]], // Uniform buffer uint vertexID [[vertex_id]]) // Built-in: which vertex{ VertexData v = vertices[vertexID]; float4 worldPos = uniforms.modelMatrix * float4(v.position, 1.0); RasterizerData out; out.position = uniforms.viewProjectionMatrix * worldPos; out.worldPos = worldPos.xyz; out.normal = (uniforms.modelMatrix * float4(v.normal, 0.0)).xyz; out.texCoord = v.texCoord; return out;}
Normal setFragmentTexture(_, index:) calls have overhead. For a scene with 10,000 objects using 5,000 unique textures, calling setFragmentTexture for each object is a CPU bottleneck.
Argument Buffers solve this: package many resources (textures, samplers, buffers) into a single buffer. The shader picks which resource it needs using an index.
Creating an Argument Buffer
// An Argument Buffer is described by a shader struct with resource bindings// First, create an argument encoder from a shader function's argument// In the .metal shader:// struct MaterialData {// texture2d<float> albedo [[id(0)]];// texture2d<float> normalMap [[id(1)]];// float4 baseColor [[id(2)]];// };// fragment float4 fragShader(..., device MaterialData* materials [[buffer(3)]], uint materialID...)// The argument encoder knows the size and layout of MaterialDatalet materialEncoder = fragmentFunction.makeArgumentEncoder(bufferIndex: 3)// Allocate an argument buffer to hold N materialslet materialCount = 500let abLength = materialEncoder.encodedLength * materialCountlet argumentBuffer = device.makeBuffer(length: abLength, options: .storageModeShared)!// Fill each material slotfor i in 0..<materialCount { materialEncoder.setArgumentBuffer(argumentBuffer, startOffset: 0, arrayElement: i) materialEncoder.setTexture(allTextures[i * 2], index: 0) // albedo materialEncoder.setTexture(allTextures[i * 2 + 1], index: 1) // normal}// At render time — ONE bind call instead of thousands!renderEncoder.setFragmentBuffer(argumentBuffer, offset: 0, index: 3)// CRITICAL: Declare resource usage so Metal knows which textures are accessedrenderEncoder.useResources(allTextures, usage: .read, stages: .fragment)
Metal Performance Shaders is Apple’s library of highly-optimized GPU algorithms. Instead of writing your own, you call pre-built, Apple-tuned kernels for common operations.
Metal Ray Tracing (Metal 2.3+, available on all Apple Silicon and AMD Navi GPUs) is uniquely flexible: you can use it from compute shaders without needing a dedicated “ray tracing pipeline”. You just call intersect() from any kernel.