seoTitle: Apple Metal API Complete Guide – Zero to Ray Tracing
description: "The definitive Metal API reference. Covers MTLDevice, MSL shaders, Argument Buffers, Apple Silicon memory, Metal Performance Shaders, and Metal Ray Tracing from beginner to advanced."
keywords: "apple metal, metal api, msl, metal shading language, argument buffers, metal ray tracing, ios game, macos graphics, swift metal, objective-c metal, apple silicon, vr-rathod"
displayTitle: Metal
Apple Metal — The Roadmap
Metal vs Vulkan vs D3D12
| Concept | Vulkan | DirectX 12 | Metal |
|---|---|---|---|
| GPU handle | VkDevice | ID3D12Device | MTLDevice |
| Submission queue | VkQueue | ID3D12CommandQueue | MTLCommandQueue |
| Commands recorded here | VkCommandBuffer | ID3D12GraphicsCommandList | MTLCommandBuffer + encoder |
| Shader bindings schema | VkDescriptorSetLayout | Root Signature | MTLArgumentEncoder / implicit |
| Baked pipeline | VkPipeline | ID3D12PipelineState | MTLRenderPipelineState |
| CPU-GPU sync | VkFence | ID3D12Fence | MTLFence / MTLEvent |
| Memory | Explicit (vkAllocateMemory) | Explicit heap types | Mostly automatic (storage modes) |
| Shader language | GLSL → SPIR-V | HLSL → DXIL | MSL (C++14-based) |
| Cross-platform | Yes | Windows/Xbox only | Apple only |
| Difficulty | Very high | Very high | High (but friendlier) |
Apple Silicon Unified Memory Architecture
This is the biggest difference between Metal and Vulkan/DX12. On Apple Silicon (M1, M2, M3, M4):
```mermaid
graph TD
    subgraph DGB["Discrete GPU (NVIDIA/AMD)"]
        CPURAM["CPU RAM"] -.->|"memcpy via PCIe (slow)"| VRAM["GPU VRAM"]
    end
    subgraph UMA["Apple Silicon Unified Memory"]
        SHARED["CPU and GPU share the same physical memory\nNo copy needed — just change permissions"]
    end
```
Unified Memory means textures and buffers created with Shared mode are instantly accessible by both CPU and GPU — no staging buffers needed on Apple Silicon for most use cases!
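Concretely, writing geometry into a `.storageModeShared` buffer is a plain pointer write; there is no staging copy. The sketch below stands in for the raw pointer returned by `MTLBuffer.contents()` with an ordinary allocation, since the CPU-side write pattern is identical (the `Vertex` layout is illustrative):

```swift
import Foundation

// A stand-in vertex layout (illustrative; real code would share it with MSL)
struct Vertex {
    var x: Float, y: Float, z: Float
}

let vertexCount = 3
let byteLength = MemoryLayout<Vertex>.stride * vertexCount

// In real Metal:
//   let buffer = device.makeBuffer(length: byteLength, options: .storageModeShared)!
//   let contents = buffer.contents()
// On Apple Silicon the GPU reads the very same pages the CPU writes below.
let contents = UnsafeMutableRawPointer.allocate(
    byteCount: byteLength,
    alignment: MemoryLayout<Vertex>.alignment)

// CPU writes straight into the (shared) allocation: this is the zero-copy path
let vertices = contents.bindMemory(to: Vertex.self, capacity: vertexCount)
vertices[0] = Vertex(x: -1, y: -1, z: 0)
vertices[1] = Vertex(x:  1, y: -1, z: 0)
vertices[2] = Vertex(x:  0, y:  1, z: 0)

let topY = vertices[2].y
contents.deallocate()
print(topY) // 1.0
```

With a real `MTLBuffer`, the GPU sees those vertices on its next pass with no blit or upload step.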
MTLDevice (The GPU)
Getting the Device
```swift
// Swift: get the default system GPU
import Metal
import MetalKit

guard let device = MTLCreateSystemDefaultDevice() else {
    fatalError("Metal not supported on this device!")
}

// Query GPU capabilities
print("GPU: \(device.name)")
print("Unified memory (Apple Silicon): \(device.hasUnifiedMemory)")
print("Max threads per threadgroup: \(device.maxThreadsPerThreadgroup)")
print("Supports ray tracing: \(device.supportsRaytracing)")
print("Max shader buffer size: \(device.maxBufferLength / (1024 * 1024)) MB")
print("Supports Metal 3 (mesh shaders etc.): \(device.supportsFamily(.metal3))")
```
```cpp
// C++: using metal-cpp (Apple's official C++ wrapper)
#include <Metal/Metal.hpp>

MTL::Device* device = MTL::CreateSystemDefaultDevice();
```
Enumerating All GPUs (macOS)
```swift
// On macOS, a machine may have multiple GPUs (integrated + discrete)
let allDevices = MTLCopyAllDevices()
for device in allDevices {
    print("Device: \(device.name)")
    print("  - Is Low Power (Integrated): \(device.isLowPower)")
    print("  - Is Headless: \(device.isHeadless)")
    print("  - Is Removable: \(device.isRemovable)")
    print("  - Recommended Max Working Set: \(device.recommendedMaxWorkingSetSize / (1024 * 1024)) MB")
}
```
MTLCommandQueue and the Command System
The Metal Command Flow
```mermaid
graph TD
    Q["MTLCommandQueue\nSubmission queue for the GPU\nCreate once, keep alive forever"]
    CB["MTLCommandBuffer\nCreated from queue each frame\nHolds a sequence of encoded commands"]
    subgraph Encoders["Encoders (pick one per scope)"]
        RE["MTLRenderCommandEncoder\nEncode draw calls, set pipeline, bind buffers"]
        CE["MTLComputeCommandEncoder\nEncode compute dispatches"]
        BE["MTLBlitCommandEncoder\nEncode memory copies"]
        AE["MTLAccelerationStructureCommandEncoder\nBuild BVH for ray tracing"]
    end
    Q -->|"makeCommandBuffer()"| CB
    CB --> Encoders
    CB -->|"commit()"| GPU["🎮 GPU executes"]
```
Creating and Using the Command Queue
```swift
// Create the queue — one queue per "stream of work"
guard let commandQueue = device.makeCommandQueue() else {
    fatalError("Could not create command queue")
}
// Optional: label for debugging in the Xcode GPU debugger
commandQueue.label = "Main Render Queue"

// ---- Per Frame ----
guard let commandBuffer = commandQueue.makeCommandBuffer() else { return }
commandBuffer.label = "Frame \(frameNumber) Command Buffer"

// Add completion handler (callback when the GPU finishes)
commandBuffer.addCompletedHandler { [weak self] buffer in
    if buffer.status == .error {
        print("GPU Error: \(String(describing: buffer.error))")
    }
    self?.frameFinished() // Unblock CPU for next frame
}
```
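The completion handler above is one half of the classic frames-in-flight pattern: a counting semaphore stops the CPU from getting more than N frames ahead of the GPU. A minimal sketch, with the GPU timeline simulated by a serial queue (in real code, `wait()` goes before encoding and `signal()` lives inside `addCompletedHandler`):

```swift
import Dispatch

let maxFramesInFlight = 3
let inFlight = DispatchSemaphore(value: maxFramesInFlight)
let gpuTimeline = DispatchQueue(label: "simulated-gpu") // stands in for the GPU
var framesCompleted = 0

for _ in 0..<9 {
    inFlight.wait() // blocks once 3 frames are committed but unfinished
    // ... encode and commit the command buffer for this frame here ...
    gpuTimeline.async {
        // In real Metal, this body is the addCompletedHandler callback
        framesCompleted += 1
        inFlight.signal()
    }
}

gpuTimeline.sync {} // drain the simulated GPU before reading the counter
print(framesCompleted) // 9
```

Triple buffering (N = 3) is the common default: the CPU encodes frame N+2 while the GPU draws frame N.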
Metal Shading Language (MSL)
MSL is C++14
Unlike GLSL or HLSL, MSL is genuinely based on C++14, extended with GPU-specific keywords and attributes. This means:
✔ Real C++ structs, templates, functions, namespaces
✔ Pointers in shaders (unique to Metal!)
✔ References (const T& params)
✔ Single header shared between CPU and GPU code
✔ No need to re-declare structs — one definition for both
Shared CPU/GPU Header Pattern
```c
// SharedTypes.h — included in both C++ and .metal files!
struct VertexData {
    simd_float3 position;
    simd_float3 normal;
    simd_float2 texCoord;
};

struct PerFrameUniforms {
    simd_float4x4 modelMatrix;
    simd_float4x4 viewProjectionMatrix;
    simd_float3 cameraPosition;
    float time;
};
```
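One practical consequence on the CPU side: when several frames' worth of `PerFrameUniforms` are packed into one shared buffer, each slice is usually rounded up to a 256-byte boundary, a conservative alignment that satisfies Metal's buffer-offset rules on current devices (an assumption worth checking against your target; the struct size below is hand-computed for illustration):

```swift
// Round a size up to the next multiple of a power-of-two alignment
func alignedSize(_ size: Int, alignment: Int = 256) -> Int {
    (size + alignment - 1) & ~(alignment - 1)
}

// PerFrameUniforms from SharedTypes.h, sized by hand:
// two float4x4 (64 B each) + float3 (16 B padded) + float (4 B) = 148 B raw
let rawSize = 64 + 64 + 16 + 4
let slice = alignedSize(rawSize) // one 256-byte slice per frame in flight
let framesInFlight = 3
let bufferLength = slice * framesInFlight

print(slice, bufferLength) // 256 768
```

At bind time, frame i's uniforms are then addressed with `setVertexBuffer(buffer, offset: i * slice, index: 1)`.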
Vertex Shader (MSL)
```metal
// Shaders.metal
#include <metal_stdlib>
#include "SharedTypes.h" // Shared with CPU!
using namespace metal;

// Vertex shader output → fragment shader input
struct RasterizerData {
    float4 position [[position]]; // [[position]] = built-in clip-space position
    float3 worldPos;
    float3 normal;
    float2 texCoord;
};

// Metal attributes explain how data is bound:
//   [[stage_in]]   = vertex data from vertex buffer (assembled by Metal)
//   [[buffer(N)]]  = bound via setVertexBuffer(_:offset:index:)
//   [[texture(N)]] = bound via setVertexTexture(_:index:)
vertex RasterizerData vertexShader(
    const device VertexData* vertices   [[buffer(0)]],  // Vertex buffer
    constant PerFrameUniforms& uniforms [[buffer(1)]],  // Uniform buffer
    uint vertexID                       [[vertex_id]])  // Built-in: which vertex
{
    VertexData v = vertices[vertexID];
    float4 worldPos = uniforms.modelMatrix * float4(v.position, 1.0);

    RasterizerData out;
    out.position = uniforms.viewProjectionMatrix * worldPos;
    out.worldPos = worldPos.xyz;
    out.normal   = (uniforms.modelMatrix * float4(v.normal, 0.0)).xyz;
    out.texCoord = v.texCoord;
    return out;
}
```
Argument Buffers
Individual setFragmentTexture(_:index:) calls carry CPU overhead. For a scene with 10,000 objects using 5,000 unique textures, calling setFragmentTexture for each object becomes a CPU bottleneck.
Argument Buffers solve this: package many resources (textures, samplers, buffers) into a single buffer. The shader picks which resource it needs using an index.
Creating an Argument Buffer
```swift
// An argument buffer's layout is described by a shader struct with [[id(N)]] bindings.
// In the .metal shader:
//   struct MaterialData {
//       texture2d<float> albedo    [[id(0)]];
//       texture2d<float> normalMap [[id(1)]];
//       float4 baseColor           [[id(2)]];
//   };
//   fragment float4 fragShader(..., device MaterialData* materials [[buffer(3)]], uint materialID ...)

// The argument encoder knows the size and layout of MaterialData
let materialEncoder = fragmentFunction.makeArgumentEncoder(bufferIndex: 3)

// Allocate an argument buffer to hold N materials
let materialCount = 500
let abLength = materialEncoder.encodedLength * materialCount
let argumentBuffer = device.makeBuffer(length: abLength, options: .storageModeShared)!

// Fill each material slot
for i in 0..<materialCount {
    materialEncoder.setArgumentBuffer(argumentBuffer, startOffset: 0, arrayElement: i)
    materialEncoder.setTexture(allTextures[i * 2],     index: 0) // albedo
    materialEncoder.setTexture(allTextures[i * 2 + 1], index: 1) // normal map
}

// At render time — ONE bind call instead of thousands!
renderEncoder.setFragmentBuffer(argumentBuffer, offset: 0, index: 3)

// CRITICAL: declare resource usage so Metal knows which textures are accessed
renderEncoder.useResources(allTextures, usage: .read, stages: .fragment)
```
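The `arrayElement:` parameter in the fill loop is plain byte arithmetic: element `i` starts at `startOffset + i * encodedLength`. A tiny sketch of that mapping (the 48-byte `encodedLength` is hypothetical; real values come from the encoder):

```swift
// Byte offset of an array element inside an argument buffer
func argumentOffset(startOffset: Int, encodedLength: Int, element: Int) -> Int {
    startOffset + element * encodedLength
}

// With a hypothetical encodedLength of 48 bytes per MaterialData:
let encodedLength = 48
let first = argumentOffset(startOffset: 0, encodedLength: encodedLength, element: 0)
let last  = argumentOffset(startOffset: 0, encodedLength: encodedLength, element: 499)
print(first, last) // 0 23952
```

This is also why the GPU-side shader can index `materials[materialID]` freely: the whole array is one contiguous buffer.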
Metal Performance Shaders (MPS)
Metal Performance Shaders is Apple's library of highly optimized GPU algorithms. Instead of writing your own kernels, you call pre-built, Apple-tuned implementations of common operations such as image filters, matrix multiplication, and neural-network primitives.
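To see what such a kernel is doing for you: the heart of a blur like `MPSImageGaussianBlur` is a normalized kernel of Gaussian weights, which MPS computes and applies with separable passes, edge handling, and per-GPU tuning on top. A naive CPU reference of just the weights (sigma and radius are illustrative):

```swift
import Foundation

// A normalized 1-D Gaussian kernel: the weights a blur convolves with
func gaussianKernel(sigma: Double, radius: Int) -> [Double] {
    let raw = (-radius...radius).map { x in
        exp(-Double(x * x) / (2 * sigma * sigma))
    }
    let sum = raw.reduce(0, +)
    return raw.map { $0 / sum } // normalize so the image keeps its brightness
}

let kernel = gaussianKernel(sigma: 2.0, radius: 4)
print(kernel.count) // 9 taps; weights sum to 1.0 and peak at the center tap
```

With MPS this collapses to creating an `MPSImageGaussianBlur` and encoding it into a command buffer.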
Metal Ray Tracing
Metal ray tracing (Metal 2.3+, supported on all Apple Silicon GPUs and on AMD Navi GPUs under macOS; query device.supportsRaytracing at runtime) is uniquely flexible: you can use it from compute shaders without a dedicated "ray tracing pipeline". You simply call intersect() from any kernel.
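To make `intersect()`'s inputs and outputs concrete: under the hood the intersector runs ray/primitive tests against the acceleration structure (hardware-accelerated on M3-class GPUs). A CPU sketch of the classic Möller–Trumbore ray/triangle test, with illustrative values:

```swift
struct Vec3 { var x, y, z: Double }
func sub(_ a: Vec3, _ b: Vec3) -> Vec3 { Vec3(x: a.x - b.x, y: a.y - b.y, z: a.z - b.z) }
func cross(_ a: Vec3, _ b: Vec3) -> Vec3 {
    Vec3(x: a.y * b.z - a.z * b.y, y: a.z * b.x - a.x * b.z, z: a.x * b.y - a.y * b.x)
}
func dot(_ a: Vec3, _ b: Vec3) -> Double { a.x * b.x + a.y * b.y + a.z * b.z }

// Möller–Trumbore ray/triangle intersection: the kind of test Metal's
// intersect() performs per candidate primitive. Returns the hit
// distance t along the ray, or nil on a miss.
func rayTriangle(origin: Vec3, dir: Vec3, v0: Vec3, v1: Vec3, v2: Vec3) -> Double? {
    let e1 = sub(v1, v0), e2 = sub(v2, v0)
    let p = cross(dir, e2)
    let det = dot(e1, p)
    if abs(det) < 1e-12 { return nil } // ray parallel to triangle plane
    let inv = 1 / det
    let s = sub(origin, v0)
    let u = dot(s, p) * inv
    if u < 0 || u > 1 { return nil } // outside barycentric range
    let q = cross(s, e1)
    let v = dot(dir, q) * inv
    if v < 0 || u + v > 1 { return nil }
    let t = dot(e2, q) * inv
    return t > 0 ? t : nil // only count hits in front of the origin
}

// A ray down +Z hitting a triangle in the z = 5 plane
let hit = rayTriangle(origin: Vec3(x: 0, y: 0, z: 0), dir: Vec3(x: 0, y: 0, z: 1),
                      v0: Vec3(x: -1, y: -1, z: 5), v1: Vec3(x: 1, y: -1, z: 5),
                      v2: Vec3(x: 0, y: 1, z: 5))
let miss = rayTriangle(origin: Vec3(x: 0, y: 0, z: 0), dir: Vec3(x: 0, y: 0, z: -1),
                       v0: Vec3(x: -1, y: -1, z: 5), v1: Vec3(x: 1, y: -1, z: 5),
                       v2: Vec3(x: 0, y: 1, z: 5))
print(hit ?? -1, miss == nil) // 5.0 true
```

In MSL the equivalent is an `intersector<triangle_data>` whose result carries the distance, primitive index, and barycentric coordinates.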