WebGPU — The Complete Masterclass

WebGPU vs WebGL

| Feature | WebGL 2 (OpenGL ES) | WebGPU (Vulkan/DX12/Metal) |
|---|---|---|
| Mental model | Global state machine | Explicit objects, no global state |
| Threading | Single-threaded | Workers + OffscreenCanvas |
| Compute shaders | None | Full compute pipeline |
| Explicit memory | No | Yes (mapped buffers) |
| Shader language | GLSL ES | WGSL (strongly typed) |
| Performance ceiling | Medium | Very high |
| Debugging | Limited | Error messages + validation + labels |
| Multi-draw indirect | No | Yes |
| Timestamp queries | No | Yes |
| Status | Legacy (works, no new features) | Active, rapidly expanding spec |

Initialization (Adapter → Device)

The Initialization Chain

graph TD
    Nav["navigator.gpu\nEntry point — the Vulkan Instance equivalent"]
    Adapter["GPUAdapter\nRepresents a physical GPU (or software rasterizer)\nRequested with powerPreference"]
    Device["GPUDevice\nYour logical connection to the GPU\nAll objects are created from here"]
    Queue["GPUQueue\nSubmit command buffers here\nAlways exists as device.queue"]

    Nav -->|"requestAdapter()"| Adapter
    Adapter -->|"requestDevice()"| Device
    Device -->|"device.queue"| Queue

Initialization Code

async function initWebGPU() {
    // 1. Check browser support
    if (!navigator.gpu) {
        throw new Error("WebGPU is not supported in this browser.");
    }
 
    // 2. Request an Adapter (Physical GPU)
    const adapter = await navigator.gpu.requestAdapter({
        powerPreference: "high-performance", // "low-power" for integrated GPU
    });
 
    if (!adapter) throw new Error("No suitable GPU adapter found.");
 
    // 3. Inspect adapter capabilities
    //    (adapter.info is the current API; the older requestAdapterInfo() was removed from the spec)
    const adapterInfo = adapter.info;
    console.log("GPU Vendor:", adapterInfo.vendor);
    console.log("GPU Architecture:", adapterInfo.architecture);
    console.log("Max Texture Dimension:", adapter.limits.maxTextureDimension2D);
    console.log("Max Buffer Size:", adapter.limits.maxBufferSize / (1024*1024), "MB");
 
    // 4. Request a Device (Logical GPU Connection)
    const device = await adapter.requestDevice({
        label: "My WebGPU Device",
        requiredLimits: {
            maxBufferSize:           512 * 1024 * 1024, // Request up to 512 MB buffers
            maxStorageBufferBindingSize: 512 * 1024 * 1024,
        },
        requiredFeatures: [
            // "texture-compression-bc",   // BC compressed textures (Desktop)
            // "texture-compression-astc",  // ASTC compressed textures (Mobile)
            // "rg11b10ufloat-renderable",  // HDR render targets
            // "timestamp-query",           // GPU timing
        ]
    });
 
    // 5. Handle device loss (GPU reset, driver update, etc.)
    device.lost.then((info) => {
        console.error("WebGPU device was lost:", info.message);
        if (info.reason !== "destroyed") {
            initWebGPU(); // Attempt to reinitialize
        }
    });
 
    return { adapter, device };
}
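Requesting a feature the adapter lacks makes `requestDevice()` reject, so optional features are usually filtered against `adapter.features` before the request. A minimal sketch; the helper name `pickSupportedFeatures` is our own, and since `adapter.features` is set-like, a plain `Set` can stand in for it when testing:

```javascript
// Keep only the features this adapter actually supports.
// `availableFeatures` is any set-like object with .has() (e.g. adapter.features).
function pickSupportedFeatures(availableFeatures, wantedFeatures) {
    return wantedFeatures.filter((f) => availableFeatures.has(f));
}

// Usage sketch:
//   const requiredFeatures = pickSupportedFeatures(adapter.features,
//       ["timestamp-query", "texture-compression-bc"]);
//   const device = await adapter.requestDevice({ requiredFeatures });
```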

Canvas Configuration

Connecting WebGPU to a Canvas

const canvas  = document.querySelector("canvas");
const context = canvas.getContext("webgpu");
 
// Choose the best format for the screen (usually 'bgra8unorm' on desktop)
const preferredFormat = navigator.gpu.getPreferredCanvasFormat();
 
context.configure({
    device:    device,
    format:    preferredFormat,          // Pixel format
    alphaMode: "opaque",                 // No window transparency
    usage:     GPUTextureUsage.RENDER_ATTACHMENT | GPUTextureUsage.COPY_SRC,
});
 
// Each frame: get the texture to render into (equivalent of swapchain image acquire)
const currentTexture = context.getCurrentTexture();
const currentView    = currentTexture.createView();
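The canvas's drawing-buffer size is independent of its CSS size; a common pattern scales the CSS size by `devicePixelRatio` and clamps to the device's texture limit so `getCurrentTexture()` never exceeds it. A sketch with our own helper name, `computeCanvasPixelSize`:

```javascript
// Compute the pixel-buffer size for a canvas: CSS size × display density,
// clamped to [1, maxDimension]. The clamping policy is our own convention.
function computeCanvasPixelSize(cssWidth, cssHeight, devicePixelRatio, maxDimension) {
    const w = Math.min(Math.max(1, Math.floor(cssWidth * devicePixelRatio)), maxDimension);
    const h = Math.min(Math.max(1, Math.floor(cssHeight * devicePixelRatio)), maxDimension);
    return { width: w, height: h };
}

// Usage sketch (browser):
//   const { width, height } = computeCanvasPixelSize(
//       canvas.clientWidth, canvas.clientHeight,
//       window.devicePixelRatio, device.limits.maxTextureDimension2D);
//   canvas.width = width;  canvas.height = height;
```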

WGSL Shader Language

WGSL — WebGPU Shading Language

  • WGSL is statically typed and Rust-like in syntax. The browser translates it internally to the backend's shading language: SPIR-V for Vulkan, HLSL for D3D12, and MSL for Metal.
| WGSL type | GLSL equivalent | Description |
|---|---|---|
| f32 | float | 32-bit float |
| i32 | int | 32-bit signed int |
| u32 | uint | 32-bit unsigned int |
| vec2<f32> | vec2 | 2-component float vector |
| vec3<f32> | vec3 | 3-component float vector |
| vec4<f32> | vec4 | 4-component float vector |
| mat4x4<f32> | mat4 | 4x4 float matrix |
| array<f32, N> | float arr[N] | Fixed-size array |
| array<Vertex> | SSBO array | Runtime-sized array (in a storage buffer) |
| bool | bool | Boolean |
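These types follow WGSL's memory-layout rules (notably, `vec3<f32>` aligns to 16 bytes, and a struct's size rounds up to a multiple of its largest member alignment). When filling buffers from JavaScript, it helps to compute offsets with the same rules. A sketch of a small layout calculator; the helper and its lookup table are our own, limited to scalars, vectors, and `mat4x4<f32>`:

```javascript
// Size/alignment of common WGSL host-shareable types (storage-buffer layout).
const WGSL_LAYOUT = {
    "f32":         { size: 4,  align: 4  },
    "i32":         { size: 4,  align: 4  },
    "u32":         { size: 4,  align: 4  },
    "vec2<f32>":   { size: 8,  align: 8  },
    "vec3<f32>":   { size: 12, align: 16 },
    "vec4<f32>":   { size: 16, align: 16 },
    "mat4x4<f32>": { size: 64, align: 16 },
};

// Returns each member's byte offset and the padded struct size.
function structLayout(memberTypes) {
    const roundUp = (n, a) => Math.ceil(n / a) * a;
    let offset = 0, structAlign = 1;
    const offsets = memberTypes.map((t) => {
        const { size, align } = WGSL_LAYOUT[t];
        offset = roundUp(offset, align);       // align the member
        structAlign = Math.max(structAlign, align);
        const memberOffset = offset;
        offset += size;
        return memberOffset;
    });
    return { offsets, size: roundUp(offset, structAlign) };
}
```

For example, `structLayout(["vec2<f32>", "vec2<f32>", "vec4<f32>", "f32"])` reports offsets 0 / 8 / 16 / 32 and a padded size of 48 bytes, which is the byte stride JavaScript must use per element of an `array<Particle>` storage buffer.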

The Triangle Shader (WGSL)

// Uniforms (Struct + binding)
struct Uniforms {
    modelMatrix      : mat4x4<f32>,
    viewProjMatrix   : mat4x4<f32>,
    cameraPosition   : vec3<f32>,
    time             : f32,
};
 
@group(0) @binding(0) var<uniform> uniforms : Uniforms;
@group(0) @binding(1) var albedoTexture : texture_2d<f32>;
@group(0) @binding(2) var texSampler    : sampler;
 
// Vertex → Fragment struct
struct VertexOutput {
    @builtin(position) position : vec4<f32>,  // Required: clip-space position
    @location(0) worldPos  : vec3<f32>,
    @location(1) normal    : vec3<f32>,
    @location(2) uv        : vec2<f32>,
};
 
// Vertex shader — @vertex marks entry point
@vertex
fn vs_main(
    @location(0) position : vec3<f32>,  // From vertex buffer attribute 0
    @location(1) normal   : vec3<f32>,  // From vertex buffer attribute 1
    @location(2) uv       : vec2<f32>,  // From vertex buffer attribute 2
) -> VertexOutput {
    var out : VertexOutput;
 
    let worldPos    = uniforms.modelMatrix * vec4<f32>(position, 1.0);
    out.position    = uniforms.viewProjMatrix * worldPos;
    out.worldPos    = worldPos.xyz;
    out.normal      = (uniforms.modelMatrix * vec4<f32>(normal, 0.0)).xyz; // OK for uniform scale; otherwise use the inverse-transpose
    out.uv          = uv;
 
    return out;
}
 
// Fragment shader — @fragment marks entry point
@fragment
fn fs_main(in : VertexOutput) -> @location(0) vec4<f32> {
    let albedo  = textureSample(albedoTexture, texSampler, in.uv);
    let normal  = normalize(in.normal);
    let lightDir = normalize(vec3<f32>(1.0, 2.0, -1.0));
 
    let ndotl = max(dot(normal, lightDir), 0.0);
    let result = albedo.rgb * (ndotl + 0.1);
 
    return vec4<f32>(result, albedo.a);
}

Compute Shader (WGSL)

// WGSL Compute Shader — particle simulation
 
struct Particle {
    position : vec2<f32>,
    velocity : vec2<f32>,
    color    : vec4<f32>,
    lifetime : f32,
    _pad0    : f32, // Explicit padding: the vec4 member gives the struct
    _pad1    : f32, // 16-byte alignment, so total size rounds up to 48 bytes
    _pad2    : f32,
};
 
// Storage buffers
@group(0) @binding(0) var<storage, read>       particlesIn  : array<Particle>;
@group(0) @binding(1) var<storage, read_write> particlesOut : array<Particle>;
 
// Uniform for time delta
@group(0) @binding(2) var<uniform> deltaTime : f32;
 
// @compute marks this as a compute shader
// @workgroup_size(x, y, z) — 64 threads per workgroup
@compute @workgroup_size(64, 1, 1)
fn cs_main(
    @builtin(global_invocation_id) globalID : vec3<u32>   // Thread position in grid
) {
    let index = globalID.x;
    let total = arrayLength(&particlesIn);
 
    if (index >= total) { return; } // Guard: don't run on extra threads
 
    var p = particlesIn[index];
 
    // Integrate velocity → position
    p.velocity  += vec2<f32>(0.0, -9.8) * deltaTime;
    p.position  += p.velocity * deltaTime;
    p.lifetime  -= deltaTime;
 
    // Respawn when lifetime expires
    if (p.lifetime <= 0.0) {
        p.position = vec2<f32>(0.0);
        p.velocity = vec2<f32>(0.0, 5.0);
        p.lifetime = 2.0;
    }
 
    particlesOut[index] = p;
}

Buffers

Buffer Types and Usages

| GPUBufferUsage flag | Purpose |
|---|---|
| VERTEX | Vertex attribute data |
| INDEX | Triangle index data |
| UNIFORM | Small, per-frame constant data read in shaders |
| STORAGE | Large read/write GPU arrays (compute) |
| COPY_SRC | Can be copied FROM (staging source) |
| COPY_DST | Can be copied TO (staging destination, GPU target) |
| MAP_READ | CPU can map this after GPU writes to read results back |
| MAP_WRITE | CPU can map this to write data in, then copy to GPU |
| INDIRECT | Used as argument buffer for indirect draw/dispatch |

Creating Buffers

// ---- Create a Vertex Buffer ----
const vertices = new Float32Array([
//   x,     y,     z,     nx,   ny,   nz,    u,    v
    -0.5, -0.5,  0.0,   0.0,  0.0,  1.0,  0.0,  0.0,
     0.5, -0.5,  0.0,   0.0,  0.0,  1.0,  1.0,  0.0,
     0.0,  0.5,  0.0,   0.0,  0.0,  1.0,  0.5,  1.0,
]);
 
const vertexBuffer = device.createBuffer({
    label:              "Vertex Buffer",
    size:               vertices.byteLength,
    usage:              GPUBufferUsage.VERTEX | GPUBufferUsage.COPY_DST,
    mappedAtCreation:   false,
});
 
// Upload data via device.queue.writeBuffer (the easy way)
device.queue.writeBuffer(vertexBuffer, 0, vertices);
 
// ---- Create a Uniform Buffer (mapped persistently) ----
const uniformBuffer = device.createBuffer({
    label: "Per-Frame Uniforms",
    size:  256,    // Two mat4s need 128 bytes; rounded up to 256, the default minUniformBufferOffsetAlignment
    usage: GPUBufferUsage.UNIFORM | GPUBufferUsage.COPY_DST,
});
 
// Update every frame
function updateUniforms(modelMatrix, viewProjMatrix) {
    const data = new Float32Array([
        ...modelMatrix,     // 16 floats
        ...viewProjMatrix,  // 16 floats
    ]);
    device.queue.writeBuffer(uniformBuffer, 0, data);
}
 
// ---- Create a Storage Buffer for Compute ----
const PARTICLE_COUNT = 100000;
const PARTICLE_STRIDE = 48; // bytes per particle: 8 + 8 + 16 + 4, rounded up to a multiple of 16
 
const particleBufferA = device.createBuffer({
    label: "Particle Buffer A",
    size:  PARTICLE_COUNT * PARTICLE_STRIDE,
    usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_DST,
});
const particleBufferB = device.createBuffer({
    label: "Particle Buffer B",
    size:  PARTICLE_COUNT * PARTICLE_STRIDE,
    usage: GPUBufferUsage.STORAGE | GPUBufferUsage.VERTEX, // Render from compute output!
});
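The 256-byte figure above comes from `device.limits.minUniformBufferOffsetAlignment`. It matters most when packing many objects' uniforms into one large buffer and binding slices with dynamic offsets: each slice must start on an aligned boundary. A sketch, with our own helper name `alignTo` and an assumed two-mat4 per-object payload:

```javascript
// Round `value` up to the next multiple of `alignment`.
function alignTo(value, alignment) {
    return Math.ceil(value / alignment) * alignment;
}

const UNIFORM_ALIGN = 256;  // default minUniformBufferOffsetAlignment
// Two mat4s = 128 bytes of payload, padded up to one aligned slot per object.
const perObjectSize = alignTo(16 * 4 + 16 * 4, UNIFORM_ALIGN);

// Usage sketch: bind object i's slice with a dynamic offset
//   renderPass.setBindGroup(0, bindGroup, [i * perObjectSize]);
```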

Reading Data Back from GPU (Readback)

// Create a buffer the CPU can read from
const readbackBuffer = device.createBuffer({
    size:  computeResultSize,
    usage: GPUBufferUsage.COPY_DST | GPUBufferUsage.MAP_READ,
});
 
// After dispatch — copy compute output to readback buffer
const encoder = device.createCommandEncoder();
encoder.copyBufferToBuffer(computeOutputBuffer, 0, readbackBuffer, 0, computeResultSize);
device.queue.submit([encoder.finish()]);
 
// Map the buffer for CPU reading (async — waits for GPU to finish)
await readbackBuffer.mapAsync(GPUMapMode.READ);
const data = new Float32Array(readbackBuffer.getMappedRange());
console.log("GPU Result:", data[0], data[1], data[2]);
readbackBuffer.unmap(); // Unmap before GPU can use it again

Textures

Creating and Uploading Textures

// Create a texture
const texture = device.createTexture({
    label:     "Albedo Texture",
    size:      [1024, 1024, 1],   // width, height, depthOrArrayLayers
    format:    "rgba8unorm-srgb",  // 8-bit sRGB
    usage:     GPUTextureUsage.TEXTURE_BINDING   // Read in shader
              | GPUTextureUsage.COPY_DST          // Can receive uploads
              | GPUTextureUsage.RENDER_ATTACHMENT, // Can be render target
    mipLevelCount: Math.floor(Math.log2(1024)) + 1, // 11 levels; note WebGPU never auto-generates mip data
});
 
// Upload image data
async function loadImageToTexture(device, url) {
    const response = await fetch(url);
    const blob     = await response.blob();
    const image    = await createImageBitmap(blob, { colorSpaceConversion: "none" });
 
    const texture = device.createTexture({
        size:   [image.width, image.height, 1],
        format: "rgba8unorm-srgb",
        usage:  GPUTextureUsage.TEXTURE_BINDING | GPUTextureUsage.COPY_DST | GPUTextureUsage.RENDER_ATTACHMENT,
    });
 
    // The easiest way — browser handles decoding
    device.queue.copyExternalImageToTexture(
        { source: image },
        { texture: texture },
        [image.width, image.height]
    );
 
    return texture;
}
 
// Create sampler
const sampler = device.createSampler({
    label:         "Linear Wrap Sampler",
    magFilter:     "linear",   // Zoomed in
    minFilter:     "linear",   // Zoomed out
    mipmapFilter:  "linear",   // Between mip levels
    addressModeU:  "repeat",   // Tile horizontally
    addressModeV:  "repeat",   // Tile vertically
    maxAnisotropy: 16,         // Quality anisotropic filtering
});
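The `mipLevelCount` expression above assumed a 1024-square texture; the general formula uses the larger dimension. As a small sketch (the helper name is our own, and remember WebGPU only allocates the levels; filling them is up to you, via uploads or your own downsampling passes):

```javascript
// Full mip chain length for a 2D texture: one level per halving of the
// larger dimension, down to 1×1.
function fullMipLevelCount(width, height) {
    return Math.floor(Math.log2(Math.max(width, height))) + 1;
}

// fullMipLevelCount(1024, 1024) → 11 levels (1024, 512, …, 2, 1)
```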

Bind Groups

What Are Bind Groups?

  • In WebGPU, you do not bind resources one by one. You group them into a GPUBindGroup — a snapshot of exactly which buffers, textures, and samplers are bound at specific slots.
graph TD
    BGL["GPUBindGroupLayout\n'Schema': Binding 0 = Uniform Buffer\nBinding 1 = Texture2D\nBinding 2 = Sampler"]
    BG["GPUBindGroup\n'Instance': Binding 0 = myUniformBuffer\nBinding 1 = playerTexture\nBinding 2 = linearSampler"]
    Pipeline["GPURenderPipeline / GPUComputePipeline\nCreated with the same layout"]

    BGL --> BG
    BGL --> Pipeline
    BG -->|"setBindGroup(0, myBG)"| Pass["Render/Compute Pass"]

Creating Bind Group Layouts and Bind Groups

// Step 1: Define the layout (the schema)
const bindGroupLayout = device.createBindGroupLayout({
    label: "Main Bind Group Layout",
    entries: [
        // Binding 0: Uniform buffer, visible to vertex AND fragment shaders
        {
            binding:    0,
            visibility: GPUShaderStage.VERTEX | GPUShaderStage.FRAGMENT,
            buffer: {
                type:           "uniform",
                minBindingSize: 256,
            }
        },
        // Binding 1: Texture (read-only in shader)
        {
            binding:    1,
            visibility: GPUShaderStage.FRAGMENT,
            texture: {
                sampleType:    "float",
                viewDimension: "2d",
                multisampled:  false,
            }
        },
        // Binding 2: Sampler
        {
            binding:    2,
            visibility: GPUShaderStage.FRAGMENT,
            sampler: {
                type: "filtering",
            }
        },
    ]
});
 
// Step 2: Create the bind group (actual data)
const bindGroup = device.createBindGroup({
    label:  "Main Bind Group",
    layout: bindGroupLayout,
    entries: [
        { binding: 0, resource: { buffer: uniformBuffer } },
        { binding: 1, resource: texture.createView() },
        { binding: 2, resource: sampler },
    ]
});

Render Pipelines

Creating the Render Pipeline

const shaderModule = device.createShaderModule({
    label: "Main Shader",
    code:  WGSL_SHADER_CODE, // The WGSL string with vs_main and fs_main
});
 
const pipeline = device.createRenderPipeline({
    label:  "Main Render Pipeline",
    layout: device.createPipelineLayout({
        bindGroupLayouts: [bindGroupLayout]
    }),
 
    vertex: {
        module:     shaderModule,
        entryPoint: "vs_main",
        buffers: [{
            arrayStride: 8 * 4, // 8 floats × 4 bytes = 32 bytes per vertex
            stepMode:    "vertex",
            attributes: [
                { shaderLocation: 0, offset: 0,  format: "float32x3" }, // position
                { shaderLocation: 1, offset: 12, format: "float32x3" }, // normal
                { shaderLocation: 2, offset: 24, format: "float32x2" }, // uv
            ]
        }]
    },
 
    fragment: {
        module:     shaderModule,
        entryPoint: "fs_main",
        targets: [{
            format: navigator.gpu.getPreferredCanvasFormat(),
            // Alpha blending:
            // blend: {
            //     color: { srcFactor: "src-alpha", dstFactor: "one-minus-src-alpha", operation: "add" },
            //     alpha: { srcFactor: "one",       dstFactor: "zero",                operation: "add" },
            // }
        }]
    },
 
    primitive: {
        topology:         "triangle-list",
        cullMode:         "back",
        frontFace:        "ccw",
        stripIndexFormat: undefined,
    },
 
    depthStencil: {
        format:              "depth24plus",
        depthWriteEnabled:   true,
        depthCompare:        "less",
    },
});
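The attribute offsets in the vertex buffer layout above (0 / 12 / 24) are easy to get wrong by hand when the vertex format changes. A sketch of a small builder that derives offsets and stride from an ordered list of formats; the helper and its byte-size table are our own, covering only the float formats used here:

```javascript
// Byte sizes of the vertex formats used in this layout (extend as needed).
const FORMAT_BYTES = { "float32x2": 8, "float32x3": 12, "float32x4": 16 };

// Build { arrayStride, stepMode, attributes } from an ordered format list,
// assigning shaderLocation 0, 1, 2, … and a running byte offset.
function buildVertexLayout(formats) {
    let offset = 0;
    const attributes = formats.map((format, i) => {
        const attr = { shaderLocation: i, offset, format };
        offset += FORMAT_BYTES[format];
        return attr;
    });
    return { arrayStride: offset, stepMode: "vertex", attributes };
}

// buildVertexLayout(["float32x3", "float32x3", "float32x2"])
// reproduces the layout above: stride 32, offsets 0 / 12 / 24.
```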

The Complete Render Loop

Drawing Every Frame

function drawFrame(timestamp) {
    // Update uniform buffer with new matrices
    updateUniforms(computeModelMatrix(timestamp), viewProjMatrix);
 
    // ==== Create command encoder ====
    const encoder = device.createCommandEncoder({ label: "Frame Encoder" });
 
    // ==== Render Pass ====
    const renderPass = encoder.beginRenderPass({
        label: "Main Render Pass",
        colorAttachments: [{
            view:       context.getCurrentTexture().createView(), // Render to screen
            clearValue: { r: 0.05, g: 0.05, b: 0.1, a: 1.0 },   // Background color
            loadOp:     "clear",                                   // Clear before drawing
            storeOp:    "store",                                   // Save to screen
        }],
        depthStencilAttachment: {
            view:              depthTexture.createView(),
            depthClearValue:   1.0,  // Far plane
            depthLoadOp:       "clear",
            depthStoreOp:      "discard",  // Discard depth after render
        }
    });
 
    // ==== Set up pipeline and resources ====
    renderPass.setPipeline(pipeline);
    renderPass.setBindGroup(0, bindGroup);            // Uniforms + texture + sampler
    renderPass.setVertexBuffer(0, vertexBuffer);      // Vertex data at slot 0
    renderPass.setIndexBuffer(indexBuffer, "uint32"); // Uint32 index buffer
 
    // ==== Draw ====
    renderPass.drawIndexed(indexCount, 1, 0, 0, 0);
    // (indexCount, instanceCount, firstIndex, baseVertex, firstInstance)
 
    renderPass.end();
 
    // ==== Submit ====
    device.queue.submit([encoder.finish()]);
 
    // Register next frame
    requestAnimationFrame(drawFrame);
}
 
requestAnimationFrame(drawFrame);

Compute Pipelines

Creating and Dispatching Compute

// Create compute pipeline
const computePipeline = device.createComputePipeline({
    label:  "Particle Compute Pipeline",
    layout: device.createPipelineLayout({
        bindGroupLayouts: [computeBindGroupLayout]
    }),
    compute: {
        module:     shaderModule,
        entryPoint: "cs_main",
    }
});
 
// ---- Per frame: dispatch compute, then render ----
const encoder = device.createCommandEncoder();
 
// COMPUTE PASS (simulate particles)
const computePass = encoder.beginComputePass({ label: "Particle Update" });
computePass.setPipeline(computePipeline);
computePass.setBindGroup(0, computeBindGroupEven); // Ping-pong buffers
 
const WORKGROUP_SIZE = 64;
const dispatchCount  = Math.ceil(PARTICLE_COUNT / WORKGROUP_SIZE);
computePass.dispatchWorkgroups(dispatchCount, 1, 1);
computePass.end();
 
// RENDER PASS (draw updated particles)
const renderPass = encoder.beginRenderPass({ /* ... */ });
renderPass.setPipeline(renderPipeline);
renderPass.setVertexBuffer(0, particleBufferB); // Use compute output as vertex data!
renderPass.draw(PARTICLE_COUNT); // One vertex per particle (assumes a point-list pipeline)
renderPass.end();
 
device.queue.submit([encoder.finish()]);
 
// Swap buffers for next frame (ping-pong)
[computeBindGroupEven, computeBindGroupOdd] = [computeBindGroupOdd, computeBindGroupEven];
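Each dimension passed to `dispatchWorkgroups` is capped by `device.limits.maxComputeWorkgroupsPerDimension` (65535 by default), so a flat `Math.ceil(count / 64)` overflows for very large workloads. One common remedy, sketched with our own helper name, splits the 1D count into a 2D grid; the shader then reconstructs the linear index from `global_invocation_id` and the grid width:

```javascript
// Split a 1D workload into a 2D dispatch grid that respects the per-dimension
// limit. Returns { x, y } such that x * y workgroups cover all items.
function splitDispatch(itemCount, workgroupSize, maxPerDim = 65535) {
    const groups = Math.ceil(itemCount / workgroupSize);
    const x = Math.min(groups, maxPerDim);
    const y = Math.ceil(groups / x);
    return { x, y };
}

// Usage sketch:
//   const { x, y } = splitDispatch(PARTICLE_COUNT, 64);
//   computePass.dispatchWorkgroups(x, y, 1);
```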

Timestamps and Performance

Measuring GPU Time

// Timestamp queries require requesting the feature at device creation
const device = await adapter.requestDevice({
    requiredFeatures: ["timestamp-query"]
});
 
// Create query set
const querySet = device.createQuerySet({
    type:  "timestamp",
    count: 4, // 4 timestamps: startCompute, endCompute, startRender, endRender
});
 
const resolveBuffer = device.createBuffer({
    size:  4 * 8, // 4 timestamps × 8 bytes (uint64)
    usage: GPUBufferUsage.QUERY_RESOLVE | GPUBufferUsage.COPY_SRC,
});
 
const readbackBuffer = device.createBuffer({
    size:  4 * 8,
    usage: GPUBufferUsage.COPY_DST | GPUBufferUsage.MAP_READ,
});
 
const encoder = device.createCommandEncoder();
 
const computePass = encoder.beginComputePass({
    timestampWrites: {
        querySet:                    querySet,
        beginningOfPassWriteIndex:   0,  // Write timestamp 0 at start
        endOfPassWriteIndex:         1,  // Write timestamp 1 at end
    }
});
// ... compute dispatch ...
computePass.end();
 
encoder.resolveQuerySet(querySet, 0, 4, resolveBuffer, 0);
encoder.copyBufferToBuffer(resolveBuffer, 0, readbackBuffer, 0, 4 * 8);
 
device.queue.submit([encoder.finish()]);
 
await readbackBuffer.mapAsync(GPUMapMode.READ);
const timestamps = new BigUint64Array(readbackBuffer.getMappedRange());
 
const computeTimeNs = Number(timestamps[1] - timestamps[0]);
console.log(`Compute pass: ${computeTimeNs / 1_000_000} ms`);
readbackBuffer.unmap();
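One subtlety in the snippet above: the timestamps are 64-bit nanosecond counters, so the subtraction must happen in BigInt before converting; converting each raw counter to `Number` first can lose precision on long-running sessions. A small helper (the name is our own):

```javascript
// Difference of two uint64 nanosecond timestamps (BigInt), returned as
// milliseconds. Subtract in BigInt first, then convert the (small) delta.
function timestampDeltaMs(begin, end) {
    return Number(end - begin) / 1_000_000;
}

// timestampDeltaMs(1_000_000n, 3_500_000n) → 2.5 ms
```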

WebGPU Native (Dawn and wgpu)

Beyond the Browser

  • WebGPU isn’t only for browsers. You can use the exact same API in desktop native apps:
| Implementation | Language | Platforms | Owned by |
|---|---|---|---|
| Dawn | C++ | Windows, macOS, Linux, Android, iOS | Google (powers Chrome) |
| wgpu | Rust | Windows, macOS, Linux, Android, iOS, Web | Mozilla / wgpu contributors |
| wgpu-native | C FFI | Same as wgpu | wgpu project |
| WebGPU in Node.js | JavaScript | Windows, macOS, Linux | Community |

Using wgpu in Rust

// Cargo.toml
// [dependencies]
// wgpu = "22"
// winit = "0.30"
// pollster = "0.3"   // For blocking async
 
use wgpu::*;
 
async fn init_wgpu(window: &winit::window::Window) -> (Device, Queue, Surface) {
    let instance = Instance::new(InstanceDescriptor {
        backends: Backends::all(), // Vulkan on Linux, DX12 on Windows, Metal on Mac
        ..Default::default()
    });
 
    let surface = instance.create_surface(window).unwrap();
 
    let adapter = instance.request_adapter(&RequestAdapterOptions {
        power_preference:       PowerPreference::HighPerformance,
        compatible_surface:     Some(&surface),
        force_fallback_adapter: false,
    }).await.unwrap();
 
    let (device, queue) = adapter.request_device(
        &DeviceDescriptor {
            label:              Some("Main Device"),
            required_features:  Features::empty(),
            required_limits:    Limits::default(),
            memory_hints:       MemoryHints::default(), // Required field as of wgpu 22
        },
        None,
    ).await.unwrap();
 
    (device, queue, surface)
}
// The rest of the API mirrors JavaScript WebGPU almost 1:1 (snake_case naming):
// device.create_buffer(), device.create_render_pipeline(), etc.

Complete Object Reference

Every WebGPU Object Explained

| WebGPU object | Category | What it does |
|---|---|---|
| GPUAdapter | Bootstrap | Represents a physical GPU. Query capabilities here. |
| GPUDevice | Core | Logical GPU connection. Create everything from here. |
| GPUQueue | Execution | Submit command buffers and write buffer data. |
| GPUBuffer | Memory | A block of GPU memory (vertex, index, uniform, storage). |
| GPUTexture | Memory | GPU image (2D, 3D, cube map, array). |
| GPUTextureView | Memory | How to interpret a texture (mip level, layer). |
| GPUSampler | Textures | Filtering and UV wrapping configuration. |
| GPUShaderModule | Shaders | Compiled WGSL code. |
| GPUBindGroupLayout | Binding | Schema of which resources go at which bindings. |
| GPUBindGroup | Binding | Actual resource bindings matching a layout. |
| GPUPipelineLayout | Binding | Organizes multiple bind group layouts for a pipeline. |
| GPURenderPipeline | Pipeline | Immutable: vertex + fragment shaders + render state. |
| GPUComputePipeline | Pipeline | Immutable: compute kernel. |
| GPUCommandEncoder | Commands | Records render passes and compute passes. |
| GPURenderPassEncoder | Commands | Records draw calls within a render pass. |
| GPUComputePassEncoder | Commands | Records compute dispatches. |
| GPURenderBundleEncoder | Perf | Pre-record draw calls for reuse across frames. |
| GPUQuerySet | Profiling | Occlusion queries and timestamp queries. |

Learn More — Free Resources