graph TD
CPU["🖥️ CPU (Your C++ App)"]
CB["VkCommandBuffer\nRecord commands into this"]
Q["VkQueue\nSubmit command buffers here"]
GPU["🎮 GPU\nExecutes everything asynchronously"]
Fence["VkFence\nCPU waits on this to know GPU is done"]
CPU -->|"vkBeginCommandBuffer()"| CB
CB -->|"vkEndCommandBuffer()"| Q
Q -->|"vkQueueSubmit()"| GPU
GPU -->|"signals when done"| Fence
Fence -->|"vkWaitForFences()"| CPU
1 — Prerequisites and Setup
What You Must Know Before Starting
✔ C++ (classes, RAII, smart pointers, move semantics)
✔ Basic linear algebra (vectors, matrices, dot product)
✔ What a shader is (vertex transforms positions, fragment colors pixels)
✔ What a frame buffer is (a block of pixels rendered to before display)
The VkInstance is the very first thing you create. It is the bridge between your application and the Vulkan library. Think of it as “telling Vulkan: I exist, these are my requirements, and these are the layers I want for debugging.”
Vulkan Extensions
Extensions are optional features added on top of core Vulkan. Common ones:
| Extension | Why you need it |
|---|---|
| VK_KHR_surface | Required to show output on a window |
| VK_KHR_win32_surface | Windows-specific surface support |
| VK_EXT_debug_utils | Enables human-readable debug messages |
| VK_KHR_ray_tracing_pipeline | Hardware ray tracing |
| VK_KHR_swapchain | Required to create a swapchain (present to screen) |
Validation Layers — Your Best Friend
By default, Vulkan does almost no error checking, for performance reasons. Validation layers are separate debug middleware that intercepts every API call and checks for mistakes:
VK_LAYER_KHRONOS_validation catches:
✔ Using a destroyed object
✔ Forgetting to synchronize resources before use
✔ Passing invalid parameters
✔ Image layout transitions done in wrong order
✔ Leaked objects (anything still alive when the device or instance is destroyed)
Creating the Instance
#define GLFW_INCLUDE_VULKAN
#include <GLFW/glfw3.h>

#include <stdexcept>
#include <vector>

// The validation layers we want (debug builds only)
const std::vector<const char*> validationLayers = {
    "VK_LAYER_KHRONOS_validation"
};

void createInstance(VkInstance& instance) {
    // -- Step 1: Describe your application --
    VkApplicationInfo appInfo{};
    appInfo.sType              = VK_STRUCTURE_TYPE_APPLICATION_INFO;
    appInfo.pApplicationName   = "My Vulkan Game";
    appInfo.applicationVersion = VK_MAKE_VERSION(1, 0, 0);
    appInfo.pEngineName        = "My Engine";
    appInfo.engineVersion      = VK_MAKE_VERSION(1, 0, 0);
    appInfo.apiVersion         = VK_API_VERSION_1_3; // Use Vulkan 1.3

    // -- Step 2: Get required extensions from GLFW --
    uint32_t glfwExtensionCount = 0;
    const char** glfwExtensions = glfwGetRequiredInstanceExtensions(&glfwExtensionCount);
    std::vector<const char*> extensions(glfwExtensions, glfwExtensions + glfwExtensionCount);
    extensions.push_back(VK_EXT_DEBUG_UTILS_EXTENSION_NAME); // For debug messages

    // -- Step 3: Fill in the creation info --
    VkInstanceCreateInfo createInfo{};
    createInfo.sType                   = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO;
    createInfo.pApplicationInfo        = &appInfo;
    createInfo.enabledExtensionCount   = static_cast<uint32_t>(extensions.size());
    createInfo.ppEnabledExtensionNames = extensions.data();
#ifdef _DEBUG
    createInfo.enabledLayerCount   = static_cast<uint32_t>(validationLayers.size());
    createInfo.ppEnabledLayerNames = validationLayers.data();
#else
    createInfo.enabledLayerCount = 0;
#endif

    // -- Step 4: Create it! --
    if (vkCreateInstance(&createInfo, nullptr, &instance) != VK_SUCCESS) {
        throw std::runtime_error("Failed to create Vulkan instance!");
    }
}
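The instance code enables VK_LAYER_KHRONOS_validation without first checking that the layer is installed; if it is missing, vkCreateInstance fails with VK_ERROR_LAYER_NOT_PRESENT. Below is a minimal sketch of the name-matching step only. The helper name `allLayersAvailable` is hypothetical, and the `available` list is assumed to have been filled from vkEnumerateInstanceLayerProperties elsewhere.

```cpp
#include <cstring>
#include <vector>

// Hypothetical helper: returns true when every requested layer name appears
// in the list of layer names reported as available on this machine.
bool allLayersAvailable(const std::vector<const char*>& requested,
                        const std::vector<const char*>& available) {
    for (const char* want : requested) {
        bool found = false;
        for (const char* have : available) {
            if (std::strcmp(want, have) == 0) {
                found = true;
                break;
            }
        }
        if (!found) return false; // at least one requested layer is missing
    }
    return true;
}
```

A debug build could call this before filling in ppEnabledLayerNames and either fall back to no layers or report a clear error.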
3 — Physical Device (Picking a GPU)
What Is a Physical Device?
VkPhysicalDevice represents a real GPU in the machine (NVIDIA RTX 4090, AMD RX 7900, Intel Arc, etc.). You do NOT create it — you enumerate (list) what’s available and pick the best.
// Instead of just picking ANY GPU, score them and pick the best
int rateDevice(VkPhysicalDevice device) {
    VkPhysicalDeviceProperties props;
    vkGetPhysicalDeviceProperties(device, &props);

    int score = 0;

    // Dedicated GPUs score much higher
    if (props.deviceType == VK_PHYSICAL_DEVICE_TYPE_DISCRETE_GPU)
        score += 1000;

    // More VRAM = better
    VkPhysicalDeviceMemoryProperties memProps;
    vkGetPhysicalDeviceMemoryProperties(device, &memProps);
    for (uint32_t i = 0; i < memProps.memoryHeapCount; i++) {
        if (memProps.memoryHeaps[i].flags & VK_MEMORY_HEAP_DEVICE_LOCAL_BIT)
            score += (int)(memProps.memoryHeaps[i].size / (1024 * 1024)); // Heap size in MB
    }

    // Max texture size bonus
    score += props.limits.maxImageDimension2D / 1000;

    return score;
}
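rateDevice scores a single device; the selection step then compares the scores of every device returned by vkEnumeratePhysicalDevices. Here is a sketch of that comparison over plain integers. The helper name `pickBestIndex` is hypothetical, and treating a score of zero as "unsuitable" is an assumption of this sketch, not something the scoring function above enforces.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Returns the index of the highest score, or std::nullopt when the list is
// empty or no candidate scores above zero (assumed here to mean "unsuitable").
// On a tie, the earliest candidate wins.
std::optional<std::size_t> pickBestIndex(const std::vector<int>& scores) {
    std::optional<std::size_t> best;
    int bestScore = 0;
    for (std::size_t i = 0; i < scores.size(); ++i) {
        if (scores[i] > bestScore) {
            bestScore = scores[i];
            best = i;
        }
    }
    return best;
}
```

The returned index would be used to look up the matching VkPhysicalDevice in the enumerated list.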
4 — Queue Families
Understanding GPU Queues
GPUs don’t have one single “do everything” interface. They expose Queue Families — specialized hardware paths for different types of work.
| Queue Type | What it can do | Hardware Example |
|---|---|---|
| Graphics | Draw, Compute, Transfer | NVIDIA’s Universal queue |
| Compute | Compute only (async compute) | AMD’s Async Compute Engine |
| Transfer | Fast memory copies | DMA unit |
| Present | Present frames to a window surface | Usually the same as Graphics |
Finding Queue Family Indices
#include <optional>
#include <vector>

struct QueueFamilyIndices {
    std::optional<uint32_t> graphicsFamily;
    std::optional<uint32_t> presentFamily;

    bool isComplete() const {
        return graphicsFamily.has_value() && presentFamily.has_value();
    }
};

QueueFamilyIndices findQueueFamilies(VkPhysicalDevice device, VkSurfaceKHR surface) {
    QueueFamilyIndices indices;

    // Standard two-call pattern: first ask for the count, then fetch the data
    uint32_t queueFamilyCount = 0;
    vkGetPhysicalDeviceQueueFamilyProperties(device, &queueFamilyCount, nullptr);
    std::vector<VkQueueFamilyProperties> queueFamilies(queueFamilyCount);
    vkGetPhysicalDeviceQueueFamilyProperties(device, &queueFamilyCount, queueFamilies.data());

    for (uint32_t i = 0; i < queueFamilies.size(); i++) {
        // Check if this family can do graphics
        if (queueFamilies[i].queueFlags & VK_QUEUE_GRAPHICS_BIT)
            indices.graphicsFamily = i;

        // Check if this family can present (show) to our window surface
        VkBool32 presentSupport = VK_FALSE;
        vkGetPhysicalDeviceSurfaceSupportKHR(device, i, surface, &presentSupport);
        if (presentSupport)
            indices.presentFamily = i;

        if (indices.isComplete())
            break;
    }
    return indices;
}
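The loop in findQueueFamilies stops as soon as both roles are filled, so it can end up with graphics and present in two different families even when a later family supports both. A common refinement is to look for a shared family first. The sketch below shows only that search; `FamilyCaps` and `findCombinedFamily` are hypothetical stand-ins for the two facts queried per family.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Stand-in for what findQueueFamilies learns about each family.
struct FamilyCaps {
    bool graphics; // family has VK_QUEUE_GRAPHICS_BIT
    bool present;  // family can present to the surface
};

// Returns the first family that supports both graphics and present, if any.
std::optional<uint32_t> findCombinedFamily(const std::vector<FamilyCaps>& families) {
    for (uint32_t i = 0; i < families.size(); ++i) {
        if (families[i].graphics && families[i].present) return i;
    }
    return std::nullopt;
}
```

When this returns a value, the same index can be used for both roles; otherwise the separate-family result remains a valid fallback.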
5 — Logical Device (VkDevice)
What Is a Logical Device?
The VkDevice is your application’s handle to the GPU. Everything you create after this (buffers, pipelines, images) belongs to this logical device.
Physical Device = The physical hardware that exists in your computer.
Logical Device = Your application’s view of that hardware. You can create multiple logical devices from one physical device (e.g., for different “tenants” in a cloud GPU server).
Creating the Logical Device
void createLogicalDevice(VkPhysicalDevice physicalDevice, QueueFamilyIndices indices,
                         VkDevice& device, VkQueue& graphicsQueue) {
    float queuePriority = 1.0f; // 1.0 = highest priority (range: 0.0 to 1.0)

    // Note: only the graphics family is requested here, which assumes the
    // present family is the same one.
    VkDeviceQueueCreateInfo queueCreateInfo{};
    queueCreateInfo.sType            = VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO;
    queueCreateInfo.queueFamilyIndex = indices.graphicsFamily.value();
    queueCreateInfo.queueCount       = 1;
    queueCreateInfo.pQueuePriorities = &queuePriority;

    // Request specific GPU features. Each one must be supported by the physical
    // device (check with vkGetPhysicalDeviceFeatures), or device creation fails.
    VkPhysicalDeviceFeatures deviceFeatures{};
    deviceFeatures.samplerAnisotropy = VK_TRUE; // For texture filtering
    deviceFeatures.fillModeNonSolid  = VK_TRUE; // For wireframe rendering

    // Device extensions we need (swapchain lets us display to a window)
    const std::vector<const char*> deviceExtensions = {
        VK_KHR_SWAPCHAIN_EXTENSION_NAME
    };

    VkDeviceCreateInfo createInfo{};
    createInfo.sType                   = VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO;
    createInfo.pQueueCreateInfos       = &queueCreateInfo;
    createInfo.queueCreateInfoCount    = 1;
    createInfo.pEnabledFeatures        = &deviceFeatures;
    createInfo.enabledExtensionCount   = static_cast<uint32_t>(deviceExtensions.size());
    createInfo.ppEnabledExtensionNames = deviceExtensions.data();

    if (vkCreateDevice(physicalDevice, &createInfo, nullptr, &device) != VK_SUCCESS) {
        throw std::runtime_error("Failed to create logical device!");
    }

    // Retrieve the handle for the queue we just created
    vkGetDeviceQueue(device, indices.graphicsFamily.value(), 0, &graphicsQueue);
}
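createLogicalDevice requests a queue from the graphics family only. When the present family is a different one, the device needs one VkDeviceQueueCreateInfo per distinct family, and each family index may appear only once in pQueueCreateInfos. A small sketch of the de-duplication step, using a hypothetical helper name:

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Collapses the graphics and present family indices into a sorted list of
// distinct indices: one entry when they are equal, two when they differ.
std::vector<uint32_t> uniqueQueueFamilies(uint32_t graphicsFamily, uint32_t presentFamily) {
    std::set<uint32_t> unique = {graphicsFamily, presentFamily};
    return std::vector<uint32_t>(unique.begin(), unique.end());
}
```

The caller would build one queue create info per returned index and then fetch the present queue with its own vkGetDeviceQueue call.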
6 — Window Surface (VkSurfaceKHR)
Why a Surface?
Vulkan is platform-agnostic. It knows nothing about Windows, Linux, or macOS windows. A VkSurfaceKHR is the bridge between Vulkan and your windowing system (Win32, X11, Wayland, Cocoa).
GLFW abstracts this for us in one call:
VkSurfaceKHR surface;

// GLFW handles the platform-specific surface creation for you
if (glfwCreateWindowSurface(instance, window, nullptr, &surface) != VK_SUCCESS) {
    throw std::runtime_error("Failed to create window surface!");
}
The swapchain is an array of images waiting to be rendered to and then displayed on the monitor. The GPU renders into one image while the others are being shown or waiting.
graph LR
subgraph Swapchain
I0["Image 0\n🟢 On Screen Right Now"]
I1["Image 1\n🟡 Waiting (V-Sync)"]
I2["Image 2\n🔵 GPU Rendering Here"]
end
I2 -->|"becomes ready"| I1
I1 -->|"V-Sync swaps"| I0
Presentation Modes
| Mode | Behavior | Tearing? | Latency |
|---|---|---|---|
| IMMEDIATE | GPU presents as fast as possible | Yes | Lowest |
| FIFO | Standard V-Sync: wait for monitor refresh | No | Moderate |
| FIFO_RELAXED | V-Sync, but a frame that arrives late is shown immediately | Sometimes | Moderate |
| MAILBOX | Like triple buffering: a newer frame replaces the one still waiting | No | Low |
Swapchain Configuration
When creating the swapchain, you must choose a surface format (pixel format and color space), a presentation mode, and an extent (resolution).
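// Query what the surface/GPU supports
VkSurfaceCapabilitiesKHR caps;
vkGetPhysicalDeviceSurfaceCapabilitiesKHR(physicalDevice, surface, &caps);

// Ask for triple buffering, but stay inside the limits the surface reports
// (maxImageCount == 0 means "no upper limit"). Requires <algorithm>.
uint32_t desiredImageCount = std::max<uint32_t>(3, caps.minImageCount);
if (caps.maxImageCount > 0 && desiredImageCount > caps.maxImageCount)
    desiredImageCount = caps.maxImageCount;

VkSwapchainCreateInfoKHR createInfo{};
createInfo.sType            = VK_STRUCTURE_TYPE_SWAPCHAIN_CREATE_INFO_KHR;
createInfo.surface          = surface;
createInfo.minImageCount    = desiredImageCount;
// Format and present mode are hard-coded for brevity; real code should confirm
// them with vkGetPhysicalDeviceSurfaceFormatsKHR and
// vkGetPhysicalDeviceSurfacePresentModesKHR (only FIFO is always available).
createInfo.imageFormat      = VK_FORMAT_B8G8R8A8_SRGB;            // 8-bit BGRA, sRGB encoded
createInfo.imageColorSpace  = VK_COLOR_SPACE_SRGB_NONLINEAR_KHR;
createInfo.imageExtent      = { windowWidth, windowHeight };
createInfo.imageArrayLayers = 1;                                   // 2 for stereoscopic 3D (VR)
createInfo.imageUsage       = VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT; // We render to it
createInfo.preTransform     = caps.currentTransform;               // Usually IDENTITY
createInfo.compositeAlpha   = VK_COMPOSITE_ALPHA_OPAQUE_BIT_KHR;   // No window transparency
createInfo.presentMode      = VK_PRESENT_MODE_MAILBOX_KHR;
createInfo.clipped          = VK_TRUE; // Don't render pixels hidden behind other windows
createInfo.oldSwapchain     = VK_NULL_HANDLE;

VkSwapchainKHR swapChain;
if (vkCreateSwapchainKHR(device, &createInfo, nullptr, &swapChain) != VK_SUCCESS) {
    throw std::runtime_error("Failed to create swapchain!");
}

// Retrieve the actual VkImage handles (the driver may create more than requested)
uint32_t imageCount;
vkGetSwapchainImagesKHR(device, swapChain, &imageCount, nullptr);
std::vector<VkImage> swapChainImages(imageCount);
vkGetSwapchainImagesKHR(device, swapChain, &imageCount, swapChainImages.data());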
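The values passed to VkSwapchainCreateInfoKHR have to stay within what the surface reports. The sketch below isolates three selection steps (image count, extent, present mode) using plain stand-in types so the logic can be read without the Vulkan headers. `Extent`, `SurfaceLimits`, `PresentMode`, and the three function names are all hypothetical. It also leaves out one real-world detail: when the surface reports a fixed currentExtent, that value is normally used as is.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Stand-ins for VkExtent2D and the relevant VkSurfaceCapabilitiesKHR fields.
struct Extent { uint32_t width, height; };
struct SurfaceLimits {
    uint32_t minImageCount, maxImageCount; // maxImageCount == 0: no upper limit
    Extent minExtent, maxExtent;
};
enum class PresentMode { Immediate, Mailbox, Fifo, FifoRelaxed };

// Raise the request to the minimum, then cap it if the surface has a maximum.
uint32_t chooseImageCount(const SurfaceLimits& caps, uint32_t desired) {
    uint32_t count = std::max(desired, caps.minImageCount);
    if (caps.maxImageCount > 0) count = std::min(count, caps.maxImageCount);
    return count;
}

// Clamp the window size into the supported range, per dimension.
Extent chooseExtent(const SurfaceLimits& caps, Extent window) {
    return { std::clamp(window.width,  caps.minExtent.width,  caps.maxExtent.width),
             std::clamp(window.height, caps.minExtent.height, caps.maxExtent.height) };
}

// Prefer MAILBOX when offered; otherwise fall back to FIFO, the one mode
// every implementation is required to support.
PresentMode choosePresentMode(const std::vector<PresentMode>& supported) {
    if (std::find(supported.begin(), supported.end(), PresentMode::Mailbox) != supported.end())
        return PresentMode::Mailbox;
    return PresentMode::Fifo;
}
```

The fallback order in choosePresentMode is a design choice of this sketch: an application that cares more about power use than latency might prefer FIFO even when MAILBOX is available.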
8 — Image Views (VkImageView)
What Is an Image View?
A VkImage is raw GPU memory: a block of VRAM. A VkImageView is a lens that tells Vulkan how to interpret that memory:
“Is this a 2D texture? A cube map? Use only mip levels 2-5? Look at the red channel only?”
Shaders and render targets never use a VkImage directly. They always go through a VkImageView.
// Create one VkImageView for each VkImage in the swapchain
std::vector<VkImageView> swapChainImageViews(swapChainImages.size());

for (size_t i = 0; i < swapChainImages.size(); i++) {
    VkImageViewCreateInfo createInfo{};
    createInfo.sType    = VK_STRUCTURE_TYPE_IMAGE_VIEW_CREATE_INFO;
    createInfo.image    = swapChainImages[i];
    createInfo.viewType = VK_IMAGE_VIEW_TYPE_2D;    // Interpret as a 2D texture
    createInfo.format   = VK_FORMAT_B8G8R8A8_SRGB;  // Same as swapchain format

    // How to map RGBA channels (here: R->R, G->G, B->B, A->A, no swizzling)
    createInfo.components.r = VK_COMPONENT_SWIZZLE_IDENTITY;
    createInfo.components.g = VK_COMPONENT_SWIZZLE_IDENTITY;
    createInfo.components.b = VK_COMPONENT_SWIZZLE_IDENTITY;
    createInfo.components.a = VK_COMPONENT_SWIZZLE_IDENTITY;

    // Which parts of the image to access (mip levels, array layers)
    createInfo.subresourceRange.aspectMask     = VK_IMAGE_ASPECT_COLOR_BIT;
    createInfo.subresourceRange.baseMipLevel   = 0;
    createInfo.subresourceRange.levelCount     = 1;
    createInfo.subresourceRange.baseArrayLayer = 0;
    createInfo.subresourceRange.layerCount     = 1;

    vkCreateImageView(device, &createInfo, nullptr, &swapChainImageViews[i]);
}
9 — Render Passes (VkRenderPass)
Why Does a Render Pass Exist?
A Render Pass describes the attachments a rendering operation will use, and how they are loaded, stored, and transitioned, before any drawing happens. This lets the GPU driver plan memory layout and tiling optimizations up front (especially on tile-based mobile GPUs).
A Render Pass defines:
✔ WHAT attachments exist (color buffer, depth buffer)
✔ WHAT FORMAT those attachments are (RGBA8, D32_SFLOAT)
✔ HOW to load them at the start (Clear? Load previous? Don't care?)
✔ HOW to store them at the end (Save to memory? Discard?)
✔ WHAT layout the attachment is in at the start and end
Creating a Render Pass
// Requires <array>

// ------ Color Attachment (the final rendered image) ------
VkAttachmentDescription colorAttachment{};
colorAttachment.format         = VK_FORMAT_B8G8R8A8_SRGB;          // Must match swapchain format
colorAttachment.samples        = VK_SAMPLE_COUNT_1_BIT;            // No MSAA for now
colorAttachment.loadOp         = VK_ATTACHMENT_LOAD_OP_CLEAR;      // Clear before drawing
colorAttachment.storeOp        = VK_ATTACHMENT_STORE_OP_STORE;     // Save result (we need to show it)
colorAttachment.stencilLoadOp  = VK_ATTACHMENT_LOAD_OP_DONT_CARE;
colorAttachment.stencilStoreOp = VK_ATTACHMENT_STORE_OP_DONT_CARE;
colorAttachment.initialLayout  = VK_IMAGE_LAYOUT_UNDEFINED;        // We don't care about previous content
colorAttachment.finalLayout    = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR;  // Ready to display on screen

// ------ Depth Attachment (for Z-buffer depth testing) ------
VkAttachmentDescription depthAttachment{};
depthAttachment.format         = VK_FORMAT_D32_SFLOAT;             // 32-bit float depth
depthAttachment.samples        = VK_SAMPLE_COUNT_1_BIT;
depthAttachment.loadOp         = VK_ATTACHMENT_LOAD_OP_CLEAR;
depthAttachment.storeOp        = VK_ATTACHMENT_STORE_OP_DONT_CARE; // Don't save depth after rendering
depthAttachment.stencilLoadOp  = VK_ATTACHMENT_LOAD_OP_DONT_CARE;
depthAttachment.stencilStoreOp = VK_ATTACHMENT_STORE_OP_DONT_CARE;
depthAttachment.initialLayout  = VK_IMAGE_LAYOUT_UNDEFINED;
depthAttachment.finalLayout    = VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL;

// ------ Subpass references ------
VkAttachmentReference colorRef{};
colorRef.attachment = 0; // Index 0 = colorAttachment
colorRef.layout     = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;

VkAttachmentReference depthRef{};
depthRef.attachment = 1; // Index 1 = depthAttachment
depthRef.layout     = VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL;

VkSubpassDescription subpass{};
subpass.pipelineBindPoint       = VK_PIPELINE_BIND_POINT_GRAPHICS;
subpass.colorAttachmentCount    = 1;
subpass.pColorAttachments       = &colorRef;
subpass.pDepthStencilAttachment = &depthRef;

// ------ Subpass dependency (ensures layout transitions are done correctly) ------
VkSubpassDependency dependency{};
dependency.srcSubpass    = VK_SUBPASS_EXTERNAL; // Before the render pass
dependency.dstSubpass    = 0;
dependency.srcStageMask  = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT |
                           VK_PIPELINE_STAGE_EARLY_FRAGMENT_TESTS_BIT;
dependency.srcAccessMask = 0;
dependency.dstStageMask  = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT |
                           VK_PIPELINE_STAGE_EARLY_FRAGMENT_TESTS_BIT;
dependency.dstAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT |
                           VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT;

// ------ Assemble the Render Pass ------
std::array<VkAttachmentDescription, 2> attachments = {colorAttachment, depthAttachment};

VkRenderPassCreateInfo renderPassInfo{};
renderPassInfo.sType           = VK_STRUCTURE_TYPE_RENDER_PASS_CREATE_INFO;
renderPassInfo.attachmentCount = (uint32_t)attachments.size();
renderPassInfo.pAttachments    = attachments.data();
renderPassInfo.subpassCount    = 1;
renderPassInfo.pSubpasses      = &subpass;
renderPassInfo.dependencyCount = 1;
renderPassInfo.pDependencies   = &dependency;

VkRenderPass renderPass;
vkCreateRenderPass(device, &renderPassInfo, nullptr, &renderPass);
10 — Shaders and SPIR-V
The Vulkan Shader Pipeline
Vulkan does NOT accept GLSL or HLSL source code directly. It only accepts SPIR-V — a compiled binary intermediate format. You compile GLSL → SPIR-V using the glslc compiler (included in the Vulkan SDK).
# Compile vertex shader
glslc shader.vert -o vert.spv

# Compile fragment shader
glslc shader.frag -o frag.spv

# Compile HLSL to SPIR-V for Vulkan
dxc -spirv -T vs_6_6 -E VSMain shader.hlsl -Fo vert.spv
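Once compiled, the .spv file has to be loaded into memory before it can be handed to vkCreateShaderModule, which expects the code as 32-bit words and a byte size that is a multiple of four. The following is a minimal loader sketch using only the standard library; the function name `readSpirv` is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Reads a compiled .spv file into a vector of 32-bit words.
// Throws std::runtime_error if the file cannot be read or its size is not a
// positive multiple of 4 bytes.
std::vector<uint32_t> readSpirv(const std::string& path) {
    // Open at the end so tellg() reports the file size
    std::ifstream file(path, std::ios::binary | std::ios::ate);
    if (!file) throw std::runtime_error("Cannot open " + path);

    const std::streamsize size = file.tellg();
    if (size <= 0 || size % 4 != 0)
        throw std::runtime_error("Unexpected SPIR-V file size: " + path);

    std::vector<uint32_t> words(static_cast<std::size_t>(size) / 4);
    file.seekg(0);
    file.read(reinterpret_cast<char*>(words.data()), size);
    if (!file) throw std::runtime_error("Failed to read " + path);
    return words;
}
```

The result maps onto VkShaderModuleCreateInfo as pCode = words.data() and codeSize = words.size() * sizeof(uint32_t). A valid module starts with the SPIR-V magic number 0x07230203, which can serve as a quick sanity check.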
Writing a Vertex Shader (GLSL)
// shader.vert
#version 450

// Vertex input attributes (from VkVertexInputAttributeDescription)
layout(location = 0) in vec3 inPosition;
layout(location = 1) in vec3 inColor;
layout(location = 2) in vec2 inTexCoord;

// Outputs to the fragment shader
layout(location = 0) out vec3 fragColor;
layout(location = 1) out vec2 fragTexCoord;

// Uniform Buffer Object: shared data from the CPU (e.g., matrices)
layout(binding = 0) uniform UniformBufferObject {
    mat4 model; // Transform: local space → world space
    mat4 view;  // Transform: world space → camera space
    mat4 proj;  // Transform: camera space → clip space
} ubo;

void main() {
    // gl_Position is the built-in clip-space output
    gl_Position = ubo.proj * ubo.view * ubo.model * vec4(inPosition, 1.0);
    fragColor = inColor;
    fragTexCoord = inTexCoord;
}
Writing a Fragment Shader (GLSL)
// shader.frag
#version 450

// Received from vertex shader
layout(location = 0) in vec3 fragColor;
layout(location = 1) in vec2 fragTexCoord;

// Texture and sampler (from descriptor set)
layout(binding = 1) uniform sampler2D texSampler;

// Output: the final pixel color
layout(location = 0) out vec4 outColor;

void main() {
    // Sample the texture at the UV coordinate, multiply with vertex color
    outColor = texture(texSampler, fragTexCoord) * vec4(fragColor, 1.0);
}
A VkFramebuffer is the collection of ImageViews used as render targets for a specific Render Pass. It connects the Render Pass (which describes attachment formats) to actual VkImageView objects (which hold actual pixel data).
You need one framebuffer per swapchain image:
swapChainFramebuffers.resize(swapChainImageViews.size());

for (size_t i = 0; i < swapChainImageViews.size(); i++) {
    // The framebuffer binds the color AND depth attachments
    std::array<VkImageView, 2> attachments = {
        swapChainImageViews[i], // Attachment 0: color
        depthImageView          // Attachment 1: depth
    };

    VkFramebufferCreateInfo framebufferInfo{};
    framebufferInfo.sType           = VK_STRUCTURE_TYPE_FRAMEBUFFER_CREATE_INFO;
    framebufferInfo.renderPass      = renderPass;
    framebufferInfo.attachmentCount = (uint32_t)attachments.size();
    framebufferInfo.pAttachments    = attachments.data();
    framebufferInfo.width           = swapChainExtent.width;
    framebufferInfo.height          = swapChainExtent.height;
    framebufferInfo.layers          = 1;

    vkCreateFramebuffer(device, &framebufferInfo, nullptr, &swapChainFramebuffers[i]);
}
13 — Memory Management and Buffers
Explicit GPU Memory (Staging Buffers)
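In Vulkan, uploading a mesh to the GPU is a multi-step process: the CPU writes the data into a staging buffer it can access, and the GPU then copies it into fast device-local memory.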
graph LR
CPU["CPU RAM\nstd::vector vertices"]
SB["Staging Buffer\nHOST_VISIBLE memory\n(CPU can write here)"]
VB["Vertex Buffer\nDEVICE_LOCAL memory\n(GPU reads here fast)"]
CPU -->|memcpy| SB
SB -->|vkCmdCopyBuffer| VB
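Why not write to DEVICE_LOCAL directly? On most discrete GPUs the fastest VRAM cannot be mapped by the CPU (or only a small window of it can), so the data goes through a CPU-accessible staging buffer first. Integrated GPUs and systems with Resizable BAR may expose memory that is both DEVICE_LOCAL and HOST_VISIBLE.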
Creating a Vertex Buffer (Without VMA)
// Requires <cstring> for memcpy. `device`, `physicalDevice`, and copyBuffer()
// are assumed to be defined elsewhere in the application.

// Helper function to find the correct memory type on the GPU
uint32_t findMemoryType(VkPhysicalDevice physDev, uint32_t typeFilter,
                        VkMemoryPropertyFlags properties) {
    VkPhysicalDeviceMemoryProperties memProperties;
    vkGetPhysicalDeviceMemoryProperties(physDev, &memProperties);

    for (uint32_t i = 0; i < memProperties.memoryTypeCount; i++) {
        // Bit i of typeFilter is set when memory type i is allowed for this resource
        bool typeMatch = (typeFilter & (1u << i)) != 0;
        bool propMatch = (memProperties.memoryTypes[i].propertyFlags & properties) == properties;
        if (typeMatch && propMatch)
            return i;
    }
    throw std::runtime_error("Failed to find suitable memory type!");
}

// Creates any buffer of given size, usage, and memory property
void createBuffer(VkDevice device, VkPhysicalDevice physDev, VkDeviceSize size,
                  VkBufferUsageFlags usage, VkMemoryPropertyFlags properties,
                  VkBuffer& buffer, VkDeviceMemory& bufferMemory) {
    VkBufferCreateInfo bufferInfo{};
    bufferInfo.sType       = VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO;
    bufferInfo.size        = size;
    bufferInfo.usage       = usage;
    bufferInfo.sharingMode = VK_SHARING_MODE_EXCLUSIVE; // Only one queue family accesses it
    vkCreateBuffer(device, &bufferInfo, nullptr, &buffer);

    // Find out how much memory this buffer needs
    VkMemoryRequirements memRequirements;
    vkGetBufferMemoryRequirements(device, buffer, &memRequirements);

    VkMemoryAllocateInfo allocInfo{};
    allocInfo.sType           = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO;
    allocInfo.allocationSize  = memRequirements.size;
    allocInfo.memoryTypeIndex = findMemoryType(physDev, memRequirements.memoryTypeBits, properties);
    vkAllocateMemory(device, &allocInfo, nullptr, &bufferMemory);

    vkBindBufferMemory(device, buffer, bufferMemory, 0); // Bind memory block to buffer
}

// Upload vertex data using a staging buffer
void createVertexBuffer(const std::vector<Vertex>& vertices, VkBuffer& vertexBuffer,
                        VkDeviceMemory& vertexBufferMemory) {
    VkDeviceSize bufferSize = sizeof(vertices[0]) * vertices.size();

    // 1. Create staging buffer (CPU-accessible)
    VkBuffer stagingBuffer;
    VkDeviceMemory stagingBufferMemory;
    createBuffer(device, physicalDevice, bufferSize,
                 VK_BUFFER_USAGE_TRANSFER_SRC_BIT,
                 VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,
                 stagingBuffer, stagingBufferMemory);

    // 2. Copy vertex data to staging buffer
    void* data;
    vkMapMemory(device, stagingBufferMemory, 0, bufferSize, 0, &data);
    memcpy(data, vertices.data(), (size_t)bufferSize);
    vkUnmapMemory(device, stagingBufferMemory);

    // 3. Create final vertex buffer (GPU-only, fast VRAM)
    createBuffer(device, physicalDevice, bufferSize,
                 VK_BUFFER_USAGE_TRANSFER_DST_BIT | VK_BUFFER_USAGE_VERTEX_BUFFER_BIT,
                 VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT,
                 vertexBuffer, vertexBufferMemory);

    // 4. Copy staging → final buffer (GPU copy operation)
    copyBuffer(stagingBuffer, vertexBuffer, bufferSize);

    // 5. Clean up staging buffer
    vkDestroyBuffer(device, stagingBuffer, nullptr);
    vkFreeMemory(device, stagingBufferMemory, nullptr);
}
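The bit tests inside findMemoryType are easier to follow on concrete numbers. The sketch below reproduces the same two checks over a plain list of property masks. `pickMemoryType` and the three flag constants are stand-ins for illustration (the real flags are the VK_MEMORY_PROPERTY_* bits), and it returns std::nullopt instead of throwing.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Stand-in property bits for illustration.
constexpr uint32_t DEVICE_LOCAL  = 0x1;
constexpr uint32_t HOST_VISIBLE  = 0x2;
constexpr uint32_t HOST_COHERENT = 0x4;

// typeFlags[i] holds the property mask of memory type i.
// typeFilter has bit i set when type i is allowed for the resource.
// Returns the first allowed type that has every required property.
std::optional<uint32_t> pickMemoryType(const std::vector<uint32_t>& typeFlags,
                                       uint32_t typeFilter, uint32_t required) {
    for (uint32_t i = 0; i < typeFlags.size(); ++i) {
        const bool allowed = (typeFilter & (1u << i)) != 0;
        const bool hasAll  = (typeFlags[i] & required) == required;
        if (allowed && hasAll) return i;
    }
    return std::nullopt;
}
```

For example, with three types {DEVICE_LOCAL, HOST_VISIBLE | HOST_COHERENT, all three bits} and a filter of 0b111, a staging buffer asking for HOST_VISIBLE | HOST_COHERENT lands on type 1. If the filter only allows type 2 (0b100), the same request lands on type 2.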
VMA — Vulkan Memory Allocator (The Professional Way)
// Set up VMA once during initialization
VmaAllocatorCreateInfo allocatorInfo{};
allocatorInfo.instance         = instance;
allocatorInfo.physicalDevice   = physicalDevice;
allocatorInfo.device           = device;
allocatorInfo.vulkanApiVersion = VK_API_VERSION_1_3;

VmaAllocator allocator;
vmaCreateAllocator(&allocatorInfo, &allocator);

// Create a vertex buffer WITH VMA (much simpler!)
VkBufferCreateInfo bufferInfo{ VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO };
bufferInfo.size  = sizeof(Vertex) * vertexCount;
bufferInfo.usage = VK_BUFFER_USAGE_VERTEX_BUFFER_BIT | VK_BUFFER_USAGE_TRANSFER_DST_BIT;

VmaAllocationCreateInfo vmaInfo{};
vmaInfo.usage = VMA_MEMORY_USAGE_AUTO;
vmaInfo.flags = VMA_ALLOCATION_CREATE_DEDICATED_MEMORY_BIT; // Best for large meshes

VkBuffer vertexBuffer;
VmaAllocation allocation;
vmaCreateBuffer(allocator, &bufferInfo, &vmaInfo, &vertexBuffer, &allocation, nullptr);

// For CPU-writable uniform buffers:
VmaAllocationCreateInfo cpuInfo{};
cpuInfo.usage = VMA_MEMORY_USAGE_AUTO;
cpuInfo.flags = VMA_ALLOCATION_CREATE_HOST_ACCESS_SEQUENTIAL_WRITE_BIT |
                VMA_ALLOCATION_CREATE_MAPPED_BIT; // Keeps it persistently mapped
14 — Textures and Images (VkImage)
The Texture Upload Journey
graph TD
PNG["PNG file on disk"]
CPU["stb_image loads pixels into CPU RAM"]
Stage["Staging Buffer\n(HOST_VISIBLE VkBuffer)"]
Transition1["Pipeline Barrier\nUNDEFINED → TRANSFER_DST\n(prepare image to receive GPU copy)"]
Copy["vkCmdCopyBufferToImage\n(GPU copies staging → VkImage)"]
Transition2["Pipeline Barrier\nTRANSFER_DST → SHADER_READ_ONLY\n(prepare image for shader sampling)"]
Sample["Shader samples the texture!"]
PNG --> CPU --> Stage --> Transition1 --> Copy --> Transition2 --> Sample
The CPU and GPU run completely independently. Once you submit a command buffer, the GPU starts working immediately and your CPU code keeps running. Without synchronization, you could:
❌ Start rendering frame 2 while the GPU is still presenting frame 1
❌ Write new uniform buffer data while GPU is still reading the old data
❌ Sample a texture that is still being written by a compute shader
❌ Free a buffer that the GPU is still accessing
Three Synchronization Primitives
| Primitive | CPU or GPU? | Purpose |
|---|---|---|
| VkFence | GPU → CPU | CPU blocks until GPU finishes a submission |
| VkSemaphore | GPU → GPU | One queue operation waits for another, on the same queue or a different one (e.g., render waits for image acquire) |
| Pipeline Barrier (vkCmdPipelineBarrier) | GPU internal | Memory and execution ordering within a command buffer |
Fences, Semaphores in the Main Loop
// Per-frame synchronization objects
const int MAX_FRAMES_IN_FLIGHT = 2; // CPU can be at most 1 frame ahead of the GPU

std::vector<VkSemaphore> imageAvailableSemaphores(MAX_FRAMES_IN_FLIGHT);
std::vector<VkSemaphore> renderFinishedSemaphores(MAX_FRAMES_IN_FLIGHT);
std::vector<VkFence>     inFlightFences(MAX_FRAMES_IN_FLIGHT);

VkSemaphoreCreateInfo semaphoreInfo{ VK_STRUCTURE_TYPE_SEMAPHORE_CREATE_INFO };
VkFenceCreateInfo     fenceInfo{ VK_STRUCTURE_TYPE_FENCE_CREATE_INFO };
fenceInfo.flags = VK_FENCE_CREATE_SIGNALED_BIT; // Start signaled (so first frame doesn't hang)

for (int i = 0; i < MAX_FRAMES_IN_FLIGHT; i++) {
    vkCreateSemaphore(device, &semaphoreInfo, nullptr, &imageAvailableSemaphores[i]);
    vkCreateSemaphore(device, &semaphoreInfo, nullptr, &renderFinishedSemaphores[i]);
    vkCreateFence(device, &fenceInfo, nullptr, &inFlightFences[i]);
}
Pipeline Barriers and Image Layout Transitions
Every VkImage has a layout that determines how the GPU hardware accesses its memory.
| Layout | What it means |
|---|---|
| VK_IMAGE_LAYOUT_UNDEFINED | Don’t care about contents (initial state) |
| VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL | Being actively drawn to as a render target |
| VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL | Being sampled in a shader |
| VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL | Source for a GPU copy operation |
| VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL | Destination for a GPU copy operation |
| VK_IMAGE_LAYOUT_PRESENT_SRC_KHR | Ready to be shown on the display |
| VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL | Depth buffer being used for Z-testing |
// Transition an image from one layout to another using a pipeline barrier
void transitionImageLayout(VkImage image, VkImageLayout oldLayout, VkImageLayout newLayout) {
    VkCommandBuffer cmd = beginSingleTimeCommands(); // Helper: begin a one-off command buffer

    VkImageMemoryBarrier barrier{};
    barrier.sType               = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
    barrier.oldLayout           = oldLayout;
    barrier.newLayout           = newLayout;
    barrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED; // No queue family ownership transfer
    barrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    barrier.image               = image;
    barrier.subresourceRange.aspectMask     = VK_IMAGE_ASPECT_COLOR_BIT;
    barrier.subresourceRange.baseMipLevel   = 0;
    barrier.subresourceRange.levelCount     = 1;
    barrier.subresourceRange.baseArrayLayer = 0;
    barrier.subresourceRange.layerCount     = 1;

    VkPipelineStageFlags srcStage, dstStage;

    if (oldLayout == VK_IMAGE_LAYOUT_UNDEFINED &&
        newLayout == VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL) {
        barrier.srcAccessMask = 0;                            // Nothing to wait for
        barrier.dstAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT; // Transfer writes wait for the transition
        srcStage = VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT;
        dstStage = VK_PIPELINE_STAGE_TRANSFER_BIT;
    } else if (oldLayout == VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL &&
               newLayout == VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL) {
        barrier.srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT; // The transfer write must finish
        barrier.dstAccessMask = VK_ACCESS_SHADER_READ_BIT;    // Before the shader can read it
        srcStage = VK_PIPELINE_STAGE_TRANSFER_BIT;
        dstStage = VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT;
    } else {
        // Without this branch srcStage/dstStage would be used uninitialized
        throw std::invalid_argument("Unsupported layout transition!");
    }

    vkCmdPipelineBarrier(cmd, srcStage, dstStage, 0,
                         0, nullptr,   // Global memory barriers
                         0, nullptr,   // Buffer memory barriers
                         1, &barrier); // Image memory barriers

    endSingleTimeCommands(cmd);
}
18 — The Main Render Loop
The Complete Frame Loop
uint32_t currentFrame = 0;

void drawFrame() {
    // === STEP 1: Wait until the GPU has finished the frame that last used this
    //     slot (frame N - MAX_FRAMES_IN_FLIGHT) ===
    vkWaitForFences(device, 1, &inFlightFences[currentFrame], VK_TRUE, UINT64_MAX);

    // === STEP 2: Acquire the next available swapchain image ===
    uint32_t imageIndex;
    VkResult result = vkAcquireNextImageKHR(
        device, swapChain, UINT64_MAX,
        imageAvailableSemaphores[currentFrame], // Signal this when image is available
        VK_NULL_HANDLE, &imageIndex
    );

    // Handle window resize
    if (result == VK_ERROR_OUT_OF_DATE_KHR) {
        recreateSwapChain();
        return;
    } else if (result != VK_SUCCESS && result != VK_SUBOPTIMAL_KHR) {
        throw std::runtime_error("Failed to acquire swapchain image!");
    }

    // Reset fence only once we know we will submit work
    vkResetFences(device, 1, &inFlightFences[currentFrame]);

    // === STEP 3: Update uniform buffer data for this frame ===
    updateUniformBuffer(currentFrame);

    // === STEP 4: Record all draw calls ===
    vkResetCommandBuffer(commandBuffers[currentFrame], 0);
    recordCommandBuffer(commandBuffers[currentFrame], imageIndex);

    // === STEP 5: Submit to the GPU queue ===
    VkSemaphore waitSemaphores[]      = { imageAvailableSemaphores[currentFrame] };
    VkPipelineStageFlags waitStages[] = { VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT };
    VkSemaphore signalSemaphores[]    = { renderFinishedSemaphores[currentFrame] };

    VkSubmitInfo submitInfo{};
    submitInfo.sType                = VK_STRUCTURE_TYPE_SUBMIT_INFO;
    submitInfo.waitSemaphoreCount   = 1;
    submitInfo.pWaitSemaphores      = waitSemaphores;   // Wait: image must be available
    submitInfo.pWaitDstStageMask    = waitStages;
    submitInfo.commandBufferCount   = 1;
    submitInfo.pCommandBuffers      = &commandBuffers[currentFrame];
    submitInfo.signalSemaphoreCount = 1;
    submitInfo.pSignalSemaphores    = signalSemaphores; // Signal: rendering is done

    if (vkQueueSubmit(graphicsQueue, 1, &submitInfo, inFlightFences[currentFrame]) != VK_SUCCESS) {
        throw std::runtime_error("Failed to submit draw command buffer!");
    }

    // === STEP 6: Present the rendered frame to the screen ===
    VkPresentInfoKHR presentInfo{};
    presentInfo.sType              = VK_STRUCTURE_TYPE_PRESENT_INFO_KHR;
    presentInfo.waitSemaphoreCount = 1;
    presentInfo.pWaitSemaphores    = signalSemaphores;  // Wait: render must be done
    presentInfo.swapchainCount     = 1;
    presentInfo.pSwapchains        = &swapChain;
    presentInfo.pImageIndices      = &imageIndex;

    result = vkQueuePresentKHR(presentQueue, &presentInfo);
    if (result == VK_ERROR_OUT_OF_DATE_KHR || result == VK_SUBOPTIMAL_KHR) {
        recreateSwapChain();
    } else if (result != VK_SUCCESS) {
        throw std::runtime_error("Failed to present swapchain image!");
    }

    currentFrame = (currentFrame + 1) % MAX_FRAMES_IN_FLIGHT;
}
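The modulo at the end of drawFrame means that frame n uses sync slot n % MAX_FRAMES_IN_FLIGHT. The fence wait in step 1 is therefore a wait for frame n - MAX_FRAMES_IN_FLIGHT, the last frame that used the same slot. A tiny simulation of that schedule, with a hypothetical helper name:

```cpp
#include <cstdint>
#include <vector>

constexpr uint32_t MAX_FRAMES_IN_FLIGHT = 2;

// Returns the sync-object slot used by each of the first `frameCount` frames,
// advancing the index exactly as drawFrame does.
std::vector<uint32_t> frameSlots(uint32_t frameCount) {
    std::vector<uint32_t> slots;
    uint32_t currentFrame = 0;
    for (uint32_t n = 0; n < frameCount; ++n) {
        slots.push_back(currentFrame);
        currentFrame = (currentFrame + 1) % MAX_FRAMES_IN_FLIGHT;
    }
    return slots;
}
```

With two frames in flight the slots alternate 0, 1, 0, 1, and so on. Frame 2 waits on the fence that frame 0 submitted, while frame 1 may still be executing on the GPU.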
19 — Push Constants
What Are Push Constants?
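Push constants let you inject a small block of data directly into the GPU command stream, with no buffer or descriptor to manage and very little overhead. The specification guarantees at least 128 bytes (enough for two mat4 matrices); the actual limit is reported in VkPhysicalDeviceLimits::maxPushConstantsSize.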
Perfect for: model matrix per-object, material ID, time value, a few flags.
// In the pipeline layout, declare push constant range
VkPushConstantRange pushConstantRange{};
pushConstantRange.stageFlags = VK_SHADER_STAGE_VERTEX_BIT;
pushConstantRange.offset     = 0;
pushConstantRange.size       = sizeof(glm::mat4); // 64 bytes

pipelineLayoutInfo.pushConstantRangeCount = 1;
pipelineLayoutInfo.pPushConstantRanges    = &pushConstantRange;

// Per-draw: push the model matrix for this specific object
glm::mat4 modelMatrix = transform.getMatrix();
vkCmdPushConstants(commandBuffer, pipelineLayout,
                   VK_SHADER_STAGE_VERTEX_BIT,
                   0, sizeof(glm::mat4), &modelMatrix);
vkCmdDrawIndexed(commandBuffer, indexCount, 1, 0, 0, 0);
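Because the push constant budget is small, it helps to check the size of the block at compile time. Below is a sketch with a hypothetical `DrawPushConstants` struct and a `Mat4` stand-in for glm::mat4. The 128-byte figure is the minimum every implementation must support, and a push constant range size must also be a multiple of 4.

```cpp
#include <cstddef>
#include <cstdint>

// Stand-in for glm::mat4: 16 floats, 64 bytes.
struct Mat4 { float m[16]; };

// Hypothetical per-draw block: one matrix plus a couple of small values.
struct DrawPushConstants {
    Mat4     model;
    uint32_t materialId;
    float    time;
};

// Minimum push constant size that every Vulkan implementation supports.
constexpr std::size_t kGuaranteedPushConstantBytes = 128;

// True when a block of `bytes` fits the limit and has a valid (multiple of 4) size.
constexpr bool fitsInPushConstants(std::size_t bytes,
                                   std::size_t limit = kGuaranteedPushConstantBytes) {
    return bytes <= limit && bytes % 4 == 0;
}

static_assert(sizeof(Mat4) == 64, "Mat4 is expected to be 64 bytes");
static_assert(fitsInPushConstants(sizeof(DrawPushConstants)),
              "DrawPushConstants exceeds the guaranteed push constant budget");
```

If the struct grows past the limit, the build fails instead of the pipeline layout creation failing at run time. An application that queries maxPushConstantsSize can pass the real limit as the second argument.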
A Compute Shader runs on the GPU’s shader cores but has no connection to the rendering pipeline. There are no vertices, no triangles, no pixels. Just raw parallel computation organized into a grid of threads.
graph TD
Dispatch["vkCmdDispatch(groupX, groupY, groupZ)\nLaunches a 3D grid of workgroups"]
WG["Workgroup (e.g., 16x16=256 threads)\nAll in this workgroup share fast on-chip memory"]
T["Individual Threads\ngl_GlobalInvocationID gives each thread its unique ID"]
SB["Storage Buffer (VkBuffer)\nRead AND Write from shader — huge arrays of data"]
Dispatch --> WG --> T --> SB
Writing a Compute Shader (GLSL)
// particle_update.comp
#version 450

// 256 threads per workgroup, laid out as 256x1x1 for a 1D particle array
layout(local_size_x = 256, local_size_y = 1, local_size_z = 1) in;

struct Particle {
    vec2 position;
    vec2 velocity;
    vec4 color;
};

// Input particles (read-only)
layout(std140, set = 0, binding = 0) readonly buffer ParticleSSBOIn {
    Particle particlesIn[];
};

// Output particles (write result here)
layout(std140, set = 0, binding = 1) buffer ParticleSSBOOut {
    Particle particlesOut[];
};

// Time delta from CPU
layout(push_constant) uniform PushConstants {
    float deltaTime;
} pc;

void main() {
    uint index = gl_GlobalInvocationID.x; // Which particle is this thread handling?

    // Threads past the end of the array (last, partially filled workgroup) do nothing
    if (index >= uint(particlesIn.length()))
        return;

    Particle p = particlesIn[index];

    // Update position by velocity
    p.position += p.velocity * pc.deltaTime;

    // Bounce off edges
    if (abs(p.position.x) >= 1.0) p.velocity.x = -p.velocity.x;
    if (abs(p.position.y) >= 1.0) p.velocity.y = -p.velocity.y;

    particlesOut[index] = p;
}
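The CPU-side struct that fills the storage buffer has to match the std140 layout the shader declares. Under std140 a vec2 is aligned to 8 bytes and a vec4 to 16, and the struct size is rounded up to a multiple of 16. For this Particle that gives offsets 0, 8, and 16 and a 32-byte stride. Here is a sketch of a matching C++ struct with compile-time checks; `Vec2`, `Vec4`, and `particleBufferBytes` are stand-ins, and an application using GLM would use glm::vec2 and glm::vec4 instead.

```cpp
#include <cstddef>

struct Vec2 { float x, y; };
struct Vec4 { float x, y, z, w; };

// CPU-side mirror of the shader's Particle struct (std140 layout).
struct alignas(16) Particle {
    Vec2 position; // offset 0
    Vec2 velocity; // offset 8
    Vec4 color;    // offset 16
};

static_assert(offsetof(Particle, position) == 0,  "position must be at offset 0");
static_assert(offsetof(Particle, velocity) == 8,  "velocity must be at offset 8");
static_assert(offsetof(Particle, color)    == 16, "color must be at offset 16");
static_assert(sizeof(Particle) == 32, "array stride must be 32 bytes");

// Size in bytes of a storage buffer holding `count` particles.
constexpr std::size_t particleBufferBytes(std::size_t count) {
    return count * sizeof(Particle);
}
```

This struct happens to need no padding. Adding a vec3 or a lone float would change that, which is why the static_asserts are worth keeping next to the definition.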
Dispatching Compute from C++
// Bind compute pipeline
vkCmdBindPipeline(commandBuffer, VK_PIPELINE_BIND_POINT_COMPUTE, computePipeline);

// Bind the storage buffers as descriptor sets
vkCmdBindDescriptorSets(commandBuffer, VK_PIPELINE_BIND_POINT_COMPUTE, computePipelineLayout, 0, 1, &computeDescriptorSet, 0, nullptr);

// Push the time delta
float deltaTime = 0.016f;
vkCmdPushConstants(commandBuffer, computePipelineLayout, VK_SHADER_STAGE_COMPUTE_BIT, 0, sizeof(float), &deltaTime);

// Dispatch!
// We have PARTICLE_COUNT particles, each thread handles 1.
// With local_size_x=256, we need (PARTICLE_COUNT / 256) workgroups.
// This assumes PARTICLE_COUNT is a multiple of 256; otherwise integer
// division drops the last partial group and those particles are skipped.
vkCmdDispatch(commandBuffer, PARTICLE_COUNT / 256, 1, 1);

// ⚠️ IMPORTANT: Add a barrier before reading the result in render!
VkBufferMemoryBarrier computeToRenderBarrier{};
computeToRenderBarrier.sType = VK_STRUCTURE_TYPE_BUFFER_MEMORY_BARRIER;
computeToRenderBarrier.srcAccessMask = VK_ACCESS_SHADER_WRITE_BIT;
computeToRenderBarrier.dstAccessMask = VK_ACCESS_VERTEX_ATTRIBUTE_READ_BIT;
computeToRenderBarrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED; // No queue ownership transfer
computeToRenderBarrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
computeToRenderBarrier.buffer = particleSSBO;
computeToRenderBarrier.offset = 0;
computeToRenderBarrier.size = VK_WHOLE_SIZE;
vkCmdPipelineBarrier(commandBuffer, VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT, VK_PIPELINE_STAGE_VERTEX_INPUT_BIT, 0, 0, nullptr, 1, &computeToRenderBarrier, 0, nullptr);
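The dispatch above divides PARTICLE_COUNT by 256, which truncates when the count is not an exact multiple of the workgroup size. A common alternative is to round the group count up. The helper below is a small sketch of that idea; `workgroupCount` is a hypothetical name, not part of the Vulkan API.

```cpp
#include <cassert>
#include <cstdint>

// Number of workgroups needed so that every item gets a thread,
// rounding up when itemCount is not a multiple of localSize.
// Written without (itemCount + localSize - 1) to avoid overflow near UINT32_MAX.
constexpr uint32_t workgroupCount(uint32_t itemCount, uint32_t localSize) {
    return itemCount / localSize + (itemCount % localSize != 0 ? 1u : 0u);
}

// Compile-time sanity checks for the rounding behaviour.
static_assert(workgroupCount(1024, 256) == 4, "exact multiple");
static_assert(workgroupCount(1000, 256) == 4, "partial last group rounds up");
```

It would be used as `vkCmdDispatch(commandBuffer, workgroupCount(PARTICLE_COUNT, 256), 1, 1);`. Rounding up launches a few surplus threads in the last group, so the shader then needs an early return for any index at or beyond the particle count (passing that count in, for instance as a push constant, is an assumption of this sketch).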
21 — Dynamic Rendering (Vulkan 1.3)
Why Dynamic Rendering?
Creating VkRenderPass objects and VkFramebuffer objects is verbose and rigid. In Vulkan 1.3, Dynamic Rendering was promoted to core, allowing you to begin rendering directly from a command buffer — no pre-built render pass objects needed.
// Enable during device creation
VkPhysicalDeviceDynamicRenderingFeatures dynamicRenderingFeature{};
dynamicRenderingFeature.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_DYNAMIC_RENDERING_FEATURES;
dynamicRenderingFeature.dynamicRendering = VK_TRUE;

// Attach to device create info chain
deviceCreateInfo.pNext = &dynamicRenderingFeature;

// ---- Per-frame: Begin rendering without a render pass! ----
// First: barrier the swapchain image to COLOR_ATTACHMENT_OPTIMAL (barrier not shown here).
// Dynamic rendering performs no implicit layout transitions, so this is your job.
VkRenderingAttachmentInfo colorAttachment{};
colorAttachment.sType = VK_STRUCTURE_TYPE_RENDERING_ATTACHMENT_INFO;
colorAttachment.imageView = swapChainImageViews[imageIndex];
colorAttachment.imageLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
colorAttachment.loadOp = VK_ATTACHMENT_LOAD_OP_CLEAR;
colorAttachment.storeOp = VK_ATTACHMENT_STORE_OP_STORE;
colorAttachment.clearValue = { {0.0f, 0.0f, 0.0f, 1.0f} };

VkRenderingInfo renderingInfo{};
renderingInfo.sType = VK_STRUCTURE_TYPE_RENDERING_INFO;
renderingInfo.renderArea.offset = {0, 0};
renderingInfo.renderArea.extent = swapChainExtent;
renderingInfo.layerCount = 1;
renderingInfo.colorAttachmentCount = 1;
renderingInfo.pColorAttachments = &colorAttachment;
// depthAttachment is a second VkRenderingAttachmentInfo for the depth image, filled in elsewhere
renderingInfo.pDepthAttachment = &depthAttachment;

vkCmdBeginRendering(commandBuffer, &renderingInfo);
// ... draw calls ...
vkCmdEndRendering(commandBuffer);
// Afterwards: barrier the swapchain image to PRESENT_SRC_KHR before presenting it
22 — Bindless Rendering (Advanced)
The Problem With Normal Descriptors
In the standard workflow, every time you draw a mesh with a different texture, you must:
vkCmdBindDescriptorSets(...) — This is a CPU call. Done thousands of times per frame, it becomes a bottleneck.
Bindless eliminates this by uploading ALL textures into one gigantic descriptor array. The shader picks which texture to use via a Push Constant material_index.
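On the CPU side, bindless rendering mostly comes down to bookkeeping: each texture needs a stable slot in the big descriptor array, and that slot number is what gets pushed as material_index. The sketch below shows one possible way to hand out slots. `TextureIndexTable`, `kInvalidSlot`, and `MaterialPushConstants` are hypothetical names for illustration; writing the descriptors themselves is not covered here.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Returned when the descriptor array has no free slot left.
constexpr uint32_t kInvalidSlot = UINT32_MAX;

// Hypothetical push constant block carrying the slot to the shader.
struct MaterialPushConstants {
    uint32_t materialIndex;
};

// Hypothetical bookkeeping: maps texture names to slots in the descriptor array.
class TextureIndexTable {
public:
    explicit TextureIndexTable(uint32_t capacity) : capacity_(capacity) {}

    // Returns the slot for `name`, allocating one on first use.
    uint32_t acquire(const std::string& name) {
        auto it = slots_.find(name);
        if (it != slots_.end()) return it->second;

        uint32_t slot = kInvalidSlot;
        if (!freeSlots_.empty()) {
            slot = freeSlots_.back(); // Reuse a released slot first
            freeSlots_.pop_back();
        } else if (nextSlot_ < capacity_) {
            slot = nextSlot_++;
        } else {
            return kInvalidSlot; // Array is full
        }
        slots_.emplace(name, slot);
        return slot;
    }

    // Frees the slot so a later texture can reuse it.
    void release(const std::string& name) {
        auto it = slots_.find(name);
        if (it == slots_.end()) return;
        freeSlots_.push_back(it->second);
        slots_.erase(it);
    }

private:
    uint32_t capacity_;
    uint32_t nextSlot_ = 0;
    std::vector<uint32_t> freeSlots_;
    std::unordered_map<std::string, uint32_t> slots_;
};
```

Per draw, the application would fill `MaterialPushConstants{ table.acquire("brick") }` and send it with vkCmdPushConstants instead of rebinding a descriptor set.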
Tag your resources (buffers, images, queues) with human-readable names that show up in RenderDoc:
// Name your resources for debugging in RenderDoc
VkDebugUtilsObjectNameInfoEXT nameInfo{};
nameInfo.sType = VK_STRUCTURE_TYPE_DEBUG_UTILS_OBJECT_NAME_INFO_EXT;
nameInfo.objectType = VK_OBJECT_TYPE_IMAGE;
nameInfo.objectHandle = (uint64_t)gbufferAlbedo;
nameInfo.pObjectName = "GBuffer_Albedo_Texture"; // Appears in RenderDoc!

// vkSetDebugUtilsObjectNameEXT belongs to VK_EXT_debug_utils, so load its
// function pointer first (e.g. via vkGetInstanceProcAddr) unless your loader does it for you.
vkSetDebugUtilsObjectNameEXT(device, &nameInfo);
25 — Full Object Reference Cheatsheet
Every Vulkan Object and What It Does
| Vulkan Object | Category | What It Is |
| --- | --- | --- |
| VkInstance | Bootstrap | Connection between app and Vulkan library |
| VkPhysicalDevice | Hardware | A GPU in the machine — enumerate and pick |
| VkDevice | Logic | App’s logical connection to a specific GPU |
| VkQueue | Execution | Submit command buffers here. Different families for graphics/compute/transfer. |
| VkSurfaceKHR | Platform | Bridge between Vulkan and the OS window system |
| VkSwapchainKHR | Presentation | Ring of images rendered to and shown on monitor |
| VkImage | Memory | Raw block of VRAM containing pixel data |
| VkImageView | Memory | Describes how to interpret a VkImage (2D, cube map, mip range) |
| VkSampler | Textures | Defines filtering and wrapping when shader reads a texture |
| VkBuffer | Memory | Raw block of VRAM for vertex, index, uniform, storage data |
| VkDeviceMemory | Memory | A raw allocation of GPU memory. Bound to VkBuffer or VkImage. |
| VkShaderModule | Pipeline | Compiled SPIR-V bytecode of one shader stage |
| VkRenderPass | Pipeline | Blueprint: what attachments to expect, how to load/store them |