The Render Graph

Live Demos: Custom Pass | Custom Multipass | Render Layers

The render graph is the system that schedules GPU work. Passes declare what they read and what they write, and the graph figures out the ordering, the memory layout, and the load and store operations. The motivation is that hand-ordering a real renderer does not scale.

The Problem: Manual Pass Ordering

A real renderer has dozens of passes. Shadow maps, geometry, SSAO, SSR, bloom, tonemapping, UI. Each one reads from and writes to intermediate textures. Doing this without automation means four kinds of pain.

The first is ordering. Shadow maps before geometry, geometry before SSAO, SSAO before compositing. Adding one pass means figuring out where it fits in the chain, and reordering one pass means re-checking three others.

The second is texture lifecycle. Allocate intermediate textures, track which ones are alive when, decide when to clear versus load and when to store versus discard. Getting this wrong shows up as black screens or stale data from a previous frame.

The third is memory. SSAO's intermediate texture and SSR's intermediate texture might never be alive at the same time. Without aliasing, both live in VRAM whether they need to or not.

The fourth is dynamic passes. Disabling bloom should not require rewriting the compositing pass's inputs. With hardcoded ordering, every conditional pass becomes an if statement threaded through the entire pipeline.

The render graph (also called a frame graph, after the Frostbite GDC 2017 talk "FrameGraph: Extensible Rendering Architecture in Frostbite") replaces all four with declared dependencies. Passes describe what they need. The graph handles ordering, memory, and lifecycle.

How It Works

The frame is modeled as a directed acyclic graph. Nodes are passes. Edges are resource dependencies. An edge from pass A to pass B means A produces data that B consumes.

This is the same abstraction as a build system (Make, Bazel) or a task scheduler. Given the edges, a topological sort produces a valid execution order. Once the order exists, the graph can analyze resource lifetimes, alias memory, compute load and store operations, and cull passes that do not contribute to any external output.

The key property is that dependencies are declarative, not imperative. A pass says "I read scene_color, I write bloom," not "run me after MeshPass and before PostProcessPass." Adding a new pass means declaring its slots, not editing every other pass that touches the same resources.
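As a minimal sketch of the idea, the snippet below derives writer-to-reader edges from declared reads and writes, then orders the passes with Kahn's algorithm. The `PassDecl` type and `execution_order` function are hypothetical names chosen for illustration, not the Nightshade API, and the sketch assumes each resource is written before it is read (it does not model read-modify-write chains on a single resource).

```rust
use std::collections::{HashMap, VecDeque};

/// A pass as the graph sees it: a name plus the resources it reads and writes.
/// Hypothetical type for illustration only.
pub struct PassDecl {
    pub name: &'static str,
    pub reads: Vec<&'static str>,
    pub writes: Vec<&'static str>,
}

/// Derive writer -> reader edges from the declarations, then order the passes
/// with Kahn's algorithm. Returns None if the declarations form a cycle.
pub fn execution_order(passes: &[PassDecl]) -> Option<Vec<&'static str>> {
    // Map each resource to the passes that write it.
    let mut writers: HashMap<&str, Vec<usize>> = HashMap::new();
    for (index, pass) in passes.iter().enumerate() {
        for resource in &pass.writes {
            writers.entry(*resource).or_default().push(index);
        }
    }

    // One edge writer -> reader for every resource the reader consumes.
    let mut successors: Vec<Vec<usize>> = vec![Vec::new(); passes.len()];
    let mut in_degree = vec![0usize; passes.len()];
    for (reader, pass) in passes.iter().enumerate() {
        for resource in &pass.reads {
            for &writer in writers.get(resource).into_iter().flatten() {
                if writer != reader {
                    successors[writer].push(reader);
                    in_degree[reader] += 1;
                }
            }
        }
    }

    // Kahn's algorithm: repeatedly emit a pass whose dependencies are all done.
    let mut ready: VecDeque<usize> =
        (0..passes.len()).filter(|&i| in_degree[i] == 0).collect();
    let mut order = Vec::with_capacity(passes.len());
    while let Some(index) = ready.pop_front() {
        order.push(passes[index].name);
        for &next in &successors[index] {
            in_degree[next] -= 1;
            if in_degree[next] == 0 {
                ready.push_back(next);
            }
        }
    }

    // Fewer emitted passes than declared means a cycle blocked the sort.
    (order.len() == passes.len()).then_some(order)
}
```

Declaring the passes in any order gives the same result: the order comes from the resource names, not from the order of declaration. Adding a pass never requires touching the declarations of the others.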

What This Buys

Five things fall out of the graph automatically.

  1. Ordering. Pass order is a topological sort of the read and write dependencies.
  2. Memory aliasing. Transient textures with non-overlapping lifetimes share GPU memory.
  3. Load and store operations. The graph picks LoadOp::Clear or LoadOp::Load, and StoreOp::Store or StoreOp::Discard, per attachment based on who reads what.
  4. Dead-pass culling. Passes that do not contribute to any external output are culled.
  5. Runtime toggling. Passes can be enabled and disabled at runtime without recompiling the graph.
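Dead-pass culling is the least obvious of these, so here is a rough sketch of one way it could work: start from the external resources and walk backwards, keeping every pass that writes something still needed. `PassIo` and `live_passes` are hypothetical names for illustration. The actual graph tracks this per node in its own structures.

```rust
use std::collections::HashSet;

/// Hypothetical pass description: a name plus resource names.
pub struct PassIo {
    pub name: &'static str,
    pub reads: Vec<&'static str>,
    pub writes: Vec<&'static str>,
}

/// Return the names of passes that contribute to at least one external
/// resource. Every pass not in the result can be skipped.
pub fn live_passes(passes: &[PassIo], external: &[&'static str]) -> HashSet<&'static str> {
    // Resources whose contents are still needed by something that matters.
    let mut needed: HashSet<&'static str> = external.iter().copied().collect();
    let mut live: HashSet<&'static str> = HashSet::new();

    // Iterate to a fixed point: a pass is live if it writes a needed resource,
    // and everything a live pass reads becomes needed in turn.
    let mut changed = true;
    while changed {
        changed = false;
        for pass in passes {
            if live.contains(pass.name) {
                continue;
            }
            if pass.writes.iter().any(|resource| needed.contains(resource)) {
                live.insert(pass.name);
                needed.extend(pass.reads.iter().copied());
                changed = true;
            }
        }
    }
    live
}
```

In this sketch a pass that renders into a debug texture nobody consumes never enters the live set, so it costs nothing at execution time.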

The RenderGraph Struct

pub struct RenderGraph<C = ()> {
    graph: DiGraph<GraphNode<C>, ResourceId>,    // petgraph directed graph
    pass_nodes: HashMap<String, NodeIndex>,      // pass name -> graph node
    resources: RenderGraphResources,             // texture/buffer descriptors and handles
    execution_order: Vec<NodeIndex>,             // topologically sorted pass order
    store_ops: HashMap<ResourceId, StoreOp>,     // per-resource store operations
    clear_ops: HashSet<(NodeIndex, ResourceId)>, // which passes clear which resources
    aliasing_info: Option<ResourceAliasingInfo>, // memory sharing between transients
    culled_passes: HashSet<NodeIndex>,           // passes removed by dead-pass culling
    // ...
}

The generic parameter C is the "configs" type passed to passes during execution. Nightshade uses RenderGraph<World> so passes can read ECS state directly.
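To make the role of C concrete, here is a small sketch of how a generic configs type can be threaded through to every pass. The `Pass` trait, `MiniGraph`, `World`, and `TonemapPass` below are all hypothetical stand-ins written for this example. They are not the Nightshade types, and the real pass trait almost certainly has a different shape.

```rust
/// Hypothetical pass trait: `C` is whatever the application wants every pass
/// to see at execution time.
pub trait Pass<C> {
    /// Returns a log line instead of recording GPU commands, to keep the
    /// sketch free of graphics dependencies.
    fn execute(&mut self, configs: &C) -> String;
}

/// Hypothetical miniature graph that only stores and runs passes.
pub struct MiniGraph<C = ()> {
    passes: Vec<Box<dyn Pass<C>>>,
}

impl<C> MiniGraph<C> {
    pub fn new() -> Self {
        Self { passes: Vec::new() }
    }

    pub fn add_pass(&mut self, pass: Box<dyn Pass<C>>) {
        self.passes.push(pass);
    }

    /// Hand the same `&C` to every pass, in insertion order.
    pub fn execute(&mut self, configs: &C) -> Vec<String> {
        self.passes.iter_mut().map(|pass| pass.execute(configs)).collect()
    }
}

/// Stand-in for an ECS world.
pub struct World {
    pub exposure: f32,
}

pub struct TonemapPass;

impl Pass<World> for TonemapPass {
    fn execute(&mut self, world: &World) -> String {
        // The pass reads application state directly from the configs value.
        format!("tonemap with exposure {}", world.exposure)
    }
}
```

With this shape, a `MiniGraph<World>` accepts only passes written against `World`, and a graph with the default `()` carries no application state at all.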

Lifecycle

1. Setup Phase (once at startup)

let mut graph = RenderGraph::new();

// Declare textures
let depth = graph.add_depth_texture("depth")
    .size(1920, 1080)
    .clear_depth(0.0)
    .transient();

let scene_color = graph.add_color_texture("scene_color")
    .format(wgpu::TextureFormat::Rgba16Float)
    .size(1920, 1080)
    .clear_color(wgpu::Color::BLACK)
    .transient();

let swapchain = graph.add_color_texture("swapchain")
    .format(surface_format)
    .external();

// Add passes with slot bindings
graph.add_pass(
    Box::new(clear_pass),
    &[("color", scene_color), ("depth", depth)],
)?;

graph.add_pass(
    Box::new(mesh_pass),
    &[("color", scene_color), ("depth", depth)],
)?;

graph.add_pass(
    Box::new(blit_pass),
    &[("input", scene_color), ("output", swapchain)],
)?;

// Compile: build edges, sort, compute aliasing
graph.compile()?;

2. Per-Frame Execution

// Provide the swapchain texture for this frame.
// `swapchain_id` is the handle of the external "swapchain" texture declared during setup.
graph.set_external_texture(swapchain_id, swapchain_view, width, height);

// Execute all passes, get command buffers
let command_buffers = graph.execute(&device, &queue, &world)?;

// Submit to GPU
queue.submit(command_buffers);

Key Methods

| Method | Description |
| --- | --- |
| new() | Create an empty graph |
| add_color_texture() | Declare a color render target (returns builder) |
| add_depth_texture() | Declare a depth buffer (returns builder) |
| add_buffer() | Declare a GPU buffer (returns builder) |
| add_pass() | Add a pass with slot-to-resource bindings |
| pass() | Fluent pass builder (alternative to add_pass) |
| compile() | Build dependency graph, topological sort, compute aliasing |
| execute() | Prepare and run all passes, return command buffers |
| set_external_texture() | Provide an external texture (e.g. swapchain) each frame |
| set_pass_enabled() | Enable/disable a pass at runtime |
| get_pass_mut() | Access a pass for runtime configuration |
| resize_transient_resource() | Change dimensions of a transient texture |

Compilation Steps

compile() runs seven steps in sequence.

  1. Build dependency edges. For each resource, an edge is created from the writer to every reader.
  2. Topological sort. The passes are ordered so every pass executes after its dependencies.
  3. Compute store ops. Each resource write is marked Store or Discard based on whether any later pass reads it.
  4. Compute clear ops. The first pass that writes a resource with a clear value gets Clear. The rest get Load.
  5. Compute resource lifetimes. Each transient gets a first_use and last_use pass index.
  6. Compute resource aliasing. Transient resources with non-overlapping lifetimes are assigned to the same pool slot.
  7. Dead pass culling. Passes that do not contribute to any external output are marked for skipping.
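Steps 5 and 6 can be sketched roughly as follows. The sketch assumes the passes are already in execution order, identifies resources by name, and treats any two resources with disjoint lifetimes as compatible. A real implementation would presumably also compare size, format, and usage before sharing memory. `SortedPass`, `lifetimes`, and `assign_slots` are hypothetical names, not the graph's actual internals.

```rust
use std::collections::HashMap;

/// (first_use, last_use) as indices into the sorted pass list.
pub type Span = (usize, usize);

/// Hypothetical pass description, already in execution order.
pub struct SortedPass {
    pub reads: Vec<&'static str>,
    pub writes: Vec<&'static str>,
}

/// Step 5: record the first and last pass index that touches each resource.
pub fn lifetimes(order: &[SortedPass]) -> HashMap<&'static str, Span> {
    let mut spans: HashMap<&'static str, Span> = HashMap::new();
    for (index, pass) in order.iter().enumerate() {
        for resource in pass.reads.iter().chain(pass.writes.iter()) {
            // The first touch sets both ends. Later touches extend the end.
            let span = spans.entry(*resource).or_insert((index, index));
            span.1 = index;
        }
    }
    spans
}

/// Step 6: greedy slot assignment. A resource may reuse a slot only if the
/// slot's previous tenant had its last use strictly before this first use.
pub fn assign_slots(spans: &HashMap<&'static str, Span>) -> HashMap<&'static str, usize> {
    let mut sorted: Vec<(&'static str, Span)> =
        spans.iter().map(|(name, span)| (*name, *span)).collect();
    // Sort by first use, then by name, so the result is deterministic.
    sorted.sort_by_key(|(name, (first, _))| (*first, *name));

    // For each slot, the last_use of the resource currently occupying it.
    let mut busy_until: Vec<usize> = Vec::new();
    let mut slots = HashMap::new();
    for (name, (first, last)) in sorted {
        let slot = match busy_until.iter().position(|&end| end < first) {
            Some(free) => free,
            None => {
                busy_until.push(last);
                busy_until.len() - 1
            }
        };
        busy_until[slot] = last;
        slots.insert(name, slot);
    }
    slots
}
```

Only transient resources should be passed to `assign_slots`. External resources such as the swapchain are owned elsewhere and must not be aliased, so a caller would remove them from the lifetime map first.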

Sub-Chapters