The Render Graph

Live Demos: Custom Pass | Custom Multipass | Render Layers

The render graph is a dependency-driven frame graph that automatically schedules GPU work. Instead of manually ordering render passes, you declare what each pass reads and writes, and the graph figures out the rest.

The Problem: Manual Pass Ordering

A modern renderer has dozens of passes: shadow maps, geometry, SSAO, SSR, bloom, tonemapping, UI. Each reads from and writes to intermediate textures. Without automation, you must:

Manually order passes - Shadow maps before geometry, geometry before SSAO, SSAO before compositing. Add one pass and you must figure out where it fits in the chain. Reorder one pass and you break three others.
Manually manage textures - Allocate intermediate textures, track which ones are alive when, decide when to clear vs load, when to store vs discard. Get it wrong and you see black screens or stale data from previous frames.
Manually optimize memory - SSAO's intermediate texture and SSR's intermediate texture might never be alive at the same time. Without aliasing, you waste VRAM on textures that could share the same memory.
Manually handle dynamic passes - Disabling bloom shouldn't require rewriting the compositing pass's inputs. But with hardcoded ordering, every conditional pass is an if statement that must be threaded through the entire pipeline.

A render graph (also called a frame graph, as described in the Frostbite GDC 2017 talk "FrameGraph: Extensible Rendering Architecture in Frostbite") solves all of this. You describe what each pass needs, and the graph handles ordering, memory, and lifecycle.

How It Works

The render graph models the frame as a directed acyclic graph (DAG) where:

Nodes are render passes
Edges are resource dependencies (an edge from A to B means "A produces data that B consumes")

This is the same abstraction as a build system (Make, Bazel) or a task scheduler. Given the dependency edges, a topological sort produces a valid execution order. The graph can then analyze resource lifetimes across that order to alias memory, compute load/store operations, and cull unused passes.

The key insight is that passes state their dependencies declaratively, through named slots, rather than imperatively, through explicit ordering. This makes the system composable: adding a new pass means declaring what it reads and writes, not editing every other pass that touches the same resources.
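To make the build-system analogy concrete, here is a minimal, self-contained sketch of how read/write declarations can be turned into an execution order with Kahn's algorithm. It is not Nightshade's implementation (which uses petgraph, as shown later on this page): `PassDecl` and `schedule` are hypothetical names, resources are plain strings, and only writer-to-reader edges are modeled, so ordering between two passes that write the same resource is out of scope here.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical pass declaration: a name plus the resources it reads and writes.
pub struct PassDecl {
    pub name: &'static str,
    pub reads: Vec<&'static str>,
    pub writes: Vec<&'static str>,
}

/// Returns pass names in a valid execution order, or `None` if the
/// declarations contain a dependency cycle.
pub fn schedule(passes: &[PassDecl]) -> Option<Vec<&'static str>> {
    // Map each resource to the indices of the passes that write it.
    let mut writers: HashMap<&str, Vec<usize>> = HashMap::new();
    for (index, pass) in passes.iter().enumerate() {
        for resource in &pass.writes {
            writers.entry(*resource).or_default().push(index);
        }
    }

    // Add an edge writer -> reader for every resource a pass reads.
    let mut successors: Vec<Vec<usize>> = vec![Vec::new(); passes.len()];
    let mut in_degree = vec![0usize; passes.len()];
    for (reader, pass) in passes.iter().enumerate() {
        for resource in &pass.reads {
            for &writer in writers.get(*resource).into_iter().flatten() {
                if writer != reader {
                    successors[writer].push(reader);
                    in_degree[reader] += 1;
                }
            }
        }
    }

    // Kahn's algorithm: repeatedly emit a pass with no unscheduled dependencies.
    let mut ready: VecDeque<usize> =
        (0..passes.len()).filter(|&i| in_degree[i] == 0).collect();
    let mut order = Vec::with_capacity(passes.len());
    while let Some(current) = ready.pop_front() {
        order.push(passes[current].name);
        for &next in &successors[current] {
            in_degree[next] -= 1;
            if in_degree[next] == 0 {
                ready.push_back(next);
            }
        }
    }

    // Fewer scheduled passes than declared means a cycle blocked the sort.
    (order.len() == passes.len()).then_some(order)
}
```

Declaring the passes in a scrambled order (composite, ssao, geometry, shadow) still yields shadow, geometry, ssao, composite, because the order comes from the declared reads and writes and not from the order of registration.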

Why a Render Graph?

Automatic ordering - Passes are topologically sorted based on read/write dependencies
Automatic memory management - Transient textures with non-overlapping lifetimes share GPU memory
Automatic load/store ops - For each attachment, the graph picks the load op (Clear or Load) and the store op (Store or Discard)
Dead pass culling - Passes that don't contribute to any external output are automatically skipped
Runtime toggling - Passes can be enabled/disabled at runtime without recompiling the graph
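As an illustration of dead pass culling, the sketch below walks backwards from the external outputs and keeps only the passes whose writes are eventually consumed. It is a simplified model under stated assumptions: `PassDecl` and `live_passes` are hypothetical names, resources are plain strings, and the real graph's culling may differ in detail.

```rust
use std::collections::HashSet;

/// Hypothetical pass declaration used only for this sketch.
pub struct PassDecl {
    pub name: &'static str,
    pub reads: Vec<&'static str>,
    pub writes: Vec<&'static str>,
}

/// Returns the names of passes that contribute to at least one external
/// output, in declaration order. Every other pass can be skipped.
pub fn live_passes(
    passes: &[PassDecl],
    external_outputs: &[&'static str],
) -> Vec<&'static str> {
    // Resources that some live consumer (or the outside world) needs.
    let mut needed: HashSet<&str> = external_outputs.iter().copied().collect();
    let mut live = vec![false; passes.len()];

    // Iterate to a fixed point: a pass becomes live once it writes a needed
    // resource, and everything it reads then becomes needed as well.
    let mut changed = true;
    while changed {
        changed = false;
        for (index, pass) in passes.iter().enumerate() {
            if live[index] {
                continue;
            }
            if pass.writes.iter().any(|resource| needed.contains(resource)) {
                live[index] = true;
                needed.extend(pass.reads.iter().copied());
                changed = true;
            }
        }
    }

    passes
        .iter()
        .zip(live)
        .filter(|(_, keep)| *keep)
        .map(|(pass, _)| pass.name)
        .collect()
}
```

With passes shadow, geometry, histogram, and blit, where histogram writes a texture nobody reads and only blit writes the external swapchain, the function keeps shadow, geometry, and blit and drops histogram.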

The RenderGraph Struct

pub struct RenderGraph<C = ()> {
    graph: DiGraph<GraphNode<C>, ResourceId>,  // petgraph directed graph
    pass_nodes: HashMap<String, NodeIndex>,     // pass name -> graph node
    resources: RenderGraphResources,            // texture/buffer descriptors and handles
    execution_order: Vec<NodeIndex>,            // topologically sorted pass order
    store_ops: HashMap<ResourceId, StoreOp>,    // per-resource store operations
    clear_ops: HashSet<(NodeIndex, ResourceId)>,// which passes clear which resources
    aliasing_info: Option<ResourceAliasingInfo>,// memory sharing between transients
    culled_passes: HashSet<NodeIndex>,          // passes removed by dead-pass culling
    // ...
}

The generic parameter C is the "configs" type passed to passes during execution. Nightshade uses RenderGraph<World> so passes can read ECS state.
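The role of the generic parameter can be sketched with a toy graph that hands a shared reference of type `C` to every pass. Everything here is an assumption made for illustration: `Pass`, `MiniGraph`, `MeshPass`, and this `World` are hypothetical stand-ins, not the engine's types, and the real pass trait is covered in the sub-chapter on passes.

```rust
/// Hypothetical stand-in for a pass trait, generic over the configs type.
pub trait Pass<C> {
    /// Runs the pass with read access to the shared configs value.
    fn execute(&mut self, configs: &C) -> String;
}

/// Toy graph: stores boxed passes and forwards `&C` to each one.
/// Like `RenderGraph`, the parameter defaults to the unit type.
pub struct MiniGraph<C = ()> {
    passes: Vec<Box<dyn Pass<C>>>,
}

impl<C> MiniGraph<C> {
    pub fn new() -> Self {
        Self { passes: Vec::new() }
    }

    pub fn add_pass(&mut self, pass: Box<dyn Pass<C>>) {
        self.passes.push(pass);
    }

    /// Executes every pass in insertion order and collects their log lines.
    pub fn execute(&mut self, configs: &C) -> Vec<String> {
        self.passes
            .iter_mut()
            .map(|pass| pass.execute(configs))
            .collect()
    }
}

/// Hypothetical ECS world, reduced to a single counter.
pub struct World {
    pub entity_count: usize,
}

/// A pass that reads ECS state through the configs parameter.
pub struct MeshPass;

impl Pass<World> for MeshPass {
    fn execute(&mut self, world: &World) -> String {
        format!("mesh: drawing {} entities", world.entity_count)
    }
}
```

Because the graph is generic, a pass written against `Pass<World>` can only be added to a `MiniGraph<World>`; the compiler rejects mixing passes that expect different configs types.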

Lifecycle

1. Setup Phase (once at startup)

// `surface_format`, `clear_pass`, `mesh_pass`, and `blit_pass` are assumed to be
// created elsewhere; the `?` operators require an enclosing function that returns a Result.
let mut graph = RenderGraph::new();

// Declare textures
let depth = graph.add_depth_texture("depth")
    .size(1920, 1080)
    .clear_depth(0.0)
    .transient();

let scene_color = graph.add_color_texture("scene_color")
    .format(wgpu::TextureFormat::Rgba16Float)
    .size(1920, 1080)
    .clear_color(wgpu::Color::BLACK)
    .transient();

let swapchain = graph.add_color_texture("swapchain")
    .format(surface_format)
    .external();

// Add passes with slot bindings
graph.add_pass(
    Box::new(clear_pass),
    &[("color", scene_color), ("depth", depth)],
)?;

graph.add_pass(
    Box::new(mesh_pass),
    &[("color", scene_color), ("depth", depth)],
)?;

graph.add_pass(
    Box::new(blit_pass),
    &[("input", scene_color), ("output", swapchain)],
)?;

// Compile: build edges, sort, compute aliasing
graph.compile()?;

2. Per-Frame Execution

// Provide the swapchain texture for this frame.
// `swapchain_id` is the ID of the external "swapchain" texture declared during setup.
graph.set_external_texture(swapchain_id, swapchain_view, width, height);

// Execute all passes, get command buffers
let command_buffers = graph.execute(&device, &queue, &world)?;

// Submit to GPU
queue.submit(command_buffers);

Key Methods

Method	Description
`new()`	Create an empty graph
`add_color_texture()`	Declare a color render target (returns builder)
`add_depth_texture()`	Declare a depth buffer (returns builder)
`add_buffer()`	Declare a GPU buffer (returns builder)
`add_pass()`	Add a pass with slot-to-resource bindings
`pass()`	Fluent pass builder (alternative to `add_pass`)
`compile()`	Build dependency graph, topological sort, compute aliasing
`execute()`	Prepare and run all passes, return command buffers
`set_external_texture()`	Provide an external texture (e.g. swapchain) each frame
`set_pass_enabled()`	Enable/disable a pass at runtime
`get_pass_mut()`	Access a pass for runtime configuration
`resize_transient_resource()`	Change dimensions of a transient texture

Compilation Steps

When compile() is called:

Build dependency edges - For each resource, the graph creates an edge from writer to reader
Topological sort - Passes are sorted so every pass executes after its dependencies
Compute store ops - Determine Store vs Discard for each resource write
Compute clear ops - Determine which pass performs the initial Clear for each resource
Compute resource lifetimes - Track first_use and last_use for each transient resource
Compute resource aliasing - Transient resources with non-overlapping lifetimes share GPU memory
Dead pass culling - Passes that don't contribute to external outputs are marked for skipping
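The lifetime and aliasing steps above can be sketched as interval bookkeeping followed by a greedy slot assignment. This is a simplified illustration, not the engine's algorithm: `PassUse`, `Lifetime`, `compute_lifetimes`, and `assign_slots` are hypothetical names, every resource is assumed to be transient and of identical size and format, and a real implementation would also have to check compatibility before sharing memory.

```rust
use std::collections::HashMap;

/// Hypothetical pass record: every resource the pass reads or writes.
pub struct PassUse {
    pub name: &'static str,
    pub resources: Vec<&'static str>,
}

/// First and last position in the execution order at which a resource is used.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Lifetime {
    pub first_use: usize,
    pub last_use: usize,
}

/// Walks the (already sorted) execution order and records each lifetime.
pub fn compute_lifetimes(order: &[PassUse]) -> HashMap<&'static str, Lifetime> {
    let mut lifetimes = HashMap::new();
    for (position, pass) in order.iter().enumerate() {
        for &resource in &pass.resources {
            lifetimes
                .entry(resource)
                .and_modify(|lifetime: &mut Lifetime| lifetime.last_use = position)
                .or_insert(Lifetime { first_use: position, last_use: position });
        }
    }
    lifetimes
}

/// Greedily assigns each resource to a memory slot, reusing a slot only when
/// its previous occupant's last use is strictly before the newcomer's first use.
pub fn assign_slots(
    lifetimes: &HashMap<&'static str, Lifetime>,
) -> HashMap<&'static str, usize> {
    // Sort by first use (then by name) so the result is deterministic.
    let mut sorted: Vec<(&'static str, Lifetime)> = lifetimes
        .iter()
        .map(|(name, lifetime)| (*name, *lifetime))
        .collect();
    sorted.sort_by_key(|(name, lifetime)| (lifetime.first_use, *name));

    // For each slot, the last use of the resource currently occupying it.
    let mut slot_busy_until: Vec<usize> = Vec::new();
    let mut assignment = HashMap::new();
    for (name, lifetime) in sorted {
        let reusable = slot_busy_until
            .iter()
            .position(|&last_use| last_use < lifetime.first_use);
        let slot = match reusable {
            Some(slot) => {
                slot_busy_until[slot] = lifetime.last_use;
                slot
            }
            None => {
                slot_busy_until.push(lifetime.last_use);
                slot_busy_until.len() - 1
            }
        };
        assignment.insert(name, slot);
    }
    assignment
}
```

For an order of geometry, ssao, ssao_blur, ssr, composite in which an SSAO scratch texture is used only by passes 1 and 2 and an SSR scratch texture only by passes 3 and 4, the two scratch textures land in the same slot, while a scene color texture that is alive for the whole frame keeps a slot to itself.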

Sub-Chapters

Resources & Textures - Resource types, builders, external vs transient
Passes & the PassNode Trait - Implementing custom passes
Dependency Resolution & Scheduling - How passes are ordered
Resource Aliasing & Memory - GPU memory sharing
Custom Passes - Full examples of custom rendering


Nightshade Game Engine