Dependency Resolution & Scheduling

Pass ordering falls out of the resource dependencies. This chapter is about how that calculation works, where it can fail, and how runtime toggling fits in.

Dependency Edge Construction

compile() builds edges by walking the passes and the resources they touch.

1. Iterate every pass.
2. For each resource a pass reads, find the pass that last wrote it.
3. Add a directed edge from the writer to the reader.

Pass A writes texture T
Pass B reads texture T
  => Edge: A -> B (B depends on A)

For reads_writes resources the pass counts as both reader and writer, so it sits between the previous writer and the next reader. Optional reads create edges only when a writer exists. If no pass writes the slot, no edge is added for the optional read, and the pass is scheduled wherever its other dependencies allow.
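The edge-building steps above can be sketched in a few lines of Rust. This is a minimal illustration, not the engine's implementation: `PassDesc`, its fields, and `build_edges` are hypothetical names, and the sketch assumes passes are visited in declaration order so that "last writer" means the most recent earlier pass that wrote the resource.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified pass description (not the engine's real type).
#[derive(Default)]
pub struct PassDesc {
    pub name: &'static str,
    pub reads: Vec<&'static str>,
    pub optional_reads: Vec<&'static str>,
    pub reads_writes: Vec<&'static str>,
    pub writes: Vec<&'static str>,
}

/// Returns (writer, reader) index pairs, visiting passes in declaration order.
pub fn build_edges(passes: &[PassDesc]) -> Vec<(usize, usize)> {
    let mut last_writer: HashMap<&str, usize> = HashMap::new();
    let mut edges = Vec::new();

    for (idx, pass) in passes.iter().enumerate() {
        // Plain, optional, and read-write accesses all depend on the last writer.
        // A slot that nobody has written yet produces no edge.
        let inputs = pass
            .reads
            .iter()
            .chain(&pass.optional_reads)
            .chain(&pass.reads_writes);
        for res in inputs {
            if let Some(&writer) = last_writer.get(res) {
                if writer != idx && !edges.contains(&(writer, idx)) {
                    edges.push((writer, idx));
                }
            }
        }
        // After its reads are resolved, the pass becomes the last writer
        // of everything it writes, including reads_writes resources.
        for res in pass.writes.iter().chain(&pass.reads_writes) {
            last_writer.insert(*res, idx);
        }
    }
    edges
}
```

Because a reads_writes pass is registered as the last writer only after its own read is resolved, it lands between the previous writer and the next reader, as described above.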

Topological Sort

The graph performs a topological sort using petgraph. A topological sort of a DAG produces a linear ordering where every edge A -> B puts A before B. That ordering guarantees every pass runs after the passes that produce its inputs.

The cost is O(V + E), where V is the number of passes and E is the number of dependency edges. Kahn's algorithm (iteratively remove nodes with no incoming edges) and depth-first search both produce a valid ordering, though not necessarily the same one when the graph admits several. petgraph picks one implementation.

A cycle in the graph means compilation fails with RenderGraphError::CyclicDependency. Two passes that each depend on the other's output have no valid ordering. The fix is in the graph definition: one of the passes is over-declared and should not actually depend on the other's resource.
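To make the sort and the cycle failure concrete, here is a standard-library sketch of Kahn's algorithm. It is a stand-in for illustration only: the graph itself delegates to petgraph, and `topo_sort` and `CycleError` are hypothetical names rather than the engine's API.

```rust
use std::collections::VecDeque;

/// Hypothetical error type standing in for RenderGraphError::CyclicDependency.
#[derive(Debug, PartialEq)]
pub struct CycleError;

/// Kahn's algorithm over `node_count` passes and (writer, reader) edges.
pub fn topo_sort(
    node_count: usize,
    edges: &[(usize, usize)],
) -> Result<Vec<usize>, CycleError> {
    let mut in_degree = vec![0usize; node_count];
    let mut successors = vec![Vec::new(); node_count];
    for &(from, to) in edges {
        successors[from].push(to);
        in_degree[to] += 1;
    }

    // Start with every pass that has no unmet dependencies.
    let mut ready: VecDeque<usize> =
        (0..node_count).filter(|&n| in_degree[n] == 0).collect();
    let mut order = Vec::with_capacity(node_count);

    while let Some(node) = ready.pop_front() {
        order.push(node);
        for &next in &successors[node] {
            in_degree[next] -= 1;
            if in_degree[next] == 0 {
                ready.push_back(next);
            }
        }
    }

    // Any pass never emitted is on, or downstream of, a cycle.
    if order.len() == node_count {
        Ok(order)
    } else {
        Err(CycleError)
    }
}
```

Each node and each edge is handled once, which is where the O(V + E) bound comes from. Two passes that depend on each other never reach an in-degree of zero, so the output comes up short and the function reports the cycle.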

Dead Pass Culling

Not every pass contributes to the final image. The graph uses backward reachability from external resources to keep only the passes that matter.

1. Start with every external resource marked "required."
2. Walk the execution order backward.
3. A pass is required if it writes a required resource, or if it has no writes and no reads_writes (which the graph treats as a side-effecting pass).
4. If a pass is required, every resource it reads is also required.
5. Any pass not marked required is culled.

Pass A writes T1
Pass B reads T1, writes T2     <- T2 is not read by anyone
Pass C reads T1, writes output  <- output is external

Result: A and C execute, B is culled
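The backward walk can be sketched as follows. `PassDesc` and `cull` are hypothetical names, and the sketch assumes that a reads_writes resource counts as both a write (for deciding whether the pass is required) and a read (for propagating requirements). The source does not spell out that detail.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified pass description (not the engine's real type).
#[derive(Default)]
pub struct PassDesc {
    pub name: &'static str,
    pub reads: Vec<&'static str>,
    pub reads_writes: Vec<&'static str>,
    pub writes: Vec<&'static str>,
}

/// Returns the names of the passes that survive culling, in execution order.
pub fn cull(
    passes_in_order: &[PassDesc],
    external: &[&'static str],
) -> Vec<&'static str> {
    // Every external resource starts out required.
    let mut required: HashSet<&'static str> = external.iter().copied().collect();
    let mut kept = Vec::new();

    // Walk the execution order backward.
    for pass in passes_in_order.iter().rev() {
        let side_effect_only =
            pass.writes.is_empty() && pass.reads_writes.is_empty();
        let produces_required = pass
            .writes
            .iter()
            .chain(&pass.reads_writes)
            .any(|r| required.contains(r));

        if side_effect_only || produces_required {
            // Everything a required pass reads becomes required too.
            required.extend(pass.reads.iter().chain(&pass.reads_writes).copied());
            kept.push(pass.name);
        }
    }
    kept.reverse();
    kept
}
```

Run against the three-pass example above with `output` as the only external resource, the sketch keeps A and C and drops B, because nothing required is ever written by B.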

The culling is what lets update_render_graph() toggle effects on and off without leaving dead passes in the pipeline. A disabled pass whose output nothing else reads is invisible to the rest of the graph.

Runtime Pass Toggling

A pass can be turned on or off without recompiling.

graph.set_pass_enabled("bloom_pass", false)?;

Disabling does not remove the pass from the graph. execute() is still called, but is_pass_enabled() returns false. The pass's execute() checks the flag and skips its GPU work.

fn execute<'r, 'e>(
    &mut self,
    ctx: PassExecutionContext<'r, 'e, World>,
) -> Result<Vec<SubGraphRunCommand<'r>>> {
    if !ctx.is_pass_enabled() {
        return Ok(vec![]);
    }
    // ... normal execution
}

The trade-off is that the pass's outputs still exist as dependencies in the graph. Disabling a pass that downstream passes depend on (when no optional_reads declaration covers the dependency) leaves those downstream passes reading whatever was last in the attachment, which is usually the clear value.

Recompilation

The graph tracks a needs_recompile flag. Adding or removing a pass sets it. On the next execute(), the graph drops every existing edge, rebuilds the dependency edges, re-sorts topologically, and recomputes store ops, clear ops, lifetimes, and aliasing. The call site does not have to call compile() again.
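The lazy-recompile behavior is a plain dirty-flag pattern. The sketch below models only that flag. `Graph`, `compile_count`, and the method bodies are hypothetical and stand in for the real graph, whose compile step does far more work.

```rust
/// Hypothetical stand-in for the render graph; only the dirty flag is modeled.
#[derive(Default)]
pub struct Graph {
    passes: Vec<&'static str>,
    needs_recompile: bool,
    /// Counts compilations so the lazy behavior can be observed.
    pub compile_count: usize,
}

impl Graph {
    pub fn add_pass(&mut self, name: &'static str) {
        self.passes.push(name);
        self.needs_recompile = true; // structure changed
    }

    pub fn remove_pass(&mut self, name: &str) {
        let before = self.passes.len();
        self.passes.retain(|p| *p != name);
        if self.passes.len() != before {
            self.needs_recompile = true; // structure changed
        }
    }

    fn compile(&mut self) {
        // The real graph would rebuild edges, re-sort, and recompute
        // store ops, clear ops, lifetimes, and aliasing here.
        self.compile_count += 1;
        self.needs_recompile = false;
    }

    pub fn execute(&mut self) {
        if self.needs_recompile {
            self.compile();
        }
        // ... run passes in the compiled order
    }
}
```

Several structural changes between frames cost a single recompile, and frames with no changes cost none.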

Store and Clear Operations

Store Operations

On tile-based GPU architectures (mobile, Apple Silicon), render pass attachments live in fast on-chip tile memory during the pass. At the end of the pass, the driver decides whether to write that tile memory back out to main VRAM. The write-back is the store operation, and it costs real bandwidth.

The graph picks the store op per resource write.

Store is used when any later pass reads the resource, or when the resource is external and has force_store set. The data has to survive.
Discard is used when no later pass reads the resource. The GPU skips the write-back entirely. On tile-based architectures this is a significant win.
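The store rule reduces to a small predicate over the execution order. In this sketch `StoreOp`, `Pass`, and `store_op_for` are hypothetical names that mirror the rule as stated above. They are not the engine's types.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum StoreOp {
    Store,
    Discard,
}

/// Hypothetical, simplified pass description (not the engine's real type).
pub struct Pass {
    pub reads: Vec<&'static str>,
    pub writes: Vec<&'static str>,
}

/// Store op for the write of `resource` by the pass at index `writer`.
/// `passes` is in execution order; `external_force_store` is true when the
/// resource is external and has force_store set.
pub fn store_op_for(
    passes: &[Pass],
    writer: usize,
    resource: &str,
    external_force_store: bool,
) -> StoreOp {
    // Does any pass after the writer read this resource?
    let read_later = passes[writer + 1..]
        .iter()
        .any(|p| p.reads.iter().any(|r| *r == resource));

    if read_later || external_force_store {
        StoreOp::Store // the data has to survive the pass
    } else {
        StoreOp::Discard // skip the write-back entirely
    }
}
```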

Clear Operations

The start of the pass has a parallel decision: what the GPU does with the existing attachment contents.

Clear writes a known value (black, zero depth) into the attachment. This is cheap because tile memory can be initialized without reading from VRAM.
Load reads the existing contents from VRAM into tile memory. This is required when a previous pass wrote data that this pass needs to preserve.

The first pass that writes a resource with a clear value (clear_color or clear_depth) gets LoadOp::Clear. Every subsequent writer gets LoadOp::Load.
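As a sketch of that rule, the function below assigns a load op to each writer of a single resource. `LoadOp` here is a local enum, not the engine's, and `load_ops` is a hypothetical helper. The source does not say what a resource without a clear value gets, so this sketch assumes Load for every writer in that case.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum LoadOp {
    Clear,
    Load,
}

/// Load ops for the writers of one resource, in execution order.
/// `has_clear_value` is true when clear_color or clear_depth is set.
pub fn load_ops(writer_count: usize, has_clear_value: bool) -> Vec<LoadOp> {
    (0..writer_count)
        .map(|i| {
            if i == 0 && has_clear_value {
                LoadOp::Clear // first writer initializes the attachment
            } else {
                LoadOp::Load // later writers must preserve earlier work
            }
        })
        .collect()
}
```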

The cost of getting these wrong is real. Picking Clear when Load was correct erases the previous pass's work. Picking Load when Clear was correct wastes bandwidth loading garbage data. The graph computes both automatically, and get_color_attachment() and get_depth_attachment() return the chosen ops.
