Compile Pipeline

The compile pipeline transforms invocation configuration, a base transcript, and pending runtime patches into the final messages sent to the LLM provider. It’s a two-stage process with a clean boundary between transcript patching and request rendering.

The Two Stages

compile_invocation_turn(spec, transcript, pending_mutations, selfref_snapshot)
│
├─► Stage 1: reduce_turn_context(transcript, mutations, selfref_snapshot)
│       • Apply all pending runtime patches to the transcript
│       • Refresh selfref snapshot if markers detected
│       • Clone the result (no shared references)
│       → Returns: ReducedTurnContext
│
└─► Stage 2: convert_to_llm_request(reduced, prompt_contract)
        • Resolve the system prompt (selfref > explicit > transcript > docstring)
        • Place/replace system message in transcript
        • Render final messages (inject tool specs, must_principles)
        → Returns: CompiledTurnContext

Stage 1: Reduce Turn Context

reduce_turn_context takes the current base transcript plus all pending runtime patches and produces a clean, patch-applied transcript.

def reduce_turn_context(
    transcript: NormalizedMessageList,
    pending_mutations: List[ContextMutation],
    selfref_snapshot: Optional[DataFromSelfRef] = None,
) -> ReducedTurnContext:

What happens:

Apply runtime patches — apply_mutations(transcript, pending_mutations) processes each mutation in order:
- AssistantMessageMutation → appends assistant message
- ToolResultMutation → appends tool result
- ContextReplaceMutation → replaces entire list
- ContextSummaryMutation → replaces with summary, stores experiences
- ExperienceRemember/Forget → accumulated, committed before next non-experience mutation
- etc.
Refresh selfref snapshot — If the transcript (after mutations) contains selfref markers (experiences, summaries), re-parse DataFromSelfRef from the transcript content. This ensures the snapshot reflects any compaction or experience mutations that just applied.
Clone — The result is a deep clone. No mutation of shared state.

Output:

@dataclass
class ReducedTurnContext:
    transcript: NormalizedMessageList      # Mutation-applied, cloned
    selfref_snapshot: Optional[DataFromSelfRef]  # Refreshed if needed

Stage 2: Convert to LLM Request

convert_to_llm_request takes the reduced transcript plus the invocation’s prompt contract and produces the final messages for the provider.

def convert_to_llm_request(
    reduced: ReducedTurnContext,
    prompt_contract: PromptContract,
) -> CompiledTurnContext:

What happens:

Resolve system prompt — Priority order:
- If selfref_snapshot exists → render base prompt + experiences block
- Else if prompt_contract.system_prompt is set → use it directly
- Else if transcript has a system message → extract its content
- Else if prompt_contract.base_instruction exists → use docstring fallback
Place system message — Either replace the existing system message in the transcript, or prepend a new one. If no system prompt was resolved, remove any existing system message.
Render LLM messages — render_llm_input_messages() finalizes the messages:
- Prepends <tool_best_practices> block if tools are mounted
- Appends <must_principles> block if required (tells model to use native tool calls)
- Returns the final message list ready for the provider

Output:

@dataclass
class CompiledTurnContext:
    transcript: NormalizedMessageList       # The transcript after system prompt resolution
    system_prompt: Optional[str]           # The resolved system prompt text
    llm_messages: NormalizedMessageList    # Final messages for the provider
    selfref_snapshot: Optional[DataFromSelfRef]  # Carried forward

Where This Runs in the ReAct Loop

# Simplified ReAct loop structure
while has_more_work:
    # 1. Collect mutations from hooks
    hook_mutations = collect_hook_mutations(state)
    
    # 2. Compile context (Stage 1 only — apply mutations)
    compiled_context = compile_context(state, hook_mutations + pending)
    
    # 3. Compile for LLM (Stage 1 + Stage 2 — full pipeline)
    turn = compile_invocation_turn(spec, compiled_context.messages, [], selfref)
    
    # 4. Send to LLM
    llm_result = execute_single_llm_phase(turn.llm_messages, ...)
    
    # 5. Execute tools if needed
    tool_result = schedule_tool_batch(llm_result.tool_calls, ...)
    
    # 6. Collect all new mutations for next iteration
    pending = llm_result.mutations + tool_result.mutations

Every iteration goes through the full pipeline. Runtime side effects do not “just append to the live list”; they produce patches that are applied at the boundary. This guarantees that even after 50 tool calls, the transcript state is consistent and auditable.

The Single Entry Point

All compilation flows through one function:

def compile_invocation_turn(
    spec: InvocationSpec,
    transcript: NormalizedMessageList,
    pending_mutations: Optional[List[ContextMutation]] = None,
    selfref_snapshot: Optional[DataFromSelfRef] = None,
) -> CompiledTurnContext:

Both @llm_function and @llm_chat use this same entry point. There is no separate compilation path for different decorator modes.

Why Two Stages?

The split enables:

Stage 1 alone for internal state management (e.g., compile_context in the ReAct loop updates state without rendering for the LLM)
Stage 2 adds provider-specific rendering (tool injection, system prompt placement) only when actually calling the LLM
Testing — you can test mutation application separately from prompt rendering
SelfRef refresh — happens between stages, ensuring the snapshot is current before prompt resolution

Practical Implications

For most users, this is invisible. You write a function, pass history, mount tools, and consume events. But when you need to debug internals:

Debug transcript issues → Check what runtime patches were produced and in what order
Understand system prompt behavior → Know the priority order in Stage 2
Build framework extensions → Use the compile boundary instead of mutating live messages directly

The important distinction: mutations are internal transcript patches. They are not the source of docstrings, template parameters, tool schemas, or the initial history.

Next: SelfRef

How durable context (experiences, compaction, forking) works through the SelfReference system.

​Compile Pipeline

​The Two Stages

​Stage 1: Reduce Turn Context

​Stage 2: Convert to LLM Request

​Where This Runs in the ReAct Loop

​The Single Entry Point

​Why Two Stages?

​Practical Implications