> ## Documentation Index
> Fetch the complete documentation index at: https://simplellmfunc.cn/llms.txt
> Use this file to discover all available pages before exploring further.

# Compile Pipeline

> How invocation config, transcript, and runtime patches become the final LLM request

# Compile Pipeline

The compile pipeline transforms invocation configuration, a base transcript, and pending runtime patches into the final messages sent to the LLM provider. It's a two-stage process with a clean boundary between transcript patching and request rendering.

## The Two Stages

```
compile_invocation_turn(spec, transcript, pending_mutations, selfref_snapshot)
│
├─► Stage 1: reduce_turn_context(transcript, mutations, selfref_snapshot)
│       • Apply all pending runtime patches to the transcript
│       • Refresh selfref snapshot if markers detected
│       • Clone the result (no shared references)
│       → Returns: ReducedTurnContext
│
└─► Stage 2: convert_to_llm_request(reduced, prompt_contract)
        • Resolve the system prompt (selfref > explicit > transcript > docstring)
        • Place/replace system message in transcript
        • Render final messages (inject tool specs, must_principles)
        → Returns: CompiledTurnContext
```

## Stage 1: Reduce Turn Context

`reduce_turn_context` takes the current base transcript plus all pending runtime patches and produces a clean, patch-applied transcript.

```python theme={null}
def reduce_turn_context(
    transcript: NormalizedMessageList,
    pending_mutations: List[ContextMutation],
    selfref_snapshot: Optional[DataFromSelfRef] = None,
) -> ReducedTurnContext:
```

What happens:

1. **Apply runtime patches** — `apply_mutations(transcript, pending_mutations)` processes each mutation in order:
   * `AssistantMessageMutation` → appends assistant message
   * `ToolResultMutation` → appends tool result
   * `ContextReplaceMutation` → replaces entire list
   * `ContextSummaryMutation` → replaces with summary, stores experiences
   * `ExperienceRemember/Forget` → accumulated, committed before next non-experience mutation
   * etc.

2. **Refresh selfref snapshot** — If the transcript (after mutations) contains selfref markers (experiences, summaries), re-parse `DataFromSelfRef` from the transcript content. This ensures the snapshot reflects any compaction or experience mutations that just applied.

3. **Clone** — The result is a deep clone. No mutation of shared state.

Output:

```python theme={null}
@dataclass
class ReducedTurnContext:
    transcript: NormalizedMessageList      # Mutation-applied, cloned
    selfref_snapshot: Optional[DataFromSelfRef]  # Refreshed if needed
```

## Stage 2: Convert to LLM Request

`convert_to_llm_request` takes the reduced transcript plus the invocation's prompt contract and produces the final messages for the provider.

```python theme={null}
def convert_to_llm_request(
    reduced: ReducedTurnContext,
    prompt_contract: PromptContract,
) -> CompiledTurnContext:
```

What happens:

1. **Resolve system prompt** — Priority order:
   * If `selfref_snapshot` exists → render base prompt + experiences block
   * Else if `prompt_contract.system_prompt` is set → use it directly
   * Else if transcript has a system message → extract its content
   * Else if `prompt_contract.base_instruction` exists → use docstring fallback

2. **Place system message** — Either replace the existing system message in the transcript, or prepend a new one. If no system prompt was resolved, remove any existing system message.

3. **Render LLM messages** — `render_llm_input_messages()` finalizes the messages:
   * Prepends `<tool_best_practices>` block if tools are mounted
   * Appends `<must_principles>` block if required (tells model to use native tool calls)
   * Returns the final message list ready for the provider

Output:

```python theme={null}
@dataclass
class CompiledTurnContext:
    transcript: NormalizedMessageList       # The transcript after system prompt resolution
    system_prompt: Optional[str]           # The resolved system prompt text
    llm_messages: NormalizedMessageList    # Final messages for the provider
    selfref_snapshot: Optional[DataFromSelfRef]  # Carried forward
```

## Where This Runs in the ReAct Loop

```python theme={null}
# Simplified ReAct loop structure
while has_more_work:
    # 1. Collect mutations from hooks
    hook_mutations = collect_hook_mutations(state)
    
    # 2. Compile context (Stage 1 only — apply mutations)
    compiled_context = compile_context(state, hook_mutations + pending)
    
    # 3. Compile for LLM (Stage 1 + Stage 2 — full pipeline)
    turn = compile_invocation_turn(spec, compiled_context.messages, [], selfref)
    
    # 4. Send to LLM
    llm_result = execute_single_llm_phase(turn.llm_messages, ...)
    
    # 5. Execute tools if needed
    tool_result = schedule_tool_batch(llm_result.tool_calls, ...)
    
    # 6. Collect all new mutations for next iteration
    pending = llm_result.mutations + tool_result.mutations
```

Every iteration goes through the full pipeline. Runtime side effects do not "just append to the live list"; they produce patches that are applied at the boundary. This guarantees that even after 50 tool calls, the transcript state is consistent and auditable.

## The Single Entry Point

All compilation flows through one function:

```python theme={null}
def compile_invocation_turn(
    spec: InvocationSpec,
    transcript: NormalizedMessageList,
    pending_mutations: Optional[List[ContextMutation]] = None,
    selfref_snapshot: Optional[DataFromSelfRef] = None,
) -> CompiledTurnContext:
```

Both `@llm_function` and `@llm_chat` use this same entry point. There is no separate compilation path for different decorator modes.

## Why Two Stages?

The split enables:

* **Stage 1 alone** for internal state management (e.g., `compile_context` in the ReAct loop updates state without rendering for the LLM)
* **Stage 2** adds provider-specific rendering (tool injection, system prompt placement) only when actually calling the LLM
* **Testing** — you can test mutation application separately from prompt rendering
* **SelfRef refresh** — happens between stages, ensuring the snapshot is current before prompt resolution

## Practical Implications

For most users, this is invisible. You write a function, pass history, mount tools, and consume events.

But when you need to debug internals:

* **Debug transcript issues** → Check what runtime patches were produced and in what order
* **Understand system prompt behavior** → Know the priority order in Stage 2
* **Build framework extensions** → Use the compile boundary instead of mutating live messages directly

The important distinction: mutations are internal transcript patches. They are not the source of docstrings, template parameters, tool schemas, or the initial history.

<Card title="Next: SelfRef" icon="brain" href="/context/selfref">
  How durable context (experiences, compaction, forking) works through the SelfReference system.
</Card>