Harness Patterns

A “harness” is the code around your @llm_chat agent that manages state, context planning, and orchestration. The framework gives you the ReAct loop; you build the harness that makes it production-ready.

Core Philosophy

An agent is not a person. It is a method for constructing the right context for each reasoning step.

The harness engineer’s job:

Ensure the model sees the shortest complete context at every step
Persist state so sessions can be resumed deterministically
Encode lessons into the environment, not into operator memory

Pattern 1: TUI General Agent

The canonical pattern from examples/tui_general_agent_example.py:

repl = PyRepl(working_directory=workspace)
file_tools = FileToolset(workspace).toolset

@llm_chat(
    llm_interface=llm,
    toolkit=[*repl.toolset, *file_tools],
    stream=True,
    self_reference_key=MEMORY_KEY,
    temperature=1.0,
)
async def core_agent(message: str, history: HistoryList):
    """
    {system_prompt_here}
    {environment_block}
    """
    pass


@tui(custom_event_hook=[debug_hook])
async def agent(message: str, history=None, _abort_signal=None):
    prepared_message = inject_compaction_if_needed(message)
    prepared_history = prepare_history(history)
    template_params = build_template_params()

    async for output in core_agent(
        message=prepared_message,
        history=prepared_history,
        _template_params=template_params,
        _abort_signal=_abort_signal,
    ):
        yield output

The harness layer (agent) wraps the core agent to:

Inject compaction instructions when context is large
Build dynamic template parameters (environment detection, workspace info)
Prepare history format
Route abort signals
Connect to the TUI

Pattern 2: Context Window Management

COMPACTION_THRESHOLD = 0.2  # Compact when 20% through context window

def _should_request_compaction() -> bool:
    total_tokens = llm.input_token_count + llm.output_token_count
    context_window = llm.context_window
    return total_tokens > context_window * COMPACTION_THRESHOLD


def prepare_user_message(message: str) -> str:
    if not _should_request_compaction():
        return message

    compaction_instruction = (
        "After finishing this task, call runtime.selfref.context.compact(...) "
        "to checkpoint your context."
    )
    return f"{message}\n\n{compaction_instruction}"

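Once token usage crosses the threshold, the harness appends the compaction instruction to the next user message, so the agent checkpoints its own context without operator intervention.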
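The threshold check can be exercised without a live model. The following is a minimal sketch, assuming a hypothetical `FakeLLM` stand-in that exposes the same three counters used above (`input_token_count`, `output_token_count`, `context_window`). Unlike the snippet above, the check here takes the interface as an argument instead of reading a module-level `llm`, which makes it easier to test.

```python
from dataclasses import dataclass

COMPACTION_THRESHOLD = 0.2  # same 20% threshold as above


@dataclass
class FakeLLM:
    """Hypothetical stand-in for the real LLM interface (illustration only)."""
    input_token_count: int
    output_token_count: int
    context_window: int


def should_request_compaction(llm: FakeLLM, threshold: float = COMPACTION_THRESHOLD) -> bool:
    # Total tokens used so far, compared against a fraction of the window.
    total_tokens = llm.input_token_count + llm.output_token_count
    return total_tokens > llm.context_window * threshold


def prepare_user_message(message: str, llm: FakeLLM) -> str:
    # Below the threshold the message passes through unchanged.
    if not should_request_compaction(llm):
        return message
    instruction = (
        "After finishing this task, call runtime.selfref.context.compact(...) "
        "to checkpoint your context."
    )
    return f"{message}\n\n{instruction}"
```

With a 100,000-token window, a session that has used 1,500 tokens leaves the message untouched, while one that has used 22,000 tokens gets the instruction appended.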

Pattern 3: Dynamic Environment Block

Inject runtime context via template parameters:

import sys
from pathlib import Path


def build_environment_block(workspace: Path) -> str:
    # run_command is a harness helper, assumed to run the command inside the
    # workspace and return its stdout as a string.
    git_branch = run_command(["git", "branch", "--show-current"], workspace)
    git_status = run_command(["git", "status", "--porcelain"], workspace)

    return f"""
# Environment
- Workspace: {workspace}
- Git branch: {git_branch}
- Modified files: {len(git_status.splitlines())}
- Platform: {sys.platform}
- Python: {sys.version_info.major}.{sys.version_info.minor}
"""


# Built once here; rebuild it per turn if the block must track branch or file changes.
TEMPLATE_PARAMS = {"environment_block": build_environment_block(workspace)}
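`run_command` is not defined in the snippet above. One possible sketch, using only the standard library, follows. The empty-string fallback and the timeout value are assumptions, chosen so that a missing `git` binary or a workspace that is not a repository degrades the environment block instead of crashing the harness.

```python
import subprocess
from pathlib import Path


def run_command(args: list[str], cwd: Path, timeout: float = 5.0) -> str:
    """Run a command in `cwd` and return stripped stdout, or "" on any failure."""
    try:
        completed = subprocess.run(
            args,
            cwd=cwd,
            capture_output=True,
            text=True,
            timeout=timeout,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Binary not found, working directory missing, or the command hung.
        return ""
    if completed.returncode != 0:
        # Non-zero exit, e.g. running git outside a repository.
        return ""
    return completed.stdout.strip()
```

Because failures collapse to an empty string, `len(git_status.splitlines())` evaluates to 0 in that case, so the block still renders.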

Pattern 4: Supervisor Agent

An outer agent that delegates to specialized inner agents:

# Tools must be defined before the decorator that lists them in `toolkit`,
# because decorator arguments are evaluated when the function is defined.
@tool
async def delegate_to_coder(task: str, context: str) -> str:
    """Delegate a coding task to the implementation specialist."""
    result = []
    # Each delegation starts the specialist with an empty history.
    async for output in coder_agent(f"{task}\n\nContext: {context}", []):
        if is_response_yield(output):
            result.append(output.response)
    return "\n".join(result)


# delegate_to_reviewer (not shown) follows the same shape for the review specialist.


@llm_chat(llm_interface=llm, toolkit=[delegate_to_coder, delegate_to_reviewer])
async def supervisor(task: str, history: list | None = None):
    """
    Route tasks to the appropriate specialist.
    Use delegate_to_coder for implementation work.
    Use delegate_to_reviewer for code review.
    Synthesize results before responding.
    """
    pass

The supervisor pattern enables:

Different models for different tasks (cheap model routes, expensive model implements)
Isolated context per specialist (each starts fresh)
Clear delegation boundaries
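The isolation property can be checked with stubs. The sketch below is not the framework's API: `ResponseYield`, `stub_coder_agent`, and `delegate` are hypothetical stand-ins that mimic the shape of the delegation tool above, so the example runs without a model.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator


@dataclass
class ResponseYield:
    """Hypothetical stand-in for the framework's response event."""
    response: str


async def stub_coder_agent(message: str, history: list) -> AsyncIterator[ResponseYield]:
    # A real specialist would call a model; this stub reports what it was given.
    yield ResponseYield(f"history_len={len(history)}")
    yield ResponseYield(f"received: {message}")


async def delegate(agent, task: str, context: str) -> str:
    """Run one specialist with a fresh, empty history and join its responses."""
    chunks = []
    async for output in agent(f"{task}\n\nContext: {context}", []):
        if isinstance(output, ResponseYield):
            chunks.append(output.response)
    return "\n".join(chunks)


result = asyncio.run(delegate(stub_coder_agent, "add tests", "repo uses pytest"))
```

The specialist sees only the task and the context string the supervisor chose to pass, never the supervisor's own history.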

Pattern 5: Session Persistence

Persist history outside the process so a session can be resumed later:

import json
from pathlib import Path

SESSION_DIR = Path("~/.myagent/sessions").expanduser()


def save_session(session_id: str, history: list, metadata: dict):
    # history and metadata must be JSON-serializable.
    SESSION_DIR.mkdir(parents=True, exist_ok=True)
    path = SESSION_DIR / f"{session_id}.json"
    # ensure_ascii=False emits raw non-ASCII text, so pin the file encoding
    # instead of relying on the platform default.
    path.write_text(json.dumps({
        "history": history,
        "metadata": metadata,
    }, ensure_ascii=False), encoding="utf-8")


def load_session(session_id: str) -> tuple[list, dict] | None:
    path = SESSION_DIR / f"{session_id}.json"
    if not path.exists():
        return None
    data = json.loads(path.read_text(encoding="utf-8"))
    return data["history"], data["metadata"]
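If the process dies mid-write, a direct `write_text` can leave a truncated JSON file that no longer loads. One way to guard against this, sketched below with standard-library calls only, is to write to a temporary file in the same directory and then swap it into place with `os.replace`. The name `save_session_atomic` and the explicit `session_dir` parameter are assumptions made for this example.

```python
import json
import os
import tempfile
from pathlib import Path


def save_session_atomic(session_dir: Path, session_id: str, history: list, metadata: dict) -> Path:
    """Write the session to a temp file, then rename it over the target."""
    session_dir.mkdir(parents=True, exist_ok=True)
    target = session_dir / f"{session_id}.json"
    payload = json.dumps({"history": history, "metadata": metadata}, ensure_ascii=False)

    # The temp file lives in the same directory so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=session_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(payload)
        os.replace(tmp_name, target)
    except BaseException:
        # Remove the partial temp file; any previous session file is untouched.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return target
```

A reader of the session directory then sees either the previous complete file or the new complete file, never a half-written one.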

Pattern 6: Custom Tool Runtime Override

Override which tools the agent sees based on context:

from SimpleLLMFunc.runtime.selfref import SELF_REFERENCE_TOOLKIT_OVERRIDE_TEMPLATE_PARAM


def build_runtime_toolkit(workspace: Path) -> list:
    """Build toolkit dynamically based on workspace state."""
    tools = [*repl.toolset]

    if (workspace / "package.json").exists():
        tools.extend(node_tools)
    elif (workspace / "pyproject.toml").exists():
        tools.extend(python_tools)

    tools.extend(FileToolset(workspace).toolset)
    return tools


template_params = {
    SELF_REFERENCE_TOOLKIT_OVERRIDE_TEMPLATE_PARAM: build_runtime_toolkit(workspace),
}
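The selection logic itself can be tested in isolation from the framework. The sketch below is a variation, not the code above: it uses plain strings as stand-ins for tool objects, and the `MARKER_TOOLS` table and its tool names are hypothetical. It also checks each marker file independently, so a workspace containing both `package.json` and `pyproject.toml` receives both toolsets, whereas the `elif` above picks only the Node tools in that case.

```python
from pathlib import Path

# Hypothetical marker-file -> tool-name mapping; strings stand in for real tool objects.
MARKER_TOOLS = {
    "package.json": ["npm_run", "node_eval"],
    "pyproject.toml": ["pytest_run", "pip_install"],
}


def select_tools(workspace: Path, base_tools: list[str]) -> list[str]:
    """Return the base tools plus the tools for every marker file present."""
    tools = list(base_tools)  # copy so the caller's list is not mutated
    for marker, extra in MARKER_TOOLS.items():
        if (workspace / marker).exists():
            tools.extend(extra)
    return tools
```

Which behavior is preferable depends on whether mixed-language workspaces are expected; a smaller toolkit keeps the context shorter.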

The Key Insight

The harness is where context engineering happens:

What the model sees = what you put in the template params + history + tool results
When the model forgets = when you fail to compact or persist
Why the model fails = usually missing context, not missing capability

Build harnesses that make the right context inevitable, not optional.
