Harness Patterns
A “harness” is the code around your @llm_chat agent that manages state, context planning, and orchestration. The framework gives you the ReAct loop; you build the harness that makes it production-ready.
Core Philosophy
An agent is not a person. It is a method for constructing the right context for each reasoning step.
The harness engineer’s job:
- Ensure the model sees the shortest complete context at every step
- Persist state so sessions can be resumed deterministically
- Encode lessons into the environment, not into operator memory
Pattern 1: TUI General Agent
The canonical pattern from examples/tui_general_agent_example.py:
repl = PyRepl(working_directory=workspace)
file_tools = FileToolset(workspace).toolset

@llm_chat(
    llm_interface=llm,
    toolkit=[*repl.toolset, *file_tools],
    stream=True,
    self_reference_key=MEMORY_KEY,
    temperature=1.0,
)
async def core_agent(message: str, history: HistoryList):
    """
    {system_prompt_here}
    {environment_block}
    """
    pass

@tui(custom_event_hook=[debug_hook])
async def agent(message: str, history=None, _abort_signal=None):
    prepared_message = inject_compaction_if_needed(message)
    prepared_history = prepare_history(history)
    template_params = build_template_params()
    async for output in core_agent(
        message=prepared_message,
        history=prepared_history,
        _template_params=template_params,
        _abort_signal=_abort_signal,
    ):
        yield output
The harness layer (agent) wraps the core agent to:
- Inject compaction instructions when context is large
- Build dynamic template parameters (environment detection, workspace info)
- Prepare history format
- Route abort signals
- Connect to the TUI
Pattern 2: Context Window Management
COMPACTION_THRESHOLD = 0.2  # Request compaction once 20% of the context window is used

def _should_request_compaction() -> bool:
    total_tokens = llm.input_token_count + llm.output_token_count
    context_window = llm.context_window
    return total_tokens > context_window * COMPACTION_THRESHOLD

def prepare_user_message(message: str) -> str:
    if not _should_request_compaction():
        return message
    compaction_instruction = (
        "After finishing this task, call runtime.selfref.context.compact(...) "
        "to checkpoint your context."
    )
    return f"{message}\n\n{compaction_instruction}"
Because the instruction is appended only once token usage crosses the threshold, the agent checkpoints its own context and no operator has to step in.
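To make the threshold arithmetic concrete, here is a pure-function version of the same check. It takes the token counts as arguments instead of reading them from `llm`, and the 128,000-token window is a hypothetical value chosen for illustration.

```python
COMPACTION_THRESHOLD = 0.2  # same value as in the pattern above


def should_request_compaction(
    input_tokens: int,
    output_tokens: int,
    context_window: int,
    threshold: float = COMPACTION_THRESHOLD,
) -> bool:
    """Return True once total usage exceeds threshold * context_window."""
    return input_tokens + output_tokens > context_window * threshold


# With a hypothetical 128,000-token window, the trigger point is 25,600 tokens.
print(should_request_compaction(20_000, 5_000, 128_000))  # 25,000 used → False
print(should_request_compaction(22_000, 4_000, 128_000))  # 26,000 used → True
```

A pure function like this is easy to unit-test, and the harness can still wrap it with the live counters.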
Pattern 3: Dynamic Environment Block
Inject runtime context via template parameters:
import sys
from pathlib import Path

# run_command is a helper not shown here; it is assumed to run the command
# in the given directory and return its stdout as a stripped string.
def build_environment_block(workspace: Path) -> str:
    git_branch = run_command(["git", "branch", "--show-current"], workspace)
    git_status = run_command(["git", "status", "--porcelain"], workspace)
    return f"""
# Environment
- Workspace: {workspace}
- Git branch: {git_branch}
- Modified files: {len(git_status.splitlines())}
- Platform: {sys.platform}
- Python: {sys.version_info.major}.{sys.version_info.minor}
"""

TEMPLATE_PARAMS = {"environment_block": build_environment_block(workspace)}
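The `run_command` helper is not defined in the example. One possible shape, built on the standard library's `subprocess.run`, is sketched below. The `fallback` parameter and the 5-second timeout are assumptions made for illustration, not part of the framework.

```python
import subprocess
import sys
from pathlib import Path


def run_command(args: list[str], cwd: Path, fallback: str = "unknown") -> str:
    """Run a command in cwd and return stripped stdout, or fallback on failure."""
    try:
        completed = subprocess.run(
            args,
            cwd=cwd,
            capture_output=True,  # collect stdout/stderr instead of printing
            text=True,            # decode output to str
            timeout=5,            # assumed limit so prompt building never hangs
            check=True,           # non-zero exit raises CalledProcessError
        )
    except (OSError, subprocess.SubprocessError):
        # Missing binary, non-zero exit, or timeout: degrade gracefully.
        return fallback
    return completed.stdout.strip()
```

For the `git status` call it is probably safer to pass `fallback=""`, since a fallback of `"unknown"` would be counted as one modified file by `splitlines()`.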
Pattern 4: Supervisor Agent
An outer agent that delegates to specialized inner agents:
# Define the delegation tools first: the supervisor's decorator references
# them by name, so they must exist before @llm_chat is evaluated.
@tool
async def delegate_to_coder(task: str, context: str) -> str:
    """Delegate a coding task to the implementation specialist."""
    result = []
    # Each delegation starts the specialist with an empty history.
    async for output in coder_agent(f"{task}\n\nContext: {context}", []):
        if is_response_yield(output):
            result.append(output.response)
    return "\n".join(result)

# delegate_to_reviewer is not shown; it follows the same shape with a reviewer agent.

@llm_chat(llm_interface=llm, toolkit=[delegate_to_coder, delegate_to_reviewer])
async def supervisor(task: str, history: list | None = None):
    """
    Route tasks to the appropriate specialist.
    Use delegate_to_coder for implementation work.
    Use delegate_to_reviewer for code review.
    Synthesize results before responding.
    """
    pass
The supervisor pattern enables:
- Different models for different tasks (cheap model routes, expensive model implements)
- Isolated context per specialist (each starts fresh)
- Clear delegation boundaries
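The context isolation listed above can be seen without the framework. In the sketch below, `fake_coder_agent` is a hypothetical stand-in for a specialist agent and `delegate` is an illustrative helper; neither is part of SimpleLLMFunc. Every delegation passes a fresh empty history, so nothing from the supervisor's conversation leaks into the specialist.

```python
import asyncio
from typing import AsyncIterator


async def fake_coder_agent(message: str, history: list) -> AsyncIterator[str]:
    """Stand-in specialist: reports how much history it received, then answers."""
    yield f"seen {len(history)} prior messages"
    yield f"done: {message}"


async def delegate(agent, task: str) -> str:
    """Run a specialist with an empty history and join its streamed output."""
    chunks = []
    async for chunk in agent(task, []):  # [] means isolated context per call
        chunks.append(chunk)
    return "\n".join(chunks)


result = asyncio.run(delegate(fake_coder_agent, "add a retry"))
print(result)
```

If a specialist needs background, pass it explicitly through the task text, as the `context` argument of `delegate_to_coder` does.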
Pattern 5: Session Persistence
Persist history outside the process so a session can be resumed later:
import json
from pathlib import Path

SESSION_DIR = Path("~/.myagent/sessions").expanduser()

def save_session(session_id: str, history: list, metadata: dict):
    SESSION_DIR.mkdir(parents=True, exist_ok=True)
    path = SESSION_DIR / f"{session_id}.json"
    # ensure_ascii=False emits raw non-ASCII characters, so pin the encoding
    # instead of relying on the platform default.
    path.write_text(json.dumps({
        "history": history,
        "metadata": metadata,
    }, ensure_ascii=False), encoding="utf-8")

def load_session(session_id: str) -> tuple[list, dict] | None:
    path = SESSION_DIR / f"{session_id}.json"
    if not path.exists():
        return None
    data = json.loads(path.read_text(encoding="utf-8"))
    return data["history"], data["metadata"]
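If the process is killed in the middle of a write, a plain `write_text` can leave a truncated JSON file that no longer loads. One way to guard against that is to write to a temporary file and rename it into place. This is a sketch using only the standard library; `save_session_atomic` is a hypothetical name, and it assumes the payload is JSON-serializable.

```python
import json
import os
import tempfile
from pathlib import Path


def save_session_atomic(session_dir: Path, session_id: str, payload: dict) -> Path:
    """Write payload as JSON to a temp file, then rename it over the target."""
    session_dir.mkdir(parents=True, exist_ok=True)
    target = session_dir / f"{session_id}.json"
    # Create the temp file in the same directory so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=session_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle, ensure_ascii=False)
        os.replace(tmp_name, target)  # readers see the old file or the new one
    except BaseException:
        Path(tmp_name).unlink(missing_ok=True)  # clean up the partial temp file
        raise
    return target
```

A reader that opens the session file therefore never observes a half-written state, which helps keep resume deterministic.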
Pattern 6: Dynamic Toolkit Override
Override which tools the agent sees based on context:
from SimpleLLMFunc.runtime.selfref import SELF_REFERENCE_TOOLKIT_OVERRIDE_TEMPLATE_PARAM

def build_runtime_toolkit(workspace: Path) -> list:
    """Build toolkit dynamically based on workspace state."""
    tools = [*repl.toolset]
    if (workspace / "package.json").exists():
        tools.extend(node_tools)
    elif (workspace / "pyproject.toml").exists():
        tools.extend(python_tools)
    tools.extend(FileToolset(workspace).toolset)
    return tools

template_params = {
    SELF_REFERENCE_TOOLKIT_OVERRIDE_TEMPLATE_PARAM: build_runtime_toolkit(workspace),
}
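The selection logic can be exercised on its own, without real tool objects. The sketch below mirrors the branching of `build_runtime_toolkit` but returns plain group names; `select_tool_groups` and the group names are hypothetical stand-ins for illustration.

```python
import tempfile
from pathlib import Path


def select_tool_groups(workspace: Path) -> list[str]:
    """Return names of tool groups to enable, mirroring the branching above."""
    groups = ["repl"]
    if (workspace / "package.json").exists():
        groups.append("node")
    elif (workspace / "pyproject.toml").exists():
        groups.append("python")
    groups.append("files")
    return groups


with tempfile.TemporaryDirectory() as tmp:
    workspace = Path(tmp)
    (workspace / "pyproject.toml").write_text("[project]\n", encoding="utf-8")
    print(select_tool_groups(workspace))  # ['repl', 'python', 'files']
    (workspace / "package.json").write_text("{}", encoding="utf-8")
    print(select_tool_groups(workspace))  # ['repl', 'node', 'files']
```

Note the consequence of `elif`: a workspace containing both marker files gets only the Node tools. Use two independent `if` checks if mixed projects should see both sets.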
The Key Insight
The harness is where context engineering happens:
- What the model sees = what you put in the template params + history + tool results
- When the model forgets = when you fail to compact or persist
- Why the model fails = usually missing context, not missing capability
Build harnesses that make the right context inevitable, not optional.