Skip to main content

Documentation Index

Fetch the complete documentation index at: https://simplellmfunc.cn/llms.txt

Use this file to discover all available pages before exploring further.

@llm_chat

@llm_chat creates a multi-turn conversational agent. It manages history, executes a ReAct loop with tools, streams responses, and optionally integrates with SelfRef for durable context.

Basic Usage

from SimpleLLMFunc import OpenAICompatible, llm_chat

models = OpenAICompatible.load_from_json_file("provider.json")
llm = models["openrouter"]["openai/gpt-4o"]


@llm_chat(llm_interface=llm, stream=True)
async def assistant(message: str, history: list | None = None):
    """
    You are a helpful, concise assistant.
    Answer directly without unnecessary preamble.
    """
    pass

History Management

The history parameter (or chat_history) is special. The framework:
  1. Takes your provided history as the conversation transcript
  2. Appends the current user message
  3. Runs the ReAct loop
  4. Returns updated history in output.messages
history = []

async for output in assistant("What is Python?", history):
    if is_response_yield(output):
        print(output.response)
        history = output.messages  # Save for next turn

# Next turn — agent remembers the conversation
async for output in assistant("What are its main features?", history):
    ...
History is external — you control storage, persistence, and branching.

Streaming

With stream=True, you receive chunks as events:
from SimpleLLMFunc.hooks import is_event_yield, LLMChunkArriveEvent

async for output in assistant("Tell me about Python", history):
    if is_event_yield(output):
        if isinstance(output.event, LLMChunkArriveEvent):
            print(output.event.accumulated_content, end="", flush=True)
    elif is_response_yield(output):
        history = output.messages
With stream=False, the model response arrives as a single ResponseYield.

Tools

@llm_chat(llm_interface=llm, toolkit=[search, calculate], stream=True, max_tool_calls=10)
async def agent(message: str, history: list | None = None):
    """
    A research assistant. Use search for facts, calculate for math.
    Always cite your sources.
    """
    pass
The ReAct loop handles tool calling automatically:
  1. LLM decides to call a tool → ToolCallStartEvent
  2. Framework executes the tool → ToolCallEndEvent
  3. The runtime records the tool result as an internal transcript patch
  4. Context is recompiled → LLM sees the patched transcript → decides next action
max_tool_calls limits total tool calls per invocation. Default is framework-defined. None means unlimited.

SelfRef Integration

For agents that need durable memory, context compaction, or sub-agent forking:
from SimpleLLMFunc.builtin import PyRepl

repl = PyRepl()


@llm_chat(
    llm_interface=llm,
    toolkit=[*repl.toolset],
    stream=True,
    self_reference_key="agent_main",
)
async def coding_agent(message: str, history: list | None = None):
    """
    A coding agent with persistent memory.
    Use runtime.selfref.context.remember(...) to store durable lessons.
    Use runtime.selfref.context.compact(...) when context grows large.
    """
    pass
With self_reference_key, the framework:
  • Binds a SelfReference backend to this key
  • Creates a SelfRefSession per invocation
  • Makes selfref primitives available in PyRepl
  • Persists updated history after each turn
See SelfRef for the full context model.

Template Parameters

Inject runtime values into the system prompt:
@llm_chat(llm_interface=llm, toolkit=[...])
async def agent(message: str, history: list | None = None):
    """
    You are an assistant for {project_name}.

    Workspace: {workspace_path}
    Git branch: {git_branch}
    """
    pass

async for output in agent(
    "Fix the bug",
    history,
    _template_params={
        "project_name": "MyApp",
        "workspace_path": "/src",
        "git_branch": "main",
    },
):
    ...

Return Mode

ModeBehavior
"text" (default)output.response is the final text string
"raw"output.response is the raw message dict from the provider

System Prompt Construction

For @llm_chat, the final system prompt is built from multiple sources:
  1. Docstring → base system prompt (with template params applied)
  2. Tool best practices<tool_best_practices> block prepended
  3. Must principles<must_principles> block appended (use native tool calls)
  4. SelfRef experiences → rendered into the system prompt if active
  5. Latest system message in history → overrides docstring if present
Write your docstring as stable assistant policy — identity, behavioral rules, and long-lived constraints. Put per-turn context in arguments or template params.

Concurrent Sessions

Multiple independent conversations can run simultaneously:
# Each call gets its own history — no shared state
task1 = assistant("Question A", history_a)
task2 = assistant("Question B", history_b)

# Run concurrently
results = await asyncio.gather(
    consume_stream(task1),
    consume_stream(task2),
)

Parameters Reference

ParameterTypeDefaultDescription
llm_interfaceLLM_InterfacerequiredThe model to call
toolkitList[Tool]NoneAvailable tools
max_tool_callsint | Noneframework defaultTool call limit
streamboolFalseEnable streaming
self_referenceSelfReference | NoneNoneExplicit selfref backend
self_reference_keystr | NoneNoneAuto-create selfref for this key
**llm_kwargsAnyPassed to LLM (temperature, etc.)
Special call-time parameters:
  • _template_params: Dict[str, Any] — template values
  • _abort_signal: AbortSignal — cancellation
  • _too_long_to_file: bool — truncate long tool results to file
API Reference: Decorators