@llm_chat

@llm_chat creates a multi-turn conversational agent. It manages history, executes a ReAct loop with tools, streams responses, and optionally integrates with SelfRef for durable context.

Basic Usage

from SimpleLLMFunc import OpenAICompatible, llm_chat

models = OpenAICompatible.load_from_json_file("provider.json")
llm = models["openrouter"]["openai/gpt-4o"]


@llm_chat(llm_interface=llm, stream=True)
async def assistant(message: str, history: list | None = None):
    """
    You are a helpful, concise assistant.
    Answer directly without unnecessary preamble.
    """
    pass

History Management

The history parameter (or chat_history) is special. The framework:

Takes your provided history as the conversation transcript
Appends the current user message
Runs the ReAct loop
Returns updated history in output.messages

history = []

async for output in assistant("What is Python?", history):
    if is_response_yield(output):
        print(output.response)
        history = output.messages  # Save for next turn

# Next turn — agent remembers the conversation
async for output in assistant("What are its main features?", history):
    ...

History is external — you control storage, persistence, and branching.

Streaming

With stream=True, you receive chunks as events:

from SimpleLLMFunc.hooks import is_event_yield, LLMChunkArriveEvent

async for output in assistant("Tell me about Python", history):
    if is_event_yield(output):
        if isinstance(output.event, LLMChunkArriveEvent):
            print(output.event.accumulated_content, end="", flush=True)
    elif is_response_yield(output):
        history = output.messages

With stream=False, the model response arrives as a single ResponseYield.

Multimodal User Messages

For multimodal chat input, use exactly one canonical user-message object: UserChatMessage. This keeps @llm_chat as an Agent abstraction over one user turn and avoids multiple competing image-parameter styles.

from SimpleLLMFunc import llm_chat
from SimpleLLMFunc.type import ImgPath, ImgUrl, UserChatMessage

@llm_chat(llm_interface=llm, stream=True)
async def vision_agent(message: UserChatMessage, history: list | None = None):
    """Answer questions about user-provided images."""
    pass

async for output in vision_agent(
    UserChatMessage.multimodal(
        "Compare these images and list visible differences.",
        ImgUrl("https://example.com/reference.jpg", detail="high"),
        ImgPath("./candidate.png", detail="high"),
    ),
    history=[],
):
    ...

UserChatMessage.multimodal(...) accepts text plus any number of ImgUrl / ImgPath values. It normalizes them to an OpenAI-compatible user message with text and image_url content parts. Future modalities should extend UserChatMessage, not introduce another chat input convention.

Tools

@llm_chat(llm_interface=llm, toolkit=[search, calculate], stream=True, max_tool_calls=10)
async def agent(message: str, history: list | None = None):
    """
    A research assistant. Use search for facts, calculate for math.
    Always cite your sources.
    """
    pass

The ReAct loop handles tool calling automatically:

LLM decides to call a tool → ToolCallStartEvent
Framework executes the tool → ToolCallEndEvent
The runtime records the tool result as an internal transcript patch
Context is recompiled → LLM sees the patched transcript → decides next action

max_tool_calls limits total tool calls per invocation. Default is framework-defined. None means unlimited.

SelfRef Integration

For agents that need durable memory, context compaction, or sub-agent forking:

from SimpleLLMFunc.builtin import PyRepl

repl = PyRepl()


@llm_chat(
    llm_interface=llm,
    toolkit=[*repl.toolset],
    stream=True,
    self_reference_key="agent_main",
)
async def coding_agent(message: str, history: list | None = None):
    """
    A coding agent with persistent memory.
    Use runtime.selfref.context.remember(...) to store durable lessons.
    Use runtime.selfref.context.compact(...) when context grows large.
    """
    pass

With self_reference_key, the framework:

Binds a SelfReference backend to this key
Creates a SelfRefSession per invocation
Makes selfref primitives available in PyRepl
Persists updated history after each turn

See SelfRef for the full context model.

Template Parameters

Inject runtime values into the system prompt:

@llm_chat(llm_interface=llm, toolkit=[...])
async def agent(message: str, history: list | None = None):
    """
    You are an assistant for {project_name}.

    Workspace: {workspace_path}
    Git branch: {git_branch}
    """
    pass

async for output in agent(
    "Fix the bug",
    history,
    _template_params={
        "project_name": "MyApp",
        "workspace_path": "/src",
        "git_branch": "main",
    },
):
    ...

Return Mode

Mode	Behavior
`"text"` (default)	`output.response` is the final text string
`"raw"`	`output.response` is the raw message dict from the provider

System Prompt Construction

For @llm_chat, the final system prompt is built from multiple sources:

Docstring → base system prompt (with template params applied)
Tool best practices → <tool_best_practices> block prepended
Must principles → <must_principles> block appended (use native tool calls)
SelfRef experiences → rendered into the system prompt if active
Latest system message in history → overrides docstring if present

Write your docstring as stable assistant policy — identity, behavioral rules, and long-lived constraints. Put per-turn context in arguments or template params.

Concurrent Sessions

Multiple independent conversations can run simultaneously:

# Each call gets its own history — no shared state
task1 = assistant("Question A", history_a)
task2 = assistant("Question B", history_b)

# Run concurrently
results = await asyncio.gather(
    consume_stream(task1),
    consume_stream(task2),
)

Parameters Reference

Parameter	Type	Default	Description
`llm_interface`	`LLM_Interface`	required	The model to call
`toolkit`	`List[Tool]`	`None`	Available tools
`max_tool_calls`	`int \| None`	framework default	Tool call limit
`stream`	`bool`	`False`	Enable streaming
`self_reference`	`SelfReference \| None`	`None`	Explicit selfref backend
`self_reference_key`	`str \| None`	`None`	Auto-create selfref for this key
`**llm_kwargs`	`Any`	—	Passed to LLM (temperature, etc.)

Special call-time parameters:

_template_params: Dict[str, Any] — template values
_abort_signal: AbortSignal — cancellation
_too_long_to_file: bool — truncate long tool results to file

→ API Reference: Decorators

​@llm_chat

​Basic Usage

​History Management

​Streaming

​Multimodal User Messages

​Tools

​SelfRef Integration

​Template Parameters

​Return Mode

​System Prompt Construction

​Concurrent Sessions

​Parameters Reference