> ## Documentation Index
> Fetch the complete documentation index at: https://simplellmfunc.cn/llms.txt
> Use this file to discover all available pages before exploring further.

# @llm_chat

> Multi-turn agents with history, tools, streaming, and SelfRef

# @llm\_chat

`@llm_chat` creates a multi-turn conversational agent. It manages history, executes a ReAct loop with tools, streams responses, and optionally integrates with SelfRef for durable context.

## Basic Usage

```python theme={null}
from SimpleLLMFunc import OpenAICompatible, llm_chat

models = OpenAICompatible.load_from_json_file("provider.json")
llm = models["openrouter"]["openai/gpt-4o"]


@llm_chat(llm_interface=llm, stream=True)
async def assistant(message: str, history: list | None = None):
    """
    You are a helpful, concise assistant.
    Answer directly without unnecessary preamble.
    """
    pass
```

## History Management

The `history` parameter (or `chat_history`) is special. The framework:

1. Takes your provided history as the conversation transcript
2. Appends the current user message
3. Runs the ReAct loop
4. Returns updated history in `output.messages`

```python theme={null}
history = []

async for output in assistant("What is Python?", history):
    if is_response_yield(output):
        print(output.response)
        history = output.messages  # Save for next turn

# Next turn — agent remembers the conversation
async for output in assistant("What are its main features?", history):
    ...
```

History is **external** — you control storage, persistence, and branching.

## Streaming

With `stream=True`, you receive chunks as events:

```python theme={null}
from SimpleLLMFunc.hooks import is_event_yield, LLMChunkArriveEvent

async for output in assistant("Tell me about Python", history):
    if is_event_yield(output):
        if isinstance(output.event, LLMChunkArriveEvent):
            print(output.event.accumulated_content, end="", flush=True)
    elif is_response_yield(output):
        history = output.messages
```

With `stream=False`, the model response arrives as a single `ResponseYield`.

## Multimodal User Messages

For multimodal chat input, use exactly one canonical user-message object: `UserChatMessage`. This keeps `@llm_chat` as an Agent abstraction over one user turn and avoids multiple competing image-parameter styles.

```python theme={null}
from SimpleLLMFunc import llm_chat
from SimpleLLMFunc.type import ImgPath, ImgUrl, UserChatMessage

@llm_chat(llm_interface=llm, stream=True)
async def vision_agent(message: UserChatMessage, history: list | None = None):
    """Answer questions about user-provided images."""
    pass

async for output in vision_agent(
    UserChatMessage.multimodal(
        "Compare these images and list visible differences.",
        ImgUrl("https://example.com/reference.jpg", detail="high"),
        ImgPath("./candidate.png", detail="high"),
    ),
    history=[],
):
    ...
```

`UserChatMessage.multimodal(...)` accepts text plus any number of `ImgUrl` / `ImgPath` values. It normalizes them to an OpenAI-compatible user message with `text` and `image_url` content parts. Future modalities should extend `UserChatMessage`, not introduce another chat input convention.

## Tools

```python theme={null}
@llm_chat(llm_interface=llm, toolkit=[search, calculate], stream=True, max_tool_calls=10)
async def agent(message: str, history: list | None = None):
    """
    A research assistant. Use search for facts, calculate for math.
    Always cite your sources.
    """
    pass
```

The ReAct loop handles tool calling automatically:

1. LLM decides to call a tool → `ToolCallStartEvent`
2. Framework executes the tool → `ToolCallEndEvent`
3. The runtime records the tool result as an internal transcript patch
4. Context is recompiled → LLM sees the patched transcript → decides next action

`max_tool_calls` limits total tool calls per invocation. Default is framework-defined. `None` means unlimited.

## SelfRef Integration

For agents that need durable memory, context compaction, or sub-agent forking:

```python theme={null}
from SimpleLLMFunc.builtin import PyRepl

repl = PyRepl()


@llm_chat(
    llm_interface=llm,
    toolkit=[*repl.toolset],
    stream=True,
    self_reference_key="agent_main",
)
async def coding_agent(message: str, history: list | None = None):
    """
    A coding agent with persistent memory.
    Use runtime.selfref.context.remember(...) to store durable lessons.
    Use runtime.selfref.context.compact(...) when context grows large.
    """
    pass
```

With `self_reference_key`, the framework:

* Binds a `SelfReference` backend to this key
* Creates a `SelfRefSession` per invocation
* Makes selfref primitives available in PyRepl
* Persists updated history after each turn

See [SelfRef](/context/selfref) for the full context model.

## Template Parameters

Inject runtime values into the system prompt:

```python theme={null}
@llm_chat(llm_interface=llm, toolkit=[...])
async def agent(message: str, history: list | None = None):
    """
    You are an assistant for {project_name}.

    Workspace: {workspace_path}
    Git branch: {git_branch}
    """
    pass

async for output in agent(
    "Fix the bug",
    history,
    _template_params={
        "project_name": "MyApp",
        "workspace_path": "/src",
        "git_branch": "main",
    },
):
    ...
```

## Return Mode

| Mode               | Behavior                                                    |
| ------------------ | ----------------------------------------------------------- |
| `"text"` (default) | `output.response` is the final text string                  |
| `"raw"`            | `output.response` is the raw message dict from the provider |

## System Prompt Construction

For `@llm_chat`, the final system prompt is built from multiple sources:

1. **Docstring** → base system prompt (with template params applied)
2. **Tool best practices** → `<tool_best_practices>` block prepended
3. **Must principles** → `<must_principles>` block appended (use native tool calls)
4. **SelfRef experiences** → rendered into the system prompt if active
5. **Latest system message in history** → overrides docstring if present

Write your docstring as **stable assistant policy** — identity, behavioral rules, and long-lived constraints. Put per-turn context in arguments or template params.

## Concurrent Sessions

Multiple independent conversations can run simultaneously:

```python theme={null}
# Each call gets its own history — no shared state
task1 = assistant("Question A", history_a)
task2 = assistant("Question B", history_b)

# Run concurrently
results = await asyncio.gather(
    consume_stream(task1),
    consume_stream(task2),
)
```

## Parameters Reference

| Parameter            | Type                    | Default           | Description                       |
| -------------------- | ----------------------- | ----------------- | --------------------------------- |
| `llm_interface`      | `LLM_Interface`         | required          | The model to call                 |
| `toolkit`            | `List[Tool]`            | `None`            | Available tools                   |
| `max_tool_calls`     | `int \| None`           | framework default | Tool call limit                   |
| `stream`             | `bool`                  | `False`           | Enable streaming                  |
| `self_reference`     | `SelfReference \| None` | `None`            | Explicit selfref backend          |
| `self_reference_key` | `str \| None`           | `None`            | Auto-create selfref for this key  |
| `**llm_kwargs`       | `Any`                   | —                 | Passed to LLM (temperature, etc.) |

Special call-time parameters:

* `_template_params: Dict[str, Any]` — template values
* `_abort_signal: AbortSignal` — cancellation
* `_too_long_to_file: bool` — truncate long tool results to file

→ [API Reference: Decorators](/api/decorators)