llm_chat is the chat-oriented decorator in SimpleLLMFunc. It is designed for multi-turn conversations, history-aware assistants, tool-augmented chat flows, and Agent-style interactions.

Quick Overview

Multi-turn Conversation

Manage and return history while preserving context across turns.

Streaming Output

Return an async generator that works naturally with live chat interfaces.

Tool Integration

Let the model call tools during a conversation.

Runtime Memory

Use it together with PyRepl, SelfReference, and runtime primitives.

Event Stream

Inspect the full ReAct loop with enable_event=True.

Important Note

llm_chat only supports functions defined with async def. It returns an async generator, so consume it with async for.

Basic Syntax

from typing import AsyncGenerator, Dict, List, Tuple

from SimpleLLMFunc import llm_chat


@llm_chat(
    llm_interface=llm_interface,
    toolkit=None,
    max_tool_calls=None,
    stream=True,
    self_reference=None,
    self_reference_key=None,
    **llm_kwargs,
)
async def your_chat_function(
    message: str,
    history: List[Dict[str, str]] | None = None,
) -> AsyncGenerator[Tuple[str, List[Dict[str, str]]], None]:
    """Describe the assistant role and behavior here."""
    pass
As with llm_function, the function body itself is not executed. The docstring is the prompt, and pass is usually the correct implementation.

Parameters

  • llm_interface: required; the LLM interface instance
  • toolkit: optional; tools the model may call
  • max_tool_calls: optional; tool call limit; default None
  • stream: whether to enable streaming output
  • return_mode: "text" or "raw"; only matters when enable_event=False
  • enable_event: with enable_event=False, the generator yields (response, messages); with enable_event=True, it yields ReactOutput (for full event details, see Event Stream System)
  • strict_signature: enforce a stricter canonical signature
  • self_reference: shared SelfReference instance
  • self_reference_key: memory key for the current chat function
  • any remaining llm_kwargs are passed through to the underlying model interface

Abort During a Turn

You can pass _abort_signal to interrupt the currently running turn, stop streaming output, and cancel in-flight tool calls.
from SimpleLLMFunc.hooks import ABORT_SIGNAL_PARAM, AbortSignal

abort_signal = AbortSignal()

async for output in your_chat_function(
    "Hello",
    history=[],
    **{ABORT_SIGNAL_PARAM: abort_signal},
):
    ...

# Trigger from another coroutine
abort_signal.abort("user_interrupt")
When enable_event=True, ReactEndEvent.extra contains aborted: true and, if available, an abort_reason. See Abort and Cancellation.

Return Values

When enable_event=False, llm_chat yields:
  • chunk: a streaming chunk or the full response
  • updated_history: the updated conversation history
When stream=True and return_mode="text", the generator yields an empty string at the end as a completion marker.
When enable_event=True, you receive ReactOutput, which can be:
  • ResponseYield: response payload plus message list
  • EventYield: intermediate events from the ReAct loop, model calls, tool calls, and iteration lifecycle
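An event-mode consumption loop typically branches on the yield type. A minimal sketch of that dispatch pattern, using stand-in dataclasses in place of the framework's EventYield and ResponseYield (the real types live in SimpleLLMFunc; the field names here are assumptions):

```python
import asyncio
from dataclasses import dataclass
from typing import Any, List

# Stand-ins for the framework's yield types; real definitions live in SimpleLLMFunc.
@dataclass
class EventYield:
    event: Any

@dataclass
class ResponseYield:
    response: Any
    messages: List[dict]

async def consume(outputs):
    """Dispatch on the ReactOutput variant, as an event-mode loop would."""
    events, final = [], None
    async for output in outputs:
        if isinstance(output, EventYield):
            events.append(output.event)   # intermediate ReAct-loop event
        elif isinstance(output, ResponseYield):
            final = output.response       # response payload (messages also available)
    return events, final

async def fake_stream():
    # Simulated event-mode output: one intermediate event, then the response.
    yield EventYield(event="tool_call_started")
    yield ResponseYield(response="done", messages=[])

events, final = asyncio.run(consume(fake_stream()))
```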
History parameters should be named history or chat_history. If the framework cannot recognize the history format, the history is ignored and a warning is emitted. If the history contains system messages, the latest one overrides the docstring-based system prompt.
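The expected history shape is a list of role/content dicts in the OpenAI-style message format; for example:

```python
history = [
    # A system message in history overrides the docstring-based system prompt.
    {"role": "system", "content": "You are a terse assistant."},
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello! How can I help?"},
]
```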

Where to Start

Basic Chat Assistant

Start here if you want to understand the simplest multi-turn chat flow.

Tool-Augmented Chat

Start here if you want the model to call APIs, file tools, or business logic.

Runtime Context

Start here if you need Agent-style runtime context and self-reference behavior.

Troubleshooting

Start here if you are dealing with history growth, timeouts, or logging.

Examples

Basic chat assistant

import asyncio
from typing import AsyncGenerator, Dict, List, Tuple

from SimpleLLMFunc import OpenAICompatible, llm_chat

llm = OpenAICompatible.load_from_json_file("provider.json")["openai"]["gpt-3.5-turbo"]


@llm_chat(llm_interface=llm, stream=True)
async def simple_chat(
    message: str,
    history: List[Dict[str, str]] | None = None,
) -> AsyncGenerator[Tuple[str, List[Dict[str, str]]], None]:
    """You are a friendly chat assistant."""
    pass


async def main():
    history = []
    user_message = "Hello, please introduce yourself."

    print(f"User: {user_message}")
    print("Assistant: ", end="", flush=True)

    async for chunk, updated_history in simple_chat(user_message, history):
        if chunk:
            print(chunk, end="", flush=True)
        history = updated_history

    print()


asyncio.run(main())

Chat with tool calling

import asyncio
from typing import AsyncGenerator, Dict, List, Tuple

from SimpleLLMFunc import OpenAICompatible, llm_chat, tool


@tool(name="get_weather", description="Get weather information for a city")
async def get_weather(city: str) -> Dict[str, str]:
    return {
        "city": city,
        "temperature": "25°C",
        "humidity": "60%",
        "condition": "Sunny",
    }


llm = OpenAICompatible.load_from_json_file("provider.json")["openai"]["gpt-3.5-turbo"]


@llm_chat(llm_interface=llm, toolkit=[get_weather], stream=True)
async def weather_chat(
    message: str,
    history: List[Dict[str, str]] | None = None,
) -> AsyncGenerator[Tuple[str, List[Dict[str, str]]], None]:
    """You are a weather assistant. Use the weather tool when users ask about weather."""
    pass

Interactive multi-turn session

async def demo():
    history = []
    messages = [
        "What is a Python list comprehension?",
        "How do I use async programming?",
    ]

    for user_message in messages:
        print(f"\nUser: {user_message}")
        print("Assistant: ", end="", flush=True)

        async for chunk, updated_history in simple_chat(user_message, history):
            if chunk:
                print(chunk, end="", flush=True)
            history = updated_history

        print()

Advanced Features

Runtime memory and self-reference

When you mount runtime-backed tools such as PyRepl, llm_chat can also work with SelfReference and runtime primitives. Typical use cases:
  • persistent memory tied to a chat function
  • forked execution for sub-tasks
  • runtime inspection inside code execution
  • context management beyond plain chat history
Useful self-reference primitives:
  • runtime.selfref.guide(): overview plus fork/context best practices
  • runtime.selfref.context.inspect(key=None): full snapshot with experiences, structured summary, and read-only messages
  • runtime.selfref.context.remember(text, key=None): append durable experience
  • runtime.selfref.context.forget(experience_id, key=None): remove wrong or stale durable experience
  • runtime.selfref.context.compact(..., key=None): queue milestone compaction; the framework tries to commit it after the current tool batch so the next same-turn LLM call sees the compacted context, with finalize as fallback
  • runtime.selfref.fork.spawn(message, ...): spawn a child fork; the child inherits the pre-fork context snapshot rather than the parent’s pending fork tool-call scene
  • runtime.selfref.fork.gather_all(fork_id_or_list=None, include_history=False): gather fork results as dict[fork_id -> ForkResult]
Fork results are compact by default. Read status first, then response or result; inspect error_type / error_message when status is error. Use include_history=True only when you actually need full child history.
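The reading order above can be wrapped in a small helper. This sketch treats each fork result as a plain dict with status, response/result, error_type, and error_message fields; dict-style access on the real ForkResult type is an assumption:

```python
def summarize_fork_results(results: dict) -> dict:
    """Map fork_id -> short summary string, reading status before the payload."""
    summary = {}
    for fork_id, r in results.items():
        if r.get("status") == "error":
            # Surface the error type and message instead of the payload.
            summary[fork_id] = f"error: {r.get('error_type')}: {r.get('error_message')}"
        else:
            # Prefer `response`, fall back to `result`.
            summary[fork_id] = str(r.get("response") or r.get("result"))
    return summary

# Stand-in data shaped like gather_all output: dict[fork_id -> ForkResult].
results = {
    "f1": {"status": "ok", "response": "42"},
    "f2": {"status": "error", "error_type": "Timeout", "error_message": "too slow"},
}
summary = summarize_fork_results(results)
```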

Return modes

Use return_mode to control the non-event-stream response shape:
@llm_chat(llm_interface=llm, stream=True, return_mode="text")
async def text_mode_chat(message: str, history=None):
    """Chat that yields text output."""
    pass


@llm_chat(llm_interface=llm, stream=True, return_mode="raw")
async def raw_mode_chat(message: str, history=None):
    """Chat that yields raw model responses."""
    pass
return_mode only matters when enable_event=False. In event mode, ResponseYield.response always carries the raw response object or streaming chunk.

Concurrent chat sessions

Because the API is async-native, you can run multiple chat sessions concurrently with asyncio.gather(...).
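A sketch of that pattern, using a stub async generator in place of a real llm_chat-decorated function (swap fake_chat for your own decorated function):

```python
import asyncio

async def fake_chat(message, history):
    # Stub standing in for an llm_chat-decorated async generator.
    for word in ["Hello", " ", "world"]:
        yield word, history + [{"role": "assistant", "content": word}]

async def run_session(message):
    """Consume one chat session to completion and return the final text."""
    text, history = "", []
    async for chunk, updated_history in fake_chat(message, history):
        text += chunk
        history = updated_history
    return text

async def main():
    # Each session runs concurrently on the event loop.
    return await asyncio.gather(run_session("A"), run_session("B"), run_session("C"))

replies = asyncio.run(main())
```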

Best Practices

Wrap chat execution in try/except blocks and log failures together with the relevant history and trace IDs.
Use AbortSignal or external async timeout helpers when a response or tool call should be interrupted.
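For timeouts, one option is to wrap consumption in a coroutine and bound it with asyncio.wait_for; the stalled stub generator below stands in for a real chat function:

```python
import asyncio

async def slow_chat(message, history):
    # Stub standing in for a chat turn that never finishes.
    await asyncio.sleep(60)
    yield "too late", history

async def consume_with_timeout(message, history, timeout=0.1):
    async def consume():
        async for chunk, updated in slow_chat(message, history):
            pass
    try:
        await asyncio.wait_for(consume(), timeout=timeout)
        return "completed"
    except asyncio.TimeoutError:
        # wait_for cancels the pending await inside the generator.
        return "timed out"

outcome = asyncio.run(consume_with_timeout("hi", []))
```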
If your history grows too large, add wrappers that trim, compress, or persist history outside the decorator.
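A minimal trimming wrapper might keep any leading system message plus the most recent messages (field names assume the role/content dict format):

```python
def trim_history(history, max_messages=20):
    """Keep the first system message plus the most recent non-system messages."""
    system = [m for m in history if m.get("role") == "system"][:1]
    rest = [m for m in history if m.get("role") != "system"]
    return system + rest[-max_messages:]

history = [{"role": "system", "content": "Be brief."}] + [
    {"role": "user", "content": f"msg {i}"} for i in range(50)
]
trimmed = trim_history(history, max_messages=4)
```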
Turn on enable_event=True to inspect model calls, tool calls, and iteration flow when debugging complex Agent behavior.

When to Use llm_chat

Use llm_chat when:
  • you need multi-turn state through message history
  • you want streaming conversational UX
  • you want Agent-style tool calling and runtime context
  • you need event-driven observation of a conversation loop
Use llm_function instead when the task maps naturally to a stateless function call with typed inputs and outputs.