llm_chat is the chat-oriented decorator in SimpleLLMFunc. It is designed for multi-turn conversations, history-aware assistants, tool-augmented chat flows, and Agent-style interactions.

Quick Overview

Multi-turn Conversation

Manage and return history while preserving context across turns.

Streaming Output

Return an async generator that works naturally with live chat interfaces.

Tool Integration

Let the model call tools during a conversation.

Runtime Memory

Use it together with PyRepl, SelfReference, and runtime primitives.

Event Stream

Inspect the full ReAct loop with enable_event=True.

Important Note

llm_chat only supports functions defined with async def. It returns an async generator, so consume it with async for.

Basic Syntax

from typing import AsyncGenerator, Dict, List, Tuple

from SimpleLLMFunc import llm_chat


@llm_chat(
    llm_interface=llm_interface,
    toolkit=None,
    max_tool_calls=None,
    stream=True,
    self_reference=None,
    self_reference_key=None,
    **llm_kwargs,
)
async def your_chat_function(
    message: str,
    history: List[Dict[str, str]] | None = None,
) -> AsyncGenerator[Tuple[str, List[Dict[str, str]]], None]:
    """Describe the assistant role and behavior here."""
    pass
As with llm_function, the function body itself is not executed. The docstring is the prompt, and pass is usually the correct implementation.

Parameters

  • llm_interface: required; the LLM interface instance
  • toolkit: optional; tools the model may call
  • max_tool_calls: optional; tool call limit; default None
  • stream: whether to enable streaming output
  • return_mode: "text" or "raw"; only matters when enable_event=False
  • enable_event: with enable_event=False, the generator yields (response, messages); with enable_event=True, it yields ReactOutput (for full event details, see Event Stream System)
  • strict_signature: enforce a stricter canonical signature
  • self_reference: shared SelfReference instance
  • self_reference_key: memory key for the current chat function
  • any remaining llm_kwargs are passed through to the underlying model interface

Abort During a Turn

You can pass _abort_signal to interrupt the currently running turn, stop streaming output, and cancel in-flight tool calls.
from SimpleLLMFunc.hooks import ABORT_SIGNAL_PARAM, AbortSignal

abort_signal = AbortSignal()

async for output in your_chat_function(
    "Hello",
    history=[],
    **{ABORT_SIGNAL_PARAM: abort_signal},
):
    ...

# Trigger from another coroutine
abort_signal.abort("user_interrupt")
When enable_event=True, ReactEndEvent.extra contains aborted: true and, if available, an abort_reason. See Abort and Cancellation.

Return Values

When enable_event=False, llm_chat yields:
  • chunk: a streaming chunk or the full response
  • updated_history: the updated conversation history
When stream=True and return_mode="text", the generator yields an empty string at the end as a completion marker.
When enable_event=True, you receive ReactOutput, which can be:
  • ResponseYield: response payload plus message list
  • EventYield: intermediate events from the ReAct loop, model calls, tool calls, and iteration lifecycle
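An event-mode consumption loop typically branches on the yield type. A minimal sketch of that dispatch pattern, using stand-in dataclasses in place of the framework's EventYield and ResponseYield (the real types live in SimpleLLMFunc; the field names here are assumptions):

```python
import asyncio
from dataclasses import dataclass
from typing import Any, List

# Stand-ins for the framework's yield types; real definitions live in SimpleLLMFunc.
@dataclass
class EventYield:
    event: Any

@dataclass
class ResponseYield:
    response: Any
    messages: List[dict]

async def consume(outputs):
    """Dispatch on the ReactOutput variant, as an event-mode loop would."""
    events, final = [], None
    async for output in outputs:
        if isinstance(output, EventYield):
            events.append(output.event)   # intermediate ReAct-loop event
        elif isinstance(output, ResponseYield):
            final = output.response       # response payload (messages also available)
    return events, final

async def fake_stream():
    # Simulated event-mode output: one intermediate event, then the response.
    yield EventYield(event="tool_call_started")
    yield ResponseYield(response="done", messages=[])

events, final = asyncio.run(consume(fake_stream()))
```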
History parameters should be named history or chat_history. If the framework cannot recognize the history format, the history is ignored and a warning is emitted. If the history contains system messages, the latest one overrides the docstring-based system prompt.
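The expected history shape is a list of role/content dicts in the OpenAI-style message format; for example:

```python
history = [
    # A system message in history overrides the docstring-based system prompt.
    {"role": "system", "content": "You are a terse assistant."},
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello! How can I help?"},
]
```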

Where to Start

Basic Chat Assistant

Start here if you want to understand the simplest multi-turn chat flow.

Tool-Augmented Chat

Start here if you want the model to call APIs, file tools, or business logic.

Runtime Context

Start here if you need Agent-style runtime context and self-reference behavior.

Troubleshooting

Start here if you are dealing with history growth, timeouts, or logging.

Examples

Basic chat assistant

import asyncio
from typing import AsyncGenerator, Dict, List, Tuple

from SimpleLLMFunc import OpenAICompatible, llm_chat

llm = OpenAICompatible.load_from_json_file("provider.json")["openai"]["gpt-3.5-turbo"]


@llm_chat(llm_interface=llm, stream=True)
async def simple_chat(
    message: str,
    history: List[Dict[str, str]] | None = None,
) -> AsyncGenerator[Tuple[str, List[Dict[str, str]]], None]:
    """You are a friendly chat assistant."""
    pass


async def main():
    history = []
    user_message = "Hello, please introduce yourself."

    print(f"User: {user_message}")
    print("Assistant: ", end="", flush=True)

    async for chunk, updated_history in simple_chat(user_message, history):
        if chunk:
            print(chunk, end="", flush=True)
        history = updated_history

    print()


asyncio.run(main())

Chat with tool calling

import asyncio
from typing import AsyncGenerator, Dict, List, Tuple

from SimpleLLMFunc import OpenAICompatible, llm_chat, tool


@tool(name="get_weather", description="Get weather information for a city")
async def get_weather(city: str) -> Dict[str, str]:
    return {
        "city": city,
        "temperature": "25°C",
        "humidity": "60%",
        "condition": "Sunny",
    }


llm = OpenAICompatible.load_from_json_file("provider.json")["openai"]["gpt-3.5-turbo"]


@llm_chat(llm_interface=llm, toolkit=[get_weather], stream=True)
async def weather_chat(
    message: str,
    history: List[Dict[str, str]] | None = None,
) -> AsyncGenerator[Tuple[str, List[Dict[str, str]]], None]:
    """You are a weather assistant. Use the weather tool when users ask about weather."""
    pass

Interactive multi-turn session

async def demo():
    history = []
    messages = [
        "What is a Python list comprehension?",
        "How do I use async programming?",
    ]

    for user_message in messages:
        print(f"\nUser: {user_message}")
        print("Assistant: ", end="", flush=True)

        async for chunk, updated_history in simple_chat(user_message, history):
            if chunk:
                print(chunk, end="", flush=True)
            history = updated_history

        print()

Advanced Features

Runtime memory and self-reference

When you mount runtime-backed tools such as PyRepl, llm_chat can also work with SelfReference and runtime primitives. Typical use cases:
  • persistent memory tied to a chat function
  • forked execution for sub-tasks
  • runtime inspection inside code execution
  • context management beyond plain chat history
Useful self-reference primitives:
  • runtime.selfref.guide(): overview plus fork/context best practices
  • runtime.selfref.context.inspect(key=None): full snapshot with experiences, structured summary, and read-only messages
  • runtime.selfref.context.remember(text, key=None): append durable experience
  • runtime.selfref.context.forget(experience_id, key=None): remove wrong or stale durable experience
  • runtime.selfref.context.compact(..., key=None): queue milestone compaction; the framework tries to commit it after the current tool batch so the next same-turn LLM call sees the compacted context, with finalize as fallback
  • runtime.selfref.fork.spawn(message, ...): spawn a child fork; the child inherits the pre-fork context snapshot rather than the parent’s pending fork tool-call scene
  • runtime.selfref.fork.gather_all(fork_id_or_list=None, include_history=False): gather fork results as dict[fork_id -> ForkResult]
Fork results are compact by default. Read status first, then response or result; inspect error_type / error_message when status is error. Use include_history=True only when you actually need full child history.
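The reading order above can be wrapped in a small helper. This sketch treats each fork result as a plain dict with status, response/result, error_type, and error_message fields; dict-style access on the real ForkResult type is an assumption:

```python
def summarize_fork_results(results: dict) -> dict:
    """Map fork_id -> short summary string, reading status before the payload."""
    summary = {}
    for fork_id, r in results.items():
        if r.get("status") == "error":
            # Surface the error type and message instead of the payload.
            summary[fork_id] = f"error: {r.get('error_type')}: {r.get('error_message')}"
        else:
            # Prefer `response`, fall back to `result`.
            summary[fork_id] = str(r.get("response") or r.get("result"))
    return summary

# Stand-in data shaped like gather_all output: dict[fork_id -> ForkResult].
results = {
    "f1": {"status": "ok", "response": "42"},
    "f2": {"status": "error", "error_type": "Timeout", "error_message": "too slow"},
}
summary = summarize_fork_results(results)
```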

Return modes

Use return_mode to control the non-event-stream response shape:
@llm_chat(llm_interface=llm, stream=True, return_mode="text")
async def text_mode_chat(message: str, history=None):
    """Chat that yields text output."""
    pass


@llm_chat(llm_interface=llm, stream=True, return_mode="raw")
async def raw_mode_chat(message: str, history=None):
    """Chat that yields raw model responses."""
    pass
return_mode only matters when enable_event=False. In event mode, ResponseYield.response always carries the raw response object or streaming chunk.

Concurrent chat sessions

Because the API is async-native, you can run multiple chat sessions concurrently with asyncio.gather(...).
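A sketch of that pattern, using a stub async generator in place of a real llm_chat-decorated function (swap fake_chat for your own decorated function):

```python
import asyncio

async def fake_chat(message, history):
    # Stub standing in for an llm_chat-decorated async generator.
    for word in ["Hello", " ", "world"]:
        yield word, history + [{"role": "assistant", "content": word}]

async def run_session(message):
    """Consume one chat session to completion and return the final text."""
    text, history = "", []
    async for chunk, updated_history in fake_chat(message, history):
        text += chunk
        history = updated_history
    return text

async def main():
    # Each session runs concurrently on the event loop.
    return await asyncio.gather(run_session("A"), run_session("B"), run_session("C"))

replies = asyncio.run(main())
```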

Best Practices

Wrap chat execution in try/except blocks and log failures together with the relevant history and trace IDs.
Use AbortSignal or external async timeout helpers when a response or tool call should be interrupted.
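For timeouts, one option is to wrap consumption in a coroutine and bound it with asyncio.wait_for; the stalled stub generator below stands in for a real chat function:

```python
import asyncio

async def slow_chat(message, history):
    # Stub standing in for a chat turn that never finishes.
    await asyncio.sleep(60)
    yield "too late", history

async def consume_with_timeout(message, history, timeout=0.1):
    async def consume():
        async for chunk, updated in slow_chat(message, history):
            pass
    try:
        await asyncio.wait_for(consume(), timeout=timeout)
        return "completed"
    except asyncio.TimeoutError:
        # wait_for cancels the pending await inside the generator.
        return "timed out"

outcome = asyncio.run(consume_with_timeout("hi", []))
```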
If your history grows too large, add wrappers that trim, compress, or persist history outside the decorator.
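A minimal trimming wrapper might keep any leading system message plus the most recent messages (field names assume the role/content dict format):

```python
def trim_history(history, max_messages=20):
    """Keep the first system message plus the most recent non-system messages."""
    system = [m for m in history if m.get("role") == "system"][:1]
    rest = [m for m in history if m.get("role") != "system"]
    return system + rest[-max_messages:]

history = [{"role": "system", "content": "Be brief."}] + [
    {"role": "user", "content": f"msg {i}"} for i in range(50)
]
trimmed = trim_history(history, max_messages=4)
```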
Turn on enable_event=True to inspect model calls, tool calls, and iteration flow when debugging complex Agent behavior.

When to Use llm_chat

Use llm_chat when:
  • you need multi-turn state through message history
  • you want streaming conversational UX
  • you want Agent-style tool calling and runtime context
  • you need event-driven observation of a conversation loop
Use llm_function instead when the task maps naturally to a stateless function call with typed inputs and outputs.