`llm_chat` is the chat-oriented decorator in SimpleLLMFunc. It is designed for multi-turn conversations, history-aware assistants, tool-augmented chat flows, and Agent-style interactions.
Quick Overview
Multi-turn Conversation
Manage and return history while preserving context across turns.
Streaming Output
Return an async generator that works naturally with live chat interfaces.
Tool Integration
Let the model call tools during a conversation.
Runtime Memory
Use it together with `PyRepl`, `SelfReference`, and runtime primitives.
Event Stream
Inspect the full ReAct loop with `enable_event=True`.
Important Note
`llm_chat` only supports functions defined with `async def`. It returns an async generator, so consume it with `async for`.
Basic Syntax
As with `llm_function`, the function body itself is not executed. The docstring is the prompt, and `pass` is usually the correct implementation.
Parameters
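The overall shape can be sketched as follows. `fake_llm_chat` here is a local stand-in so the snippet runs without the framework; the real decorator is `llm_chat`, configured with the parameters listed below, and it is the decorator, not the function body, that calls the model.

```python
import asyncio
import functools

def fake_llm_chat(**_config):
    """Local stand-in for SimpleLLMFunc's llm_chat, used only to make this
    sketch self-contained; the real decorator calls an actual model."""
    def wrap(fn):
        @functools.wraps(fn)
        async def generator(message, history=None):
            history = list(history or []) + [{"role": "user", "content": message}]
            reply = f"(model reply to: {message!r})"  # the real decorator asks the LLM
            history = history + [{"role": "assistant", "content": reply}]
            yield reply, history  # (response, updated_history), as in non-event mode
        return generator
    return wrap

@fake_llm_chat(llm_interface=None)  # real code passes an LLM interface instance
async def assistant(message: str, history=None):
    """You are a concise, helpful assistant."""  # the docstring is the prompt
    pass  # the body is never executed

async def main():
    history = []
    async for response, history in assistant("Hello!", history=history):
        print(response)
    return history

final_history = asyncio.run(main())
```

Note the consumption pattern: the decorated function is an async generator, so every turn is driven with `async for`, never a plain `await`.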
Core parameters
- `llm_interface`: required, the LLM interface instance
- `toolkit`: optional, attached tools
- `max_tool_calls`: optional, tool call limit; default `None`
- `stream`: whether to enable streaming output
Return modes and event stream
- `return_mode`: `text` or `raw`; only matters when `enable_event=False`
- `enable_event=False`: yields `(response, messages)`
- `enable_event=True`: yields `ReactOutput`
- For full event details, see Event Stream System
Runtime and memory
- `strict_signature`: enforce a stricter canonical signature
- `self_reference`: shared `SelfReference` instance
- `self_reference_key`: memory key for the current chat function
- other `llm_kwargs` are passed to the underlying model interface
Abort During a Turn
You can pass `_abort_signal` to interrupt the currently running turn, stop streaming output, and cancel in-flight tool calls.
With `enable_event=True`, `ReactEndEvent.extra` contains `aborted: true` and, if available, an `abort_reason`.
See Abort and Cancellation.
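The interruption pattern can be illustrated with a plain `asyncio.Event` standing in for the framework's abort signal (the real signal is whatever object you pass as `_abort_signal`):

```python
import asyncio

async def chat_turn(message: str, abort_signal: asyncio.Event):
    # Stand-in for one streaming turn; the real generator checks the abort
    # signal between chunks and around in-flight tool calls.
    for chunk in ["Hel", "lo ", "wor", "ld"]:
        if abort_signal.is_set():
            yield "[aborted]"
            return
        yield chunk
        await asyncio.sleep(0)

async def main():
    signal = asyncio.Event()
    received = []
    async for chunk in chat_turn("hi", signal):
        received.append(chunk)
        if len(received) == 2:  # the user cancels mid-turn
            signal.set()
    return received

chunks = asyncio.run(main())
print(chunks)  # ['Hel', 'lo ', '[aborted]']
```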
Return Values
When `enable_event=False`, `llm_chat` yields:
- `chunk`: a streaming chunk or the full response
- `updated_history`: the updated conversation history
When `stream=True` and `return_mode="text"`, the generator yields an empty string at the end as a completion marker.
When `enable_event=True`, you receive `ReactOutput`, which can be:
- `ResponseYield`: response payload plus message list
- `EventYield`: intermediate events from the ReAct loop: model calls, tool calls, and iteration lifecycle
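A consumer can use that empty-string marker to detect the end of a streamed text turn. `fake_stream` below simulates the `(chunk, updated_history)` pairs that `stream=True` with `return_mode="text"` would yield:

```python
import asyncio

async def fake_stream():
    # Simulated output of stream=True, return_mode="text": text chunks
    # followed by an empty string as the completion marker.
    history = [{"role": "user", "content": "hi"}]
    for chunk in ["Hel", "lo", ""]:
        yield chunk, history

async def collect_text():
    parts = []
    async for chunk, history in fake_stream():
        if chunk == "":  # completion marker: the turn is done
            break
        parts.append(chunk)
    return "".join(parts)

text = asyncio.run(collect_text())
print(text)  # Hello
```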
History parameters should be named `history` or `chat_history`. If the framework cannot recognize the history format, it will ignore it and warn. If the history contains system messages, the latest one overrides the docstring-based system prompt.
Where to Start
Basic Chat Assistant
Start here if you want to understand the simplest multi-turn chat flow.
Tool-Augmented Chat
Start here if you want the model to call APIs, file tools, or business logic.
Runtime Context
Start here if you need Agent-style runtime context and self-reference behavior.
Troubleshooting
Start here if you are dealing with history growth, timeouts, or logging.
Examples
Basic chat assistant
Chat with tool calling
Interactive multi-turn session
Advanced Features
Runtime memory and self-reference
When you mount runtime-backed tools such as `PyRepl`, `llm_chat` can also work with `SelfReference` and runtime primitives.
Typical use cases:
- persistent memory tied to a chat function
- forked execution for sub-tasks
- runtime inspection inside code execution
- context management beyond plain chat history
- `runtime.selfref.guide()`: overview plus fork/context best practices
- `runtime.selfref.context.inspect(key=None)`: full snapshot with `experiences`, structured `summary`, and read-only `messages`
- `runtime.selfref.context.remember(text, key=None)`: append a durable experience
- `runtime.selfref.context.forget(experience_id, key=None)`: remove a wrong or stale durable experience
- `runtime.selfref.context.compact(..., key=None)`: queue milestone compaction; the framework tries to commit it after the current tool batch so the next same-turn LLM call sees the compacted context, with finalize as fallback
- `runtime.selfref.fork.spawn(message, ...)`: spawn a child fork; the child inherits the pre-fork context snapshot rather than the parent's pending fork tool-call scene
- `runtime.selfref.fork.gather_all(fork_id_or_list=None, include_history=False)`: gather fork results as `dict[fork_id -> ForkResult]`
Check `status` first, then `response` or `result`; inspect `error_type` / `error_message` when `status` is `error`. Use `include_history=True` only when you actually need the full child history.
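Applying that rule, result handling might look like the sketch below, where plain dicts stand in for the real `ForkResult` objects:

```python
# Plain dicts standing in for the ForkResult objects returned by
# runtime.selfref.fork.gather_all(): a mapping of fork_id -> result.
fork_results = {
    "fork-1": {"status": "ok", "result": "summary of sub-task"},
    "fork-2": {"status": "error", "error_type": "ToolTimeout",
               "error_message": "tool call exceeded its deadline"},
}

succeeded, failed = {}, {}
for fork_id, r in fork_results.items():
    if r["status"] == "error":  # check status before touching the payload
        failed[fork_id] = (r["error_type"], r["error_message"])
    else:
        succeeded[fork_id] = r["result"]

print(succeeded)  # {'fork-1': 'summary of sub-task'}
```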
Return modes
Use `return_mode` (`text` or `raw`) to control the non-event-stream response shape.
`return_mode` only matters when `enable_event=False`. In event mode, `ResponseYield.response` always carries the raw response object or streaming chunk.
Concurrent chat sessions
Because the API is async-native, you can run multiple chat sessions concurrently with `asyncio.gather(...)`.
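For example, two independent sessions can run side by side; `run_session` here is a placeholder for driving one decorated chat function with its own per-session history:

```python
import asyncio

async def run_session(name: str, messages: list) -> list:
    # Placeholder for one chat session; in practice this would iterate a
    # decorated async generator, keeping an independent history per session.
    replies = []
    for m in messages:
        await asyncio.sleep(0)  # yield control so the sessions interleave
        replies.append(f"[{name}] reply to {m!r}")
    return replies

async def main():
    return await asyncio.gather(
        run_session("alice", ["hi", "bye"]),
        run_session("bob", ["hello"]),
    )

session_replies = asyncio.run(main())
print(session_replies)
```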
Best Practices
Error handling
Wrap chat execution in `try/except` blocks and log failures together with the relevant history and trace IDs.
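The pattern, with a deliberately failing generator standing in for a real chat turn:

```python
import asyncio
import logging

async def flaky_chat():
    # Stand-in for a chat turn whose backend fails mid-stream.
    yield "partial answer"
    raise RuntimeError("model backend unavailable")

async def run_turn():
    history = [{"role": "user", "content": "hi"}]
    chunks = []
    try:
        async for chunk in flaky_chat():
            chunks.append(chunk)
    except RuntimeError as exc:
        # Log the failure alongside the history (and trace IDs) that produced it.
        logging.error("chat turn failed: %s; history=%r", exc, history)
    return chunks

partial = asyncio.run(run_turn())
print(partial)  # ['partial answer']
```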
Timeout control
Use `AbortSignal` or external async timeout helpers when a response or tool call should be interrupted.
History management
If your history grows too large, add wrappers that trim, compress, or persist history outside the decorator.
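One such wrapper, a minimal sketch that keeps the system prompt plus the most recent messages:

```python
def trim_history(history, max_messages=20):
    # Keep the first system message (if any) plus the most recent turns.
    system = [m for m in history if m["role"] == "system"][:1]
    rest = [m for m in history if m["role"] != "system"]
    keep = max(max_messages - len(system), 0)
    return system + (rest[-keep:] if keep else [])

history = [{"role": "system", "content": "Be brief."}]
for i in range(50):
    history.append({"role": "user", "content": f"question {i}"})

trimmed = trim_history(history, max_messages=10)
print(len(trimmed), trimmed[0]["content"])  # 10 Be brief.
```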
Event stream debugging
Turn on `enable_event=True` to inspect model calls, tool calls, and iteration flow when debugging complex Agent behavior.
When to Use llm_chat
Use `llm_chat` when:
- you need multi-turn state through message history
- you want streaming conversational UX
- you want Agent-style tool calling and runtime context
- you need event-driven observation of a conversation loop
Use `llm_function` instead when the task maps naturally to a stateless function call with typed inputs and outputs.