SimpleLLMFunc includes built-in PyRepl support, which lets LLMs execute Python code inside a persistent runtime context. Unlike one-shot execution, PyRepl keeps variables and state alive across calls, so the model can solve tasks step by step.
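Conceptually, a persistent REPL keeps a single namespace alive across executions, so a later call can read variables defined by an earlier one. A toy sketch of that idea (illustrative only, not PyRepl's implementation):

```python
class MiniRepl:
    """Toy persistent REPL: one shared namespace reused across calls."""

    def __init__(self):
        self.namespace: dict = {}

    def execute(self, code: str) -> None:
        # exec() reads and writes names in the shared namespace,
        # so state survives from one call to the next.
        exec(code, self.namespace)


repl = MiniRepl()
repl.execute("x = [1, 2, 3]")
repl.execute("total = sum(x)")  # sees `x` from the previous call
```

PyRepl provides the same persistence, plus async execution, timeouts, and event streaming on top.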

Features

Persistent Context

Variables persist across calls, which is useful for iterative analysis and coding tasks.

Async, Non-blocking

Execution runs off the main event loop so UI and event streaming stay responsive.

Real-time Event Output

Receive stdout, stderr, and input requests via event_emitter.

Session Isolation

Different PyRepl instances do not share state.

Long Output Truncation

Large outputs can be stored to a temporary file and returned in truncated form.

Runtime Primitives

Expose controlled runtime capabilities through runtime.selfref.* and related primitives.

Quick Start

1. Create a PyRepl instance

from SimpleLLMFunc.builtin import PyRepl

repl = PyRepl()
tools = repl.toolset

2. Attach it to llm_chat

from SimpleLLMFunc import llm_chat

@llm_chat(
    llm_interface=llm,
    toolkit=tools,
    enable_event=True,
)
async def python_assistant(message: str, history=None):
    """You are a Python coding assistant. Use code execution when it helps."""

3. Consume the output

async for output in python_assistant("Create a list and compute its mean"):
    pass  # render or log each yielded chunk/event here

Multiple isolated sessions

repl1 = PyRepl()
repl2 = PyRepl()


@llm_chat(toolkit=repl1.toolset, ...)
async def chat1(message: str, history=None):
    """Assistant backed by repl1."""


@llm_chat(toolkit=repl2.toolset, ...)
async def chat2(message: str, history=None):
    """Assistant backed by repl2."""

Tool Details

execute_code

Execute Python code and return the execution result.
execute_code has a default 600-second active execution timeout. Time spent waiting for input() does not count toward that limit. Each input() request also has its own default 300-second idle timeout.
When output exceeds 20,000 tokens, execute_code stores the full result in a temporary file and returns only a truncated preview plus a <system-reminder> note with the file path.
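The truncation behavior can be sketched in plain Python. This is an illustrative stand-in only: the real tool measures tokens (20,000), not characters, and its file naming and reminder format may differ.

```python
import tempfile


def truncate_output(text: str, limit_chars: int = 200) -> str:
    """Spill oversized output to a temp file and return a truncated preview.

    Sketch only: a character limit stands in for the tool's token limit.
    """
    if len(text) <= limit_chars:
        return text
    # Persist the full output so nothing is lost.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        f.write(text)
        path = f.name
    # Return a preview plus a reminder pointing at the full result.
    return (
        text[:limit_chars]
        + f"\n<system-reminder>Output truncated; full result saved to {path}</system-reminder>"
    )
```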
The guidance sent to the model encourages it to inspect runtime primitives with runtime.list_primitives() and runtime.get_primitive_spec(...), and reminds it that reset_repl clears REPL variables but keeps registered runtime backends.
Parameters
| Parameter | Type | Description |
| --- | --- | --- |
| code | `str` | Python code to execute |
| timeout_seconds | `float` | Optional per-call timeout override |
| event_emitter | `ToolEventEmitter` | Optional emitter for realtime output |
Tool output for the model

The model receives a natural-language summary that includes execution status, elapsed time, stdout, stderr, return value, and error information. If you need structured data in Python code, call PyRepl.execute() directly.

Python API return value
{
    "success": bool,
    "stdout": str,
    "stderr": str,
    "return_value": Any,
    "error": str | None,
    "error_details": dict | None,
    "execution_time_ms": float,
}

Improved error localization

execute_code tries to return information about the actual user code location rather than only the internal framework stack. Typical error_details fields:
  • error_type
  • message
  • line / column
  • snippet
  • pointer
  • summary
  • user_traceback
Example:
repl = PyRepl()
result = await repl.execute(code="for i in range(2)\n    print(i)")
if not result["success"]:
    print(result["error"])
    print(result["error_details"])

reset_repl

Reset the REPL state and clear all variables.
Model-facing tool descriptions make it explicit that reset_repl clears REPL variables but keeps the registered runtime backend.
result = await repl.reset()
# Returns: "REPL reset successfully. All variables have been cleared."

Streaming Events

When enable_event=True, execute_code can emit these realtime events:
| Event name | Data shape | Description |
| --- | --- | --- |
| kernel_stdout | `{text: str}` | Standard output |
| kernel_stderr | `{text: str}` | Standard error |
| kernel_input_request | `{request_id: str, prompt: str, idle_timeout_seconds: float}` | An `input()` request waiting for user input |

Consuming streaming events

from SimpleLLMFunc.hooks import CustomEvent, is_event_yield

async for output in llm_chat_function(message):
    if is_event_yield(output):
        event = output.event
        if isinstance(event, CustomEvent):
            if event.event_name == "kernel_stdout":
                print(f"[stdout] {event.data['text']}", end="")
            elif event.event_name == "kernel_stderr":
                print(f"[stderr] {event.data['text']}", end="")
If the event name is kernel_input_request, you can reply by calling PyRepl.submit_input(request_id, value). When you consume event streams from @llm_chat(enable_event=True), output.origin can also be used to distinguish the main chain from forked sub-chains.
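A dispatcher covering all three event kinds might look like the sketch below. The wiring is illustrative: the reply value would normally come from your UI, and whether submit_input must be awaited depends on your library version.

```python
async def handle_event(repl, event_name: str, data: dict) -> None:
    """Dispatch PyRepl streaming events (sketch; event shapes follow the table above)."""
    if event_name == "kernel_stdout":
        print(data["text"], end="")
    elif event_name == "kernel_stderr":
        print(data["text"], end="")
    elif event_name == "kernel_input_request":
        # Collect a reply from your UI here; a placeholder stands in for it.
        reply = f"reply to: {data['prompt']}"
        # Drop the await if submit_input is synchronous in your version.
        await repl.submit_input(data["request_id"], reply)
```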

Usage Example

Data analysis assistant

import sys

from SimpleLLMFunc import llm_chat
from SimpleLLMFunc.builtin import PyRepl
from SimpleLLMFunc.hooks import CustomEvent, is_event_yield

repl = PyRepl()


@llm_chat(
    llm_interface=llm,
    toolkit=repl.toolset,
    enable_event=True,
)
async def data_helper(message: str, history=None):
    """You are a data analysis assistant. Use Python code in small steps and print useful results."""


async for output in data_helper("Create 100 random numbers and compute the mean and standard deviation"):
    if is_event_yield(output):
        event = output.event
        if isinstance(event, CustomEvent) and event.event_name == "kernel_stdout":
            print(event.data["text"], end="")

Persistent programming context

repl = PyRepl()

result1 = await repl.execute(code="""
import random
data = [random.randint(1, 100) for _ in range(10)]
print(f"Generated {len(data)} random values")
print(f"Data: {data}")
""")

result2 = await repl.execute(code="""
mean = sum(data) / len(data)
print(f"Mean: {mean}")
""")

print(result2["stdout"])

Configuration Options

# Default execution timeout is 600 seconds
repl = PyRepl()

# Override execution timeout
repl = PyRepl(execution_timeout_seconds=180)

# Override timeout per call
result = await repl.execute("import time\ntime.sleep(2)", timeout_seconds=5)

# Configure input idle timeout
repl = PyRepl(input_idle_timeout_seconds=300)

# Set working directory
repl = PyRepl(working_directory="./sandbox")

Runtime and Self-Reference

PyRepl installs the built-in selfref pack by default.
from SimpleLLMFunc import llm_chat
from SimpleLLMFunc.builtin import PyRepl, SelfReference

repl = PyRepl()
self_reference = repl.get_runtime_backend("selfref")
assert isinstance(self_reference, SelfReference)


@llm_chat(
    llm_interface=llm,
    toolkit=repl.toolset,
    self_reference_key="agent_main",
)
async def agent(message: str, history=None):
    ...
For custom runtime primitive design, see Runtime Primitives. Useful selfref primitives:
  • runtime.selfref.context.inspect()
  • runtime.selfref.context.remember(...)
  • runtime.selfref.context.forget(...)
  • runtime.selfref.context.compact(...)
  • runtime.selfref.fork.spawn(...)
  • runtime.selfref.fork.gather_all(...)
Key fork rules:
  • Child forks inherit the pre-fork context snapshot, not the parent’s pending fork tool-call scene.
  • runtime.selfref.fork.gather_all(...) returns dict[fork_id -> ForkResult].
  • Compact results expose status, response, result, memory_key, history_count, and history_included.
  • Check status first, then read response or result. Use include_history=True only when full child history is actually needed.
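The "check status first" rule can be expressed as a small defensive helper. This sketch treats the result as a plain dict; the real ForkResult may expose attributes instead, and the exact status strings are assumptions.

```python
def read_fork_result(fork_result: dict) -> str:
    """Read a fork result defensively: status first, then response/result.

    Sketch only: field names follow the rules above, but "success" as a
    status value is an assumption.
    """
    status = fork_result.get("status")
    if status != "success":
        return f"fork failed: {status}"
    # Prefer the natural-language response; fall back to the structured result.
    return fork_result.get("response") or str(fork_result.get("result"))
```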

Best Practices

  • Use different PyRepl instances for different task scopes
  • Keep execution snippets small and inspect output between steps
  • Use event streaming for better UX in TUI or custom interfaces
  • Reset the REPL when you want a clean state without removing runtime backends