SelfRef Engineering
This page covers advanced SelfRef patterns for production agent architectures.
For maintainers: version 0.8.1 splits the former SelfReference god object into a facade plus store, active_turn, mutations, memory_api, context_memory, agent_binding, fork_manager, and fork_utils. Application code should still interact through SelfReference, runtime.selfref.context.*, and runtime.selfref.fork.*; the split is an internal maintainability boundary.
Compaction Strategy
When to Compact
Compact when:
- Token usage approaches context window limits
- A logical milestone is complete (task finished, phase transition)
- The working transcript contains mostly stale information (old tool outputs, superseded plans)
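The three triggers above can be combined into a single check. The following is a minimal sketch, not part of the SelfReference API: `TranscriptStats`, `should_compact`, and the 0.8 and 0.5 thresholds are all hypothetical and should be tuned per application.

```python
from dataclasses import dataclass


@dataclass
class TranscriptStats:
    """Hypothetical snapshot of context usage (not a SelfReference type)."""
    used_tokens: int
    context_window: int
    stale_messages: int
    total_messages: int


def should_compact(
    stats: TranscriptStats,
    milestone_done: bool = False,
    token_ratio: float = 0.8,   # assumed threshold: 80% of the window
    stale_ratio: float = 0.5,   # assumed threshold: more than half stale
) -> bool:
    """Return True if any of the three compaction triggers fires."""
    near_limit = stats.used_tokens >= token_ratio * stats.context_window
    mostly_stale = (
        stats.total_messages > 0
        and stats.stale_messages / stats.total_messages > stale_ratio
    )
    return near_limit or milestone_done or mostly_stale
```

How a message is classified as stale is left to the caller; old tool outputs and superseded plans are the obvious candidates.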
What Makes a Good Compact Call
- discoveries = facts learned that may be needed later
- completed = what’s done (so the agent doesn’t redo it)
- current_status = where we stopped
- likely_next_work = what to do next (gives the resuming agent direction)
- remember = short durable lessons that should persist as experiences (use sparingly)
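To make the shape of these fields concrete, here is a minimal sketch that models them as a plain data structure and renders the summary text. `CompactSummary` and `render` are hypothetical names for illustration; the actual signature of compact(...) is not shown on this page, so treat the field types (lists of strings) as an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class CompactSummary:
    """Hypothetical container mirroring the five fields described above."""
    discoveries: list[str] = field(default_factory=list)
    completed: list[str] = field(default_factory=list)
    current_status: str = ""
    likely_next_work: list[str] = field(default_factory=list)
    remember: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the summary message that would replace the transcript.

        `remember` is left out on purpose: those items are meant to persist
        as experiences rather than live in the working transcript.
        """
        sections = [
            ("Discoveries", self.discoveries),
            ("Completed", self.completed),
            ("Current status", [self.current_status] if self.current_status else []),
            ("Likely next work", self.likely_next_work),
        ]
        lines: list[str] = []
        for title, items in sections:
            if items:  # skip empty sections to keep the summary short
                lines.append(f"{title}:")
                lines.extend(f"- {item}" for item in items)
        return "\n".join(lines)
```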
Compaction Lifecycle
- compact(...) is called inside execute_code
- The compaction is queued (not applied immediately)
- After the current tool batch finishes, the runtime applies a compaction patch to the transcript
- System prompt + experiences are preserved
- Working transcript is replaced with the summary message
- The agent’s next turn starts with fresh, compact context
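The queue-then-apply behavior can be illustrated with a toy model. This is a sketch under stated assumptions, not the runtime's implementation: `ToyContext` and `finish_tool_batch` are hypothetical stand-ins for the real context and the point where the runtime applies the compaction patch.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyContext:
    """Hypothetical stand-in for an agent context (not the real runtime)."""
    system_prompt: str
    experiences: list = field(default_factory=list)
    transcript: list = field(default_factory=list)
    _pending: Optional[tuple] = None

    def compact(self, summary: str, remember=()) -> None:
        # Queue only: the transcript is untouched until the batch ends.
        self._pending = (summary, list(remember))

    def finish_tool_batch(self) -> None:
        # Assumed hook: runs once the current tool batch has finished.
        if self._pending is None:
            return
        summary, remember = self._pending
        self.experiences.extend(remember)                  # preserved and extended
        self.transcript = [f"[compact summary] {summary}"]  # transcript replaced
        self._pending = None


ctx = ToyContext(
    system_prompt="You are an agent.",
    experiences=["prefers concise output"],
    transcript=["user: fix the bug", "tool: 500 lines of logs"],
)
ctx.compact("Bug fixed in parser.py; tests pass.", remember=["run tests before committing"])
queued_length = len(ctx.transcript)   # still 2: nothing applied yet
ctx.finish_tool_batch()
```

After `finish_tool_batch()` the system prompt is unchanged, the experiences list has grown by one entry, and the transcript holds only the summary message.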
Auto-Compaction Pattern
Inject compaction instructions when approaching token limits.
Fork Orchestration
Basic Fork Pattern
Fork Design Rules
- Children inherit pre-fork context: they see the parent’s conversation up to the moment before the fork tool call. They do NOT see the parent’s fork call itself.
- Children are independent: they cannot read or modify the parent’s context. Each child has its own isolated ReAct loop.
- Gather blocks: gather_all() waits until all spawned children complete. Don’t call it immediately after spawn unless you actually need results now.
- Keep tasks bounded: give each child a clear scope, acceptance criteria, and stop condition. Unbounded tasks waste tokens.
- Ask for summaries: tell children to return concise results + file paths, not full transcripts. Use include_history=True only when you need to inspect their reasoning.
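The rules above can be modeled with plain asyncio. This is a sketch of the semantics only and does not use the runtime.selfref.fork.* API: `run_child` and `fork_and_gather` are hypothetical, and `asyncio.gather` stands in for gather_all().

```python
import asyncio
import copy


async def run_child(task: str, inherited: list) -> dict:
    """Hypothetical child loop working on a private copy of the context."""
    context = copy.deepcopy(inherited)      # isolated: parent is never mutated
    context.append(f"child task: {task}")
    await asyncio.sleep(0)                  # stand-in for the child's ReAct loop
    return {"task": task, "summary": f"done: {task}", "history": context}


async def fork_and_gather(parent_context: list, tasks: list,
                          include_history: bool = False) -> list:
    snapshot = list(parent_context)             # pre-fork context only
    parent_context.append("tool call: fork")    # children never see this entry
    children = [asyncio.create_task(run_child(t, snapshot)) for t in tasks]
    # The parent could do other work here; gathering is the blocking step.
    results = await asyncio.gather(*children)
    if not include_history:
        # Return concise results by default, not full child transcripts.
        results = [{k: v for k, v in r.items() if k != "history"} for r in results]
    return list(results)


parent = ["user: audit the repo"]
results = asyncio.run(fork_and_gather(parent, ["lint src/", "check docs/"]))
```

Each entry in `results` carries only the task and its summary; passing `include_history=True` keeps the child's full context for inspection.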
Fork vs. Sequential
Use forks when:
- Tasks are independent (no data dependencies between them)
- Tasks are substantial enough to justify the overhead of a child context
- Parallelism saves wall-clock time
Use sequential execution when:
- Tasks depend on each other’s outputs
- Tasks are small (one tool call each)
- You need intermediate results to decide next steps
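These criteria can be turned into a rough planning heuristic. The sketch below is illustrative only: `Task`, `split_plan`, and the `min_calls_for_fork` threshold are hypothetical, and the tool-call estimate is assumed to come from the caller.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical task description used only for planning."""
    name: str
    depends_on: set = field(default_factory=set)
    estimated_tool_calls: int = 1


def split_plan(tasks: list, min_calls_for_fork: int = 2) -> tuple:
    """Return (forkable, sequential) task names, preserving input order."""
    forkable, sequential = [], []
    for task in tasks:
        independent = not task.depends_on
        substantial = task.estimated_tool_calls >= min_calls_for_fork
        if independent and substantial:
            forkable.append(task.name)
        else:
            sequential.append(task.name)
    if len(forkable) < 2:
        # A single fork gives no parallelism, so run everything in order.
        return [], [task.name for task in tasks]
    return forkable, sequential
```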
Multi-Key Memory
SelfReference supports multiple memory keys for partitioned state. Typical uses:
- Separate agent personas within one application
- Long-running project memory vs. ephemeral task context
- Shared reference context vs. per-user conversation
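Conceptually, each memory key names an independent partition of state. The toy store below shows the idea; `KeyedMemory` and the key names are hypothetical and do not reflect how SelfReference stores or addresses memory.

```python
class KeyedMemory:
    """Toy partitioned store: one independent message list per memory key."""

    def __init__(self) -> None:
        self._store: dict = {}

    def append(self, key: str, message: str) -> None:
        self._store.setdefault(key, []).append(message)

    def history(self, key: str) -> list:
        # Return a copy so callers cannot mutate another partition's state.
        return list(self._store.get(key, []))


memory = KeyedMemory()
memory.append("project:alpha", "Decision: use PostgreSQL")   # long-running project memory
memory.append("user:42", "user: summarize yesterday's work")  # per-user conversation
memory.append("user:43", "user: what is the deploy status?")
```

Writes under one key never appear under another, which is what keeps personas, projects, and users separated.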
Experience Management
When to Remember
Good experiences:
- Durable lessons (“this API requires auth header X”)
- User preferences (“prefers concise output”)
- Project conventions (“always use pytest, not unittest”)
Bad experiences:
- Transient state (“currently working on file X”): put this in the compact summary instead
- Large data: experiences live in the system prompt, so keep them short
- Temporary context: use working messages or a compact checkpoint instead
Pruning Experiences
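Because experiences live in the system prompt, the list should stay short and free of duplicates. A minimal pruning sketch, assuming experiences are plain strings: `prune_experiences` and its limits are hypothetical and not a SelfReference method.

```python
def prune_experiences(experiences: list, max_items: int = 20,
                      max_chars: int = 200) -> list:
    """Drop duplicates and oversized entries, keeping the newest items."""
    seen = set()
    kept = []
    for text in reversed(experiences):        # newest first, so recent lessons win
        cleaned = " ".join(text.split())      # collapse whitespace
        key = cleaned.lower()
        if not cleaned or key in seen or len(cleaned) > max_chars:
            continue
        seen.add(key)
        kept.append(cleaned)
        if len(kept) == max_items:
            break
    kept.reverse()                            # restore chronological order
    return kept
```

Entries that exceed `max_chars` are dropped rather than truncated, on the assumption that large data does not belong in experiences in the first place.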
Production Pattern: Durable Agent
A durable agent built from these patterns:
- Remembers across sessions (experiences persist in SelfReference backend)
- Compacts context at milestones (keeps context fresh)
- Delegates work to forks (parallelism for complex tasks)
- Has full file and code access (via toolsets)