LLM Interface
The interface layer handles model communication, key rotation, and rate limiting.OpenAICompatible
Works with any provider that implements the OpenAI Chat Completions API:OpenAIResponsesCompatible
For providers implementing OpenAI’s Responses API:- Maps system prompts to
instructionsfield - Handles Responses-specific streaming events
- Supports
reasoning={...}kwargs for reasoning effort - Different wire format for tool calls
APIKeyPool
Manages multiple keys with round-robin rotation:Rate Limiting
Built-in token bucket rate limiter:OpenAICompatible instances for the same model can have different rate limits.
Passing LLM kwargs
Extra parameters are forwarded to the provider:Context Window
Setcontext_window to enable framework features that depend on knowing the model’s capacity: