OpenAICompatible and OpenAIResponsesCompatible adapters, key pools, and flow control.
What will you use?
LLM_Interface
Abstract base class that uniformly defines chat() and chat_stream().
OpenAICompatible
The default implementation for integrating with OpenAI-compatible interfaces, suitable as the entry point for most projects.
OpenAIResponsesCompatible
An adapter for OpenAI Responses API endpoints while keeping the same decorator-facing usage model.
APIKeyPool
Responsible for load balancing and task allocation of multiple API Keys.
TokenBucket
Responsible for request rate control to avoid hitting backend rate limits.
If you only want to integrate a model quickly, prefer OpenAICompatible.load_from_json_file(...) or OpenAIResponsesCompatible.load_from_json_file(...). Instantiate the interface object manually only when you need programmatic control over the key pool or rate-limiting parameters.

Simple rule: use OpenAICompatible for normal chat/completions-compatible endpoints, and OpenAIResponsesCompatible for OpenAI Responses API endpoints. Both use the same provider.json structure.

Quick Start
Component Description
LLM_Interface abstract base class
LLM_Interface is the abstract base class for all LLM implementations, defining a unified interface specification. Its goal is to converge the calling convention and return format onto a single consistent interface. Both OpenAICompatible and OpenAIResponsesCompatible inherit from it.

Core Features:
- Standardized Interface: unified chat() and chat_stream()
- Type Safety: works with Python type annotations
- Async Native: suited to high-concurrency and event-stream scenarios
- Extensible: easy to integrate new OpenAI-compatible services
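To make the contract concrete, here is a minimal sketch of what such an abstract base class could look like. The method names chat() and chat_stream() come from this page; the exact signatures, argument types, and return shapes below are assumptions, not the library's real API.

```python
from abc import ABC, abstractmethod
from typing import Any, AsyncIterator, Dict, List

Message = Dict[str, str]  # e.g. {"role": "user", "content": "..."}

class LLM_Interface(ABC):
    """Unified interface sketch: every backend exposes the same two calls."""

    @abstractmethod
    async def chat(self, messages: List[Message], **kwargs: Any) -> Dict[str, Any]:
        """Send a full message list and return the complete response."""

    @abstractmethod
    def chat_stream(
        self, messages: List[Message], **kwargs: Any
    ) -> AsyncIterator[Dict[str, Any]]:
        """Yield response chunks incrementally for streaming scenarios."""
```

Because both methods are abstract, any concrete backend (such as OpenAICompatible) is forced to implement the same pair, which is what lets decorator code stay backend-agnostic.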
OpenAICompatible Default Implementation
OpenAICompatible is the default implementation of LLM_Interface, supporting any service compatible with the OpenAI API, including:
- OpenAI
- DeepSeek
- Anthropic Claude (OpenAI-compatible endpoints)
- Volcano Engine Ark
- Baidu Qianfan
- Ollama, vLLM and other local model services
- Other OpenAI-compatible providers

Built-in capabilities:
- Automatic Retry
- Token Statistics
- Rate Limiting Control
- Multi-Key Rotation
OpenAIResponsesCompatible Adapter
OpenAIResponsesCompatible integrates with OpenAI Responses API endpoints while keeping the same @llm_function / @llm_chat surface for application code.

Key points:
- It uses the same provider.json structure as OpenAICompatible
- Its constructor still takes APIKeyPool, model_name, and base_url
- Decorator code still builds normal system/user messages first; the adapter then maps the selected system prompt to Responses instructions
- Responses-specific features such as reasoning={...} are forwarded by the adapter without forcing app code to speak the raw Responses schema
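The system-prompt-to-instructions mapping described above can be sketched as a pure function. This is an illustration of the idea, not the adapter's actual code: the function name to_responses_payload and the exact payload shape are assumptions, though the Responses API does accept `instructions` alongside `input`.

```python
from typing import Any, Dict, List

def to_responses_payload(messages: List[Dict[str, str]], model: str) -> Dict[str, Any]:
    """Map chat-style messages onto a Responses-style request body (sketch):
    the first system message becomes `instructions`, the rest become `input`."""
    system = next((m["content"] for m in messages if m["role"] == "system"), None)
    payload: Dict[str, Any] = {
        "model": model,
        # Non-system messages pass through as the conversation input.
        "input": [m for m in messages if m["role"] != "system"],
    }
    if system is not None:
        payload["instructions"] = system
    return payload
```

This is why decorator code can keep building ordinary system/user message lists: the translation happens in one place, inside the adapter.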
APIKeyPool Key Management
APIKeyPool uses a min-heap to track the current load of each key and preferentially assigns the most idle one.

You will get:
- Automatic load balancing
- Concurrent state tracking
- Thread safety under lock protection
- Shared state keyed by provider_id
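The min-heap scheme above can be sketched in a few lines. This is a simplified illustration of the mechanism, not the library's APIKeyPool class: the name MinHeapKeyPool and the acquire/release methods are made up for the example.

```python
import heapq
import threading
from typing import List, Tuple

class MinHeapKeyPool:
    """Illustrative key pool: a min-heap ordered by in-flight load,
    so acquire() always hands out the most idle key."""

    def __init__(self, keys: List[str]) -> None:
        self._lock = threading.Lock()
        # Each entry is (current_load, key); all keys start idle.
        self._heap: List[Tuple[int, str]] = [(0, k) for k in keys]
        heapq.heapify(self._heap)

    def acquire(self) -> str:
        with self._lock:
            load, key = heapq.heappop(self._heap)
            heapq.heappush(self._heap, (load + 1, key))  # one more in-flight task
            return key

    def release(self, key: str) -> None:
        with self._lock:
            for i, (load, k) in enumerate(self._heap):
                if k == key:
                    self._heap[i] = (max(load - 1, 0), k)
                    heapq.heapify(self._heap)  # restore heap order after the edit
                    break
```

The lock around every heap operation is what gives the thread safety listed above; without it, concurrent acquire() calls could corrupt the heap.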
TokenBucket Flow Control
TokenBucket uses the classic token bucket algorithm to control the request rate.

Algorithm key points:
- Tokens are refilled at a fixed rate
- Each request consumes tokens
- Bucket capacity is limited; excess tokens are discarded
- Bursts are allowed while the bucket holds enough tokens
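The four points above correspond directly to the classic algorithm, sketched below. This is a generic illustration of the token bucket technique, not the framework's TokenBucket class; its real API may differ.

```python
import time

class TokenBucketSketch:
    """Classic token bucket: refill at refill_rate tokens/sec,
    cap at capacity (overflow is discarded), allow bursts while tokens remain."""

    def __init__(self, capacity: float, refill_rate: float) -> None:
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity          # start full, so an initial burst is allowed
        self.last = time.monotonic()

    def try_acquire(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill for the elapsed time; anything above capacity is discarded.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Note the lazy refill: tokens are topped up on each call from the elapsed time, so no background timer thread is needed.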
How to Create Interface Instances
- OpenAICompatible: load from config
- OpenAICompatible: create manually
- OpenAIResponsesCompatible: load from config
- OpenAIResponsesCompatible: create manually
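All four creation paths consume the same provider.json. The exact schema is not shown on this page, so the fragment below is only a plausible sketch: rate_limit_capacity and rate_limit_refill_rate appear elsewhere on this page, while the remaining field names and values are assumptions to be checked against the project's reference docs.

```json
{
  "openai": {
    "api_keys": ["sk-xxxx", "sk-yyyy"],
    "base_url": "https://api.openai.com/v1",
    "model_name": "gpt-4o",
    "rate_limit_capacity": 20,
    "rate_limit_refill_rate": 2.0
  }
}
```

A file like this would then be loaded via OpenAICompatible.load_from_json_file(...) or OpenAIResponsesCompatible.load_from_json_file(...).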
APIKeyPool Usage Example
TokenBucket Parameter Suggestions
| Parameter | Type | Explanation | Recommended Range |
|---|---|---|---|
| capacity | int | Token bucket capacity | 10-50 |
| refill_rate | float | Tokens refilled per second | 0.5-5.0 |
Production Mode
Multi-model Collaboration
Failback and Retry
Monitoring and Debugging
Best Practices
Key Management
- Use different key sets for different environments
- Set more conservative retry and rate limiting parameters separately for high-cost models
- Do not let multiple environments share the same high-frequency production key
Traffic Control
- Set capacity and refill_rate according to the vendor's rate limit
- For local models, higher values can be set as appropriate
- When rate limiting occurs, adjust these parameters first, then consider adding more keys
Error Handling
Troubleshooting
Rate limit exceeded
- Increase rate_limit_capacity or rate_limit_refill_rate
- Check whether the configuration matches the vendor's limits
Key keeps failing
- Check if the API Key is valid and has remaining quota
- Check that base_url is correct
Token statistics inaccurate
- Some vendors do not return complete token statistics
- The framework will try to estimate, but it cannot always be completely accurate.
Debug Log
Summary
The interface layer of SimpleLLMFunc centralizes model access, rate limiting, and key management into a unified abstraction:
- LLM_Interface: unified interface
- OpenAICompatible: default implementation
- APIKeyPool: multi-key load balancing
- TokenBucket: rate control