> ## Documentation Index
> Fetch the complete documentation index at: https://simplellmfunc.cn/llms.txt
> Use this file to discover all available pages before exploring further.

# LLM Interface

> OpenAICompatible, OpenAIResponsesCompatible, key pools, and rate limiting

# LLM Interface

The interface layer handles model communication, key rotation, and rate limiting.

## OpenAICompatible

Works with any provider that implements the OpenAI Chat Completions API:

```python theme={null}
from SimpleLLMFunc import OpenAICompatible

# From provider.json
models = OpenAICompatible.load_from_json_file("provider.json")
llm = models["openrouter"]["openai/gpt-4o"]

# Direct construction
from SimpleLLMFunc import APIKeyPool

llm = OpenAICompatible(
    api_key_pool=APIKeyPool(api_keys=["sk-key"], provider_id="openai"),
    model_name="gpt-4o",
    base_url="https://api.openai.com/v1",
    max_retries=3,
    retry_delay=1.0,
    rate_limit_capacity=20,
    rate_limit_refill_rate=3.0,
    context_window=128_000,
)
```

Compatible with: OpenAI, OpenRouter, Together, Groq, local vLLM, Ollama, etc.

## OpenAIResponsesCompatible

For providers implementing OpenAI's Responses API:

```python theme={null}
from SimpleLLMFunc import OpenAIResponsesCompatible

llm = OpenAIResponsesCompatible(
    api_key_pool=APIKeyPool(api_keys=["sk-key"], provider_id="openai"),
    model_name="gpt-4o",
    base_url="https://api.openai.com/v1",
)
```

Differences from OpenAICompatible:

* Maps system prompts to `instructions` field
* Handles Responses-specific streaming events
* Supports `reasoning={...}` kwargs for reasoning effort
* Different wire format for tool calls

From your decorator code, both adapters look the same. The wire-format differences are handled internally.

## APIKeyPool

Manages multiple keys with round-robin rotation:

```python theme={null}
from SimpleLLMFunc import APIKeyPool

pool = APIKeyPool(
    api_keys=["sk-key-1", "sk-key-2", "sk-key-3"],
    provider_id="openrouter-gpt4",
)
```

When a key hits rate limits, the pool rotates to the next. Put your highest-rate keys first.

## Rate Limiting

Built-in token bucket rate limiter:

```python theme={null}
# Configured via constructor
llm = OpenAICompatible(
    ...,
    rate_limit_capacity=20,       # Max concurrent "tokens" in the bucket
    rate_limit_refill_rate=3.0,   # Tokens added per second
)

# Check status
status = llm.get_rate_limit_status()
# {"available": 15, "capacity": 20, "refill_rate": 3.0}

# Reset after rate limit errors
llm.reset_rate_limit()
```

The rate limiter is per-instance. Multiple `OpenAICompatible` instances for the same model can have different rate limits.

## Passing LLM kwargs

Extra parameters are forwarded to the provider:

```python theme={null}
@llm_chat(
    llm_interface=llm,
    temperature=0.7,
    max_tokens=4096,
    top_p=0.9,
)
async def agent(message: str, history: list | None = None):
    """My agent."""
    pass
```

For OpenAIResponsesCompatible, you can pass reasoning effort:

```python theme={null}
@llm_chat(
    llm_interface=llm,
    reasoning_effort="high",
)
async def reasoning_agent(message: str, history: list | None = None):
    """An agent that reasons deeply."""
    pass
```

## Context Window

Set `context_window` to enable framework features that depend on knowing the model's capacity:

```python theme={null}
llm = OpenAICompatible(
    ...,
    context_window=128_000,  # GPT-4o's context window
)
```

Used by: auto-compaction threshold calculations, token usage tracking.

Default: 200,000 tokens.

→ [API Reference: Interfaces](/api/interfaces)
