Documentation
Anannas is an OpenAI-compatible API gateway. Use any OpenAI SDK — just change the base URL.
Quick Start
Install the OpenAI SDK and point it at Anannas:
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.anannas.ai/v1",
    api_key="an-sk-...",
)

response = client.chat.completions.create(
    model="claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```
That's it. Same code works for GPT, Gemini, DeepSeek — just change the model name.
curl
```shell
curl https://api.anannas.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer an-sk-..." \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```
Node.js / TypeScript
```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.anannas.ai/v1",
  apiKey: "an-sk-...",
});

const res = await client.chat.completions.create({
  model: "gemini-2.5-flash",
  messages: [{ role: "user", content: "Hello" }],
});
```
Authentication
Include your API key in the Authorization header:
Authorization: Bearer an-sk-...
API keys start with an-sk-. Get one from your dashboard after signing up.
Base URL
https://api.anannas.ai/v1
Staging: https://anannas-staging.upsurge.workers.dev
Chat Completions
POST /v1/chat/completions
Creates a chat completion. Compatible with the OpenAI chat completions API.
Parameters
| Parameter | Type | Description |
|---|---|---|
| model | string | **Required.** Model ID (e.g. claude-sonnet-4-20250514) |
| messages | array | **Required.** Array of message objects with role and content |
| max_tokens | integer | Maximum tokens to generate. Required for Anthropic models. |
| temperature | float | Sampling temperature (0–2). Not supported on reasoning models. |
| top_p | float | Nucleus sampling (0–1). |
| stream | boolean | Stream response as SSE events. |
| stop | string \| array | Stop sequences. Up to 4. |
| tools | array | Tool/function definitions. Translated across all providers. |
| tool_choice | string \| object | "auto", "none", "required", or a specific function. |
| response_format | object | Force JSON output: {"type": "json_object"} |
| reasoning_effort | string | "low", "medium", "high" — for OpenAI o-series. Auto-translated to Anthropic thinking. |
| thinking | object | Anthropic extended thinking: {"type": "enabled", "budget_tokens": 4096} |
| stream_options | object | {"include_usage": true} to get token counts in stream. |
| seed | integer | Deterministic generation (OpenAI, Gemini). |
| top_k | integer | Top-K sampling (Anthropic, Gemini only). |
| presence_penalty | float | -2.0 to 2.0. Not supported on Anthropic. |
| frequency_penalty | float | -2.0 to 2.0. Not supported on Anthropic. |
| user | string | End-user ID for tracking. Mapped to metadata.user_id for Anthropic. |
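As a worked example, a request payload combining several of these parameters (the model name and values below are illustrative, not recommendations):

```python
# Illustrative request payload combining several optional parameters.
payload = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Reply with a JSON object."}],
    "max_tokens": 256,
    "temperature": 0.2,
    "stop": ["\n\n"],
    "response_format": {"type": "json_object"},  # force JSON output
    "stream": True,
    "stream_options": {"include_usage": True},   # token counts in the stream
}
```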
Streaming
Set "stream": true. Response comes as SSE events in OpenAI format, regardless of which provider you're using.
```
data: {"id":"...","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hello"},"index":0}]}

data: [DONE]
```
All provider streaming formats (Anthropic events, Gemini SSE, DeepSeek) are normalized to this format.
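A minimal sketch of assembling the final text from these normalized data: lines. The helper below is illustrative, not part of any SDK, and assumes chunks look exactly like the example above:

```python
import json

def collect_content(sse_lines):
    """Assemble assistant text from OpenAI-format SSE 'data:' lines."""
    parts = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            # Deltas may omit content (e.g. role-only first chunk).
            parts.append(choice.get("delta", {}).get("content") or "")
    return "".join(parts)

stream = [
    'data: {"id":"x","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hel"},"index":0}]}',
    'data: {"id":"x","object":"chat.completion.chunk","choices":[{"delta":{"content":"lo"},"index":0}]}',
    "data: [DONE]",
]
print(collect_content(stream))  # Hello
```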
Tool Calling
Send tools in OpenAI format. Anannas translates to each provider's native format.
```json
{
  "model": "claude-sonnet-4-20250514",
  "messages": [{"role": "user", "content": "What's the weather?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get weather for a location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {"type": "string"}
        }
      }
    }
  }]
}
```
Tool results: send them back as a `tool`-role message carrying the matching `tool_call_id`.
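A sketch of the round trip: after the model returns a tool call, append the assistant turn and a tool-role result before your next request. The `id` and result values below are made up for illustration:

```python
import json

# The tool call as it would appear in the model's response
# (the id here is a made-up example).
tool_call = {
    "id": "call_abc123",
    "type": "function",
    "function": {"name": "get_weather", "arguments": '{"location": "Paris"}'},
}

messages = [
    {"role": "user", "content": "What's the weather?"},
    # Echo the assistant turn containing the tool call...
    {"role": "assistant", "content": None, "tool_calls": [tool_call]},
    # ...then supply the result, keyed by tool_call_id.
    {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "content": json.dumps({"temp_c": 18, "conditions": "cloudy"}),
    },
]
```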
Thinking / Reasoning
Anthropic extended thinking
```json
{
  "model": "claude-sonnet-4-20250514",
  "thinking": {"type": "enabled", "budget_tokens": 4096},
  "max_tokens": 8192,
  "messages": [...]
}
```
The response includes a `_thinking` field with the model's reasoning.
OpenAI reasoning models
```json
{
  "model": "o4-mini",
  "reasoning_effort": "medium",
  "messages": [...]
}
```
Cross-provider translation
If you send reasoning_effort to an Anthropic model, Anannas auto-translates to thinking.budget_tokens. If you send thinking to an OpenAI model, it maps to reasoning_effort.
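The translation can be pictured as below. The exact effort-to-budget numbers Anannas uses are not documented here; the values in this sketch are assumptions for illustration only:

```python
# Sketch of the cross-provider translation described above.
# The effort-to-budget values are illustrative assumptions.
EFFORT_TO_BUDGET = {"low": 1024, "medium": 4096, "high": 16384}
BUDGET_TO_EFFORT = [(1024, "low"), (4096, "medium"), (16384, "high")]

def translate(params: dict, target: str) -> dict:
    out = dict(params)
    if target == "anthropic" and "reasoning_effort" in out:
        out["thinking"] = {
            "type": "enabled",
            "budget_tokens": EFFORT_TO_BUDGET[out.pop("reasoning_effort")],
        }
    elif target == "openai" and "thinking" in out:
        budget = out.pop("thinking").get("budget_tokens", 4096)
        # Pick the smallest effort tier whose budget covers the request.
        out["reasoning_effort"] = next(
            (e for b, e in BUDGET_TO_EFFORT if budget <= b), "high"
        )
    return out
```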
List Models
GET /v1/models
Returns all available models in OpenAI format.
GET /v1/models/:id
Returns a single model by ID.
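The list response follows the OpenAI models format. A quick sketch of pulling model IDs out of it; the payload below is a made-up sample, not actual catalog contents:

```python
# Made-up sample response in the OpenAI /v1/models list format.
sample = {
    "object": "list",
    "data": [
        {"id": "gpt-4o-mini", "object": "model"},
        {"id": "claude-sonnet-4-20250514", "object": "model"},
        {"id": "gemini-2.5-flash", "object": "model"},
    ],
}

model_ids = [m["id"] for m in sample["data"]]
```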
Errors
Errors follow the OpenAI error format:
```json
{
  "error": {
    "message": "Model 'foo' not found.",
    "type": "not_found_error",
    "code": "model_not_found",
    "param": "model"
  }
}
```
| Status | Code | Meaning |
|---|---|---|
| 400 | invalid_request_error | Missing or invalid parameters |
| 401 | authentication_error | Invalid or missing API key |
| 404 | not_found_error | Model not found |
| 429 | rate_limit_error | Rate limit exceeded |
| 500 | api_error | Internal error |
| 502 | proxy_error | Provider unreachable |
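A minimal sketch of client-side handling, assuming the error body shown above: parse the `error` object and retry only the transient statuses (429 and 5xx):

```python
import json

def parse_error(status: int, body: str):
    """Extract the OpenAI-format error object and decide whether to retry."""
    err = json.loads(body).get("error", {})
    # 429 and 5xx are transient; 4xx request errors won't succeed on retry.
    retryable = status in (429, 500, 502)
    return err.get("code"), retryable

code, retry = parse_error(
    404,
    '{"error": {"message": "Model \'foo\' not found.", '
    '"type": "not_found_error", "code": "model_not_found", "param": "model"}}',
)
```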
Provider Notes
Anthropic
- `max_tokens` is required (Anannas defaults to 4096 if omitted).
- System messages are extracted and sent as Anthropic's top-level `system` field.
- `presence_penalty` and `frequency_penalty` are not supported — silently dropped.
- Extended thinking requires `temperature` to be 1 — Anannas handles this automatically.
OpenAI
- Reasoning models (o3, o4-mini) use `max_completion_tokens` instead of `max_tokens`. Anannas handles the mapping.
- Reasoning models don't support `temperature` — silently dropped.
Google Gemini
- All params are translated to Gemini's `generationConfig` format automatically.
- Role mapping: `assistant` → `model`.
- Gemini 2.5 Flash is a thinking model — low `max_tokens` may produce empty output as reasoning consumes the budget.
DeepSeek
- `deepseek-reasoner` returns reasoning in a `_reasoning_content` field.
- Does not support `logprobs`, `presence_penalty`, or `frequency_penalty`.