Documentation
Anannas is an OpenAI-compatible API gateway. Use any OpenAI SDK — just change the base URL.
Quick Start
Install the OpenAI SDK and point it at Anannas:
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.anannas.ai/v1",
    api_key="an-sk-...",
)

response = client.chat.completions.create(
    model="claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```
That's it. Same code works for GPT, Gemini, DeepSeek — just change the model name.
curl
```shell
curl https://api.anannas.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer an-sk-..." \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```
Node.js / TypeScript
```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.anannas.ai/v1",
  apiKey: "an-sk-...",
});

const res = await client.chat.completions.create({
  model: "gemini-2.5-flash",
  messages: [{ role: "user", content: "Hello" }],
});
```
Authentication
Include your API key in the Authorization header:
Authorization: Bearer an-sk-...
API keys start with an-sk-. Get one from your dashboard after signing up.
Base URL
https://api.anannas.ai/v1
Staging: https://anannas-staging.upsurge.workers.dev
Chat Completions
POST /v1/chat/completions
Creates a chat completion. Compatible with the OpenAI chat completions API.
Parameters
| Parameter | Type | Description |
|---|---|---|
| model | string | **Required.** Model ID (e.g. claude-sonnet-4-20250514) |
| messages | array | **Required.** Array of message objects with role and content |
| max_tokens | integer | Maximum tokens to generate. Required for Anthropic models. |
| temperature | float | Sampling temperature (0–2). Not supported on reasoning models. |
| top_p | float | Nucleus sampling (0–1). |
| stream | boolean | Stream response as SSE events. |
| stop | string \| array | Stop sequences. Up to 4. |
| tools | array | Tool/function definitions. Translated across all providers. |
| tool_choice | string \| object | "auto", "none", "required", or a specific function. |
| response_format | object | Force JSON output: {"type": "json_object"} |
| reasoning_effort | string | "low", "medium", "high" — for OpenAI o-series. Auto-translated to Anthropic thinking. |
| thinking | object | Anthropic extended thinking: {"type": "enabled", "budget_tokens": 4096} |
| stream_options | object | {"include_usage": true} to get token counts in stream. |
| seed | integer | Deterministic generation (OpenAI, Gemini). |
| top_k | integer | Top-K sampling (Anthropic, Gemini only). |
| presence_penalty | float | -2.0 to 2.0. Not supported on Anthropic. |
| frequency_penalty | float | -2.0 to 2.0. Not supported on Anthropic. |
| user | string | End-user ID for tracking. Mapped to metadata.user_id for Anthropic. |
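As a worked example, a request payload combining several of these parameters (the model name and values below are illustrative, not recommendations):

```python
# Illustrative request payload combining several optional parameters.
payload = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Reply with a JSON object."}],
    "max_tokens": 256,
    "temperature": 0.2,
    "stop": ["\n\n"],
    "response_format": {"type": "json_object"},  # force JSON output
    "stream": True,
    "stream_options": {"include_usage": True},   # token counts in the stream
}
```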
Streaming
Set "stream": true. Response comes as SSE events in OpenAI format, regardless of which provider you're using.
```
data: {"id":"...","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hello"},"index":0}]}

data: [DONE]
```
All provider streaming formats (Anthropic events, Gemini SSE, DeepSeek) are normalized to this format.
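A minimal sketch of assembling the final text from these normalized data: lines. The helper below is illustrative, not part of any SDK, and assumes chunks look exactly like the example above:

```python
import json

def collect_content(sse_lines):
    """Assemble assistant text from OpenAI-format SSE 'data:' lines."""
    parts = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            # Deltas may omit content (e.g. role-only first chunk).
            parts.append(choice.get("delta", {}).get("content") or "")
    return "".join(parts)

stream = [
    'data: {"id":"x","object":"chat.completion.chunk","choices":[{"delta":{"content":"Hel"},"index":0}]}',
    'data: {"id":"x","object":"chat.completion.chunk","choices":[{"delta":{"content":"lo"},"index":0}]}',
    "data: [DONE]",
]
print(collect_content(stream))  # Hello
```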
Tool Calling
Send tools in OpenAI format. Anannas translates to each provider's native format.
```json
{
  "model": "claude-sonnet-4-20250514",
  "messages": [{"role": "user", "content": "What's the weather?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get weather for a location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {"type": "string"}
        }
      }
    }
  }]
}
```
Tool results: send them back as a `tool`-role message carrying the matching `tool_call_id`.
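A sketch of the round trip: after the model returns a tool call, append the assistant turn and a tool-role result before your next request. The `id` and result values below are made up for illustration:

```python
import json

# The tool call as it would appear in the model's response
# (the id here is a made-up example).
tool_call = {
    "id": "call_abc123",
    "type": "function",
    "function": {"name": "get_weather", "arguments": '{"location": "Paris"}'},
}

messages = [
    {"role": "user", "content": "What's the weather?"},
    # Echo the assistant turn containing the tool call...
    {"role": "assistant", "content": None, "tool_calls": [tool_call]},
    # ...then supply the result, keyed by tool_call_id.
    {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "content": json.dumps({"temp_c": 18, "conditions": "cloudy"}),
    },
]
```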
Thinking / Reasoning
Anthropic extended thinking
```json
{
  "model": "claude-sonnet-4-20250514",
  "thinking": {"type": "enabled", "budget_tokens": 4096},
  "max_tokens": 8192,
  "messages": [...]
}
```
The response includes a `_thinking` field with the model's reasoning.
OpenAI reasoning models
```json
{
  "model": "o4-mini",
  "reasoning_effort": "medium",
  "messages": [...]
}
```
Cross-provider translation
If you send reasoning_effort to an Anthropic model, Anannas auto-translates to thinking.budget_tokens. If you send thinking to an OpenAI model, it maps to reasoning_effort.
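The translation can be pictured as below. The exact effort-to-budget numbers Anannas uses are not documented here; the values in this sketch are assumptions for illustration only:

```python
# Sketch of the cross-provider translation described above.
# The effort-to-budget values are illustrative assumptions.
EFFORT_TO_BUDGET = {"low": 1024, "medium": 4096, "high": 16384}
BUDGET_TO_EFFORT = [(1024, "low"), (4096, "medium"), (16384, "high")]

def translate(params: dict, target: str) -> dict:
    out = dict(params)
    if target == "anthropic" and "reasoning_effort" in out:
        out["thinking"] = {
            "type": "enabled",
            "budget_tokens": EFFORT_TO_BUDGET[out.pop("reasoning_effort")],
        }
    elif target == "openai" and "thinking" in out:
        budget = out.pop("thinking").get("budget_tokens", 4096)
        # Pick the smallest effort tier whose budget covers the request.
        out["reasoning_effort"] = next(
            (e for b, e in BUDGET_TO_EFFORT if budget <= b), "high"
        )
    return out
```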
List Models
GET /v1/models
Returns all available models in OpenAI format.
GET /v1/models/:id
Returns a single model by ID.
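The list response follows the OpenAI models format. A quick sketch of pulling model IDs out of it; the payload below is a made-up sample, not actual catalog contents:

```python
# Made-up sample response in the OpenAI /v1/models list format.
sample = {
    "object": "list",
    "data": [
        {"id": "gpt-4o-mini", "object": "model"},
        {"id": "claude-sonnet-4-20250514", "object": "model"},
        {"id": "gemini-2.5-flash", "object": "model"},
    ],
}

model_ids = [m["id"] for m in sample["data"]]
```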
Errors
Errors follow the OpenAI error format:
```json
{
  "error": {
    "message": "Model 'foo' not found.",
    "type": "not_found_error",
    "code": "model_not_found",
    "param": "model"
  }
}
```
| Status | Code | Meaning |
|---|---|---|
| 400 | invalid_request_error | Missing or invalid parameters |
| 401 | authentication_error | Invalid or missing API key |
| 404 | not_found_error | Model not found |
| 429 | rate_limit_error | Rate limit exceeded |
| 500 | api_error | Internal error |
| 502 | proxy_error | Provider unreachable |
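A minimal sketch of client-side handling, assuming the error body shown above: parse the `error` object and retry only the transient statuses (429 and 5xx):

```python
import json

def parse_error(status: int, body: str):
    """Extract the OpenAI-format error object and decide whether to retry."""
    err = json.loads(body).get("error", {})
    # 429 and 5xx are transient; 4xx request errors won't succeed on retry.
    retryable = status in (429, 500, 502)
    return err.get("code"), retryable

code, retry = parse_error(
    404,
    '{"error": {"message": "Model \'foo\' not found.", '
    '"type": "not_found_error", "code": "model_not_found", "param": "model"}}',
)
```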
Provider Notes
Anthropic
- `max_tokens` is required (Anannas defaults to 4096 if omitted).
- System messages are extracted and sent as Anthropic's top-level `system` field.
- `presence_penalty` and `frequency_penalty` are not supported — silently dropped.
- Extended thinking requires `temperature` to be 1 — Anannas handles this automatically.
OpenAI
- Reasoning models (o3, o4-mini) use `max_completion_tokens` instead of `max_tokens`. Anannas handles the mapping.
- Reasoning models don't support `temperature` — silently dropped.
Google Gemini
- All params are translated to Gemini's `generationConfig` format automatically.
- Role mapping: `assistant` → `model`.
- Gemini 2.5 Flash is a thinking model — low `max_tokens` may produce empty output as reasoning consumes the budget.
DeepSeek
- `deepseek-reasoner` returns reasoning in a `_reasoning_content` field.
- Does not support `logprobs`, `presence_penalty`, or `frequency_penalty`.