AI Service V1
OpenAI-compatible gateway for GPT-5.4, Claude, Mistral, Llama, and DeepSeek with usage tracking
AI Service V1 is an OpenAI-compatible proxy for major cloud LLM providers. You send chat completion requests in the standard messages format; we forward each request to the provider, log token usage, and deduct the cost from your API balance.
Base URL. https://api.koveh.com/ai/
Provider-specific paths are listed below. The legacy unified path https://api.koveh.com/ai/v1/ still works and routes to the OpenAI implementation.
Authentication
All requests require the X-API-Key header (not an Authorization: Bearer token). Request an API key via API Access & Authentication.
X-API-Key: YOUR_API_KEY
How Koveh uses these models
Across koveh.com and internal automations we default to the GPT-5.4 family on the OpenAI path unless a task needs another provider.
| Use case | Typical model | Why |
|---|---|---|
| High-volume classification, digests, lightweight chat | gpt-5.4-nano | Cheapest GPT-5.4-class tier; enough for routing, tagging, short summaries |
| General product chat, agents, coding helpers | gpt-5.4-mini | Default balance of quality, speed, and cost |
| Complex reasoning, premium bots, hard analysis | gpt-5.4 | Strongest GPT-5.4 tier when latency and cost are secondary |
| Long-context schema / SQL assistance (platform) | gpt-5-mini | Sampled table context with strict row/char caps |
| Mail classification (Mail Processor v2) | gpt-5.4-nano with gpt-5.4-mini fallback | Fast first pass; fallback on low confidence |
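The nano-first, mini-fallback pattern in the last row can be sketched roughly as follows. This is a minimal illustration, not the Mail Processor v2 code: the `classify` callable, the confidence score it returns, and the 0.8 threshold are all assumptions made for the example.

```python
from typing import Callable, Tuple

# Hypothetical signature: classify(model, text) -> (label, confidence in [0, 1]).
Classifier = Callable[[str, str], Tuple[str, float]]

PRIMARY_MODEL = "gpt-5.4-nano"
FALLBACK_MODEL = "gpt-5.4-mini"


def classify_with_fallback(
    classify: Classifier,
    text: str,
    threshold: float = 0.8,  # assumed cut-off; tune it for your own data
) -> Tuple[str, str]:
    """Return (label, model_used), retrying on the larger model when unsure."""
    label, confidence = classify(PRIMARY_MODEL, text)
    if confidence >= threshold:
        return label, PRIMARY_MODEL
    # Low confidence on the fast pass: ask the stronger model instead.
    label, _ = classify(FALLBACK_MODEL, text)
    return label, FALLBACK_MODEL
```

Passing the classifier in as a callable keeps the fallback logic independent of how the request is sent, which also makes it easy to exercise with a stub.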
Legacy OpenAI ids such as gpt-4o and gpt-4o-mini still work on the wire, but we recommend passing gpt-5.4-* explicitly in new integrations. On koveh.com, older ids are remapped internally (for example gpt-4o → gpt-5.4-mini).
For self-hosted text generation and embeddings on our infrastructure, see Qwen3 AI API — it is a separate service, not part of AI Service V1.
Supported providers
Each provider has its own path prefix. The request body follows the OpenAI Chat Completions shape unless noted.
| Provider | Path | Recommended default | Also common |
|---|---|---|---|
| OpenAI | /ai/openai/ | gpt-5.4-mini | gpt-5.4-nano, gpt-5.4, gpt-4o (legacy) |
| Anthropic | /ai/claude/ | claude-3-5-sonnet-20241022 | claude-3-haiku-20240307 |
| Mistral | /ai/mistral/ | mistral-large-latest | mistral-nemo-instruct-2407 |
| Together (Llama) | /ai/llama/ | meta-llama/Llama-3.1-70b-instruct | meta-llama/Llama-3.3-70B-Instruct |
| DeepSeek | /ai/deepseek/ | deepseek-chat | deepseek-r1, deepseek-r1-0528 |
The gateway does not hard-limit model names: any id accepted by the upstream provider can be sent in the model field. Costs are calculated from actual token usage for that model.
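As a small sketch of how the path prefixes combine with the base URL, the helper below builds the chat completions URL for a provider. The dictionary keys (`"openai"`, `"anthropic"`, and so on) are this example's own naming; only the path segments come from the table.

```python
BASE_URL = "https://api.koveh.com/ai/"

# Path segments copied from the provider table; the keys are illustrative.
PROVIDER_PATHS = {
    "openai": "openai/",
    "anthropic": "claude/",
    "mistral": "mistral/",
    "together": "llama/",
    "deepseek": "deepseek/",
}


def chat_completions_url(provider: str) -> str:
    """Build the chat completions URL for one of the provider keys above."""
    try:
        path = PROVIDER_PATHS[provider.lower()]
    except KeyError:
        raise ValueError(f"unknown provider: {provider!r}") from None
    return f"{BASE_URL}{path}chat/completions"
```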
Quick start
Non-streaming chat (OpenAI path, recommended model):
curl -s https://api.koveh.com/ai/openai/chat/completions \
-H "Content-Type: application/json" \
-H "X-API-Key: YOUR_API_KEY" \
-d '{
"model": "gpt-5.4-mini",
"messages": [
{"role": "system", "content": "You are a concise assistant."},
{"role": "user", "content": "Summarize what a data warehouse staging layer does in two sentences."}
],
"stream": false
}'
Streaming: set "stream": true. The response is Server-Sent Events with data: {"content": "..."} chunks and a final data: [DONE].
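A client can consume the stream by reading it line by line. The sketch below assumes each event arrives as a single `data:` line carrying a JSON object with a `content` key, as described above; the function name is hypothetical and any other SSE fields are simply skipped.

```python
import json
from typing import Iterable, Iterator


def iter_stream_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield the "content" string from each SSE data line until [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        content = chunk.get("content")
        if content:
            yield content
```

With `urllib.request.urlopen`, the response object can be iterated as lines of bytes, so something like `iter_stream_content(l.decode("utf-8") for l in resp)` should feed this parser.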
Claude example — same JSON shape; use /ai/claude/chat/completions and a Claude model id:
curl -s https://api.koveh.com/ai/claude/chat/completions \
-H "Content-Type: application/json" \
-H "X-API-Key: YOUR_API_KEY" \
-d '{
"model": "claude-3-5-sonnet-20241022",
"messages": [{"role": "user", "content": "Hello!"}]
}'
Endpoints
1. Chat completion (per provider)
POST <provider-path>/chat/completions
Examples:
POST https://api.koveh.com/ai/openai/chat/completions
POST https://api.koveh.com/ai/claude/chat/completions
POST https://api.koveh.com/ai/deepseek/chat/completions
Request body (OpenAI-compatible):
{
"model": "gpt-5.4-mini",
"messages": [
{"role": "system", "content": "Optional system prompt"},
{"role": "user", "content": "Hello!"}
],
"stream": false,
"max_tokens": 1024,
"temperature": 0.7
}
| Field | Required | Notes |
|---|---|---|
| model | yes | Provider model id (see tables above) |
| messages | yes | Array of {role, content} objects; role is system, user, or assistant |
| stream | no | Default false; true for SSE streaming |
| max_tokens | no | Cap on completion tokens |
| temperature | no | Sampling temperature; provider defaults apply if omitted |
Additional OpenAI parameters supported by the upstream API (for example response_format, reasoning_effort on GPT-5.4-class models) are forwarded as-is when the provider accepts them.
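For clients that prefer Python over curl, a request with the same body and headers might be assembled like this. It is a sketch using only the standard library; `build_chat_request` is a hypothetical helper name, and optional fields are passed through as keyword arguments on the assumption that the gateway forwards them as described.

```python
import json
import urllib.request
from typing import Any


def build_chat_request(
    api_key: str,
    model: str,
    messages: list[dict[str, str]],
    url: str = "https://api.koveh.com/ai/openai/chat/completions",
    **extra: Any,  # e.g. max_tokens, temperature, response_format
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the chat endpoint."""
    # Keyword arguments override the default, so stream=True also works here.
    body = {"model": model, "messages": messages, "stream": False, **extra}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )
```

The request can then be sent with `urllib.request.urlopen(req)` and the JSON reply decoded with `json.load(resp)`.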
Response. Standard OpenAI-style chat.completion object with choices, usage (prompt / completion / total tokens), and the resolved model name.
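Reading the reply could look like the sketch below. It assumes the field names match the upstream OpenAI chat.completion schema exactly (`choices[0].message.content`, `usage.prompt_tokens`, and so on); `summarize_completion` is a hypothetical helper, not part of the service.

```python
from typing import Any


def summarize_completion(response: dict[str, Any]) -> dict[str, Any]:
    """Pull the reply text, resolved model, and token counts out of a response."""
    usage = response.get("usage", {})
    return {
        "text": response["choices"][0]["message"]["content"],
        "model": response.get("model"),
        "prompt_tokens": usage.get("prompt_tokens", 0),
        "completion_tokens": usage.get("completion_tokens", 0),
        "total_tokens": usage.get("total_tokens", 0),
    }
```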
2. Unified completion (legacy)
POST https://api.koveh.com/ai/v1/chat/completions
Proxies to the OpenAI implementation for backward compatibility. Prefer /ai/openai/chat/completions for new code.
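When migrating existing code, the URL rewrite is mechanical. A minimal sketch, with a hypothetical helper name, that maps the legacy prefix onto the OpenAI provider path and leaves every other URL unchanged:

```python
LEGACY_PREFIX = "https://api.koveh.com/ai/v1/"
OPENAI_PREFIX = "https://api.koveh.com/ai/openai/"


def migrate_legacy_url(url: str) -> str:
    """Rewrite a legacy unified URL to the OpenAI provider path."""
    if url.startswith(LEGACY_PREFIX):
        return OPENAI_PREFIX + url[len(LEGACY_PREFIX):]
    return url  # already provider-specific, or not a gateway URL
```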
Choosing a model
OpenAI (most integrations).
- gpt-5.4-nano: classification, extraction, short replies, high QPS.
- gpt-5.4-mini: default for chat, tools, and agents; best everyday choice.
- gpt-5.4: multi-step reasoning, difficult coding, or when mini quality is not enough.
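One way to encode this guidance in client code is a small lookup with gpt-5.4-mini as the default. The task labels below are this sketch's own shorthand for the three tiers, not identifiers the gateway knows about.

```python
# Task labels are illustrative; only the model ids come from the list above.
OPENAI_TIER_BY_TASK = {
    "classification": "gpt-5.4-nano",
    "extraction": "gpt-5.4-nano",
    "chat": "gpt-5.4-mini",
    "agent": "gpt-5.4-mini",
    "reasoning": "gpt-5.4",
    "hard_coding": "gpt-5.4",
}


def pick_openai_model(task: str) -> str:
    """Return the suggested tier for a task label, defaulting to mini."""
    return OPENAI_TIER_BY_TASK.get(task, "gpt-5.4-mini")
```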
Other providers.
- DeepSeek: cost-effective general chat (deepseek-chat) and reasoning (deepseek-r1*).
- Claude: long instructions and careful prose; Sonnet for balance, Haiku for speed.
- Llama (Together): open-weight models without a direct OpenAI account.
- Mistral: European-hosted alternative with an OpenAI-like chat API.
Model pricing and context windows are stored in our api_models catalog and applied when calculating balance deductions.
Cost tracking
Every completed request is logged with provider, model, token counts, duration, and USD cost. Charges are deducted from the balance on your API user account (api_users.usd_available).
- Failed requests are logged with status=error and typically incur no token charge.
- Request/response payloads are stored for audit (api_request_data, transaction tables).
- For balance and access questions, contact studio@koveh.com.
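Because charges are derived from token counts, a client can estimate the cost of a request from its usage block. The sketch below assumes separate input and output rates quoted per million tokens; that pricing shape and any rates passed in are assumptions for illustration, not values from the api_models catalog.

```python
from decimal import Decimal


def estimate_cost_usd(
    prompt_tokens: int,
    completion_tokens: int,
    input_usd_per_million: Decimal,   # hypothetical rate; use the real one
    output_usd_per_million: Decimal,  # hypothetical rate; use the real one
) -> Decimal:
    """Estimate USD cost, assuming per-million-token input and output rates."""
    million = Decimal(1_000_000)
    return (
        Decimal(prompt_tokens) * input_usd_per_million
        + Decimal(completion_tokens) * output_usd_per_million
    ) / million
```

`Decimal` is used instead of floats so that small per-request amounts add up without rounding drift when many estimates are summed.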
Related services
| Service | When to use |
|---|---|
| Qwen3 AI API | Self-hosted Qwen3 chat and embeddings on Koveh infrastructure |
| API Access & Authentication | How to obtain and use API keys |
| Koveh Services Overview | Full list of platform APIs |