Koveh API

AI Services (Legacy)

OpenAI, Claude, Mistral, Llama, DeepSeek APIs

Legacy AI services providing access to language models from multiple providers: OpenAI, Claude, Mistral, Llama, and DeepSeek.

Base URL: https://api.koveh.com/ai/

Endpoints

Method  Endpoint            Description
GET     /health             Service health check
POST    /openai/chat        OpenAI chat completion
POST    /openai/embeddings  OpenAI embeddings
POST    /claude/chat        Claude chat completion
POST    /mistral/chat       Mistral chat completion
POST    /llama/chat         Llama chat completion
POST    /deepseek/chat      DeepSeek chat completion
GET     /models             Get available models

Authentication

All endpoints require Bearer token authentication:

curl -H "Authorization: Bearer YOUR_API_KEY" \
  "api.koveh.com/ai/openai/chat"

OpenAI Chat Completion

Generate text completions using OpenAI models.

Endpoint: POST /openai/chat

Request Body

{
  "messages": [
    {"role": "user", "content": "What is the capital of France?"}
  ],
  "model": "gpt-3.5-turbo",
  "max_tokens": 100,
  "temperature": 0.7,
  "stream": false
}

Parameters

  • messages (array, required): Array of message objects with role and content
  • model (string, optional): Model to use. Default: "gpt-3.5-turbo"
  • max_tokens (number, optional): Maximum tokens to generate. Default: 100
  • temperature (number, optional): Sampling temperature (0-2). Default: 0.7
  • stream (boolean, optional): Whether to stream the response. Default: false
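
The exact streaming wire format is not documented here. Assuming the endpoint relays OpenAI-style server-sent events ("data: {...}" lines terminated by "data: [DONE]"), a sketch of a streaming consumer:

import json
import requests

# Hypothetical streaming consumer; assumes OpenAI-style SSE frames.
with requests.post(
    "https://api.koveh.com/ai/openai/chat",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "messages": [{"role": "user", "content": "What is the capital of France?"}],
        "stream": True
    },
    stream=True
) as response:
    for line in response.iter_lines():
        if not line or not line.startswith(b"data: "):
            continue
        payload = line[len(b"data: "):]
        if payload == b"[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        print(delta.get("content", ""), end="", flush=True)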

Response

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "gpt-3.5-turbo",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 9,
    "total_tokens": 18
  }
}

Example Request

curl -X POST "api.koveh.com/ai/openai/chat" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "model": "gpt-3.5-turbo"
  }'

OpenAI Embeddings

Generate text embeddings using OpenAI models.

Endpoint: POST /openai/embeddings

Request Body

{
  "input": "Sample text for embedding",
  "model": "text-embedding-ada-002"
}

Parameters

  • input (string/array, required): Text or array of texts to embed
  • model (string, optional): Model to use. Default: "text-embedding-ada-002"

Response

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "embedding": [0.1, 0.2, 0.3, ...],
      "index": 0
    }
  ],
  "model": "text-embedding-ada-002",
  "usage": {
    "prompt_tokens": 3,
    "total_tokens": 3
  }
}

Example Request

curl -X POST "api.koveh.com/ai/openai/embeddings" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Sample text for embedding",
    "model": "text-embedding-ada-002"
  }'

Claude Chat Completion

Generate text completions using Claude models.

Endpoint: POST /claude/chat

Request Body

{
  "messages": [
    {"role": "user", "content": "What is the capital of France?"}
  ],
  "model": "claude-3-sonnet-20240229",
  "max_tokens": 100,
  "temperature": 0.7
}

Parameters

  • messages (array, required): Array of message objects with role and content
  • model (string, optional): Model to use. Default: "claude-3-sonnet-20240229"
  • max_tokens (number, optional): Maximum tokens to generate. Default: 100
  • temperature (number, optional): Sampling temperature (0-1). Default: 0.7

Response

{
  "id": "msg_123",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "The capital of France is Paris."
    }
  ],
  "model": "claude-3-sonnet-20240229",
  "usage": {
    "input_tokens": 9,
    "output_tokens": 9
  }
}
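
Note that the Claude endpoint returns Anthropic's native response shape: the generated text lives under content[0].text and token counts under usage.input_tokens / usage.output_tokens, rather than the OpenAI-style choices[0].message.content used by the other providers. Multi-provider clients should handle both shapes (see the Python example under Integration Examples).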

Example Request

curl -X POST "api.koveh.com/ai/claude/chat" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "model": "claude-3-sonnet-20240229"
  }'

Mistral Chat Completion

Generate text completions using Mistral models.

Endpoint: POST /mistral/chat

Request Body

{
  "messages": [
    {"role": "user", "content": "What is the capital of France?"}
  ],
  "model": "mistral-medium",
  "max_tokens": 100,
  "temperature": 0.7
}

Parameters

  • messages (array, required): Array of message objects with role and content
  • model (string, optional): Model to use. Default: "mistral-medium"
  • max_tokens (number, optional): Maximum tokens to generate. Default: 100
  • temperature (number, optional): Sampling temperature (0-1). Default: 0.7

Response

{
  "id": "mistral-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "mistral-medium",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 9,
    "total_tokens": 18
  }
}

Example Request

curl -X POST "api.koveh.com/ai/mistral/chat" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "model": "mistral-medium"
  }'

Llama Chat Completion

Generate text completions using Llama models.

Endpoint: POST /llama/chat

Request Body

{
  "messages": [
    {"role": "user", "content": "What is the capital of France?"}
  ],
  "model": "llama-2-7b-chat",
  "max_tokens": 100,
  "temperature": 0.7
}

Parameters

  • messages (array, required): Array of message objects with role and content
  • model (string, optional): Model to use. Default: "llama-2-7b-chat"
  • max_tokens (number, optional): Maximum tokens to generate. Default: 100
  • temperature (number, optional): Sampling temperature (0-1). Default: 0.7

Response

{
  "id": "llama-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "llama-2-7b-chat",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 9,
    "total_tokens": 18
  }
}

Example Request

curl -X POST "api.koveh.com/ai/llama/chat" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "model": "llama-2-7b-chat"
  }'

DeepSeek Chat Completion

Generate text completions using DeepSeek models.

Endpoint: POST /deepseek/chat

Request Body

{
  "messages": [
    {"role": "user", "content": "What is the capital of France?"}
  ],
  "model": "deepseek-chat",
  "max_tokens": 100,
  "temperature": 0.7
}

Parameters

  • messages (array, required): Array of message objects with role and content
  • model (string, optional): Model to use. Default: "deepseek-chat"
  • max_tokens (number, optional): Maximum tokens to generate. Default: 100
  • temperature (number, optional): Sampling temperature (0-1). Default: 0.7

Response

{
  "id": "deepseek-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "deepseek-chat",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 9,
    "total_tokens": 18
  }
}

Example Request

curl -X POST "api.koveh.com/ai/deepseek/chat" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "model": "deepseek-chat"
  }'

Available Models

Get the list of available AI models, grouped by provider.

Endpoint: GET /models

Response

{
  "models": {
    "openai": [
      {
        "id": "gpt-4",
        "name": "GPT-4",
        "description": "Most capable GPT model",
        "max_tokens": 8192
      },
      {
        "id": "gpt-3.5-turbo",
        "name": "GPT-3.5 Turbo",
        "description": "Fast and efficient model",
        "max_tokens": 4096
      }
    ],
    "claude": [
      {
        "id": "claude-3-opus-20240229",
        "name": "Claude 3 Opus",
        "description": "Most capable Claude model",
        "max_tokens": 4096
      },
      {
        "id": "claude-3-sonnet-20240229",
        "name": "Claude 3 Sonnet",
        "description": "Balanced performance model",
        "max_tokens": 4096
      }
    ],
    "mistral": [
      {
        "id": "mistral-large",
        "name": "Mistral Large",
        "description": "Most capable Mistral model",
        "max_tokens": 32768
      },
      {
        "id": "mistral-medium",
        "name": "Mistral Medium",
        "description": "Balanced performance model",
        "max_tokens": 32768
      }
    ],
    "llama": [
      {
        "id": "llama-2-70b-chat",
        "name": "Llama 2 70B Chat",
        "description": "Large Llama 2 model",
        "max_tokens": 4096
      },
      {
        "id": "llama-2-7b-chat",
        "name": "Llama 2 7B Chat",
        "description": "Smaller Llama 2 model",
        "max_tokens": 4096
      }
    ],
    "deepseek": [
      {
        "id": "deepseek-chat",
        "name": "DeepSeek Chat",
        "description": "DeepSeek chat model",
        "max_tokens": 32768
      }
    ]
  }
}

Example Request

curl -X GET "api.koveh.com/ai/models" \
  -H "Authorization: Bearer YOUR_API_KEY"

Health Check

Check the health status of the service and the availability of each provider.

Endpoint: GET /health

Response

{
  "status": "healthy",
  "timestamp": "2025-08-30T09:19:31.245295",
  "providers": {
    "openai": "available",
    "claude": "available",
    "mistral": "available",
    "llama": "available",
    "deepseek": "available"
  }
}

Example Request

curl -X GET "api.koveh.com/ai/health"

Integration Examples

Python Example - Multi-Provider

import requests

def chat_with_provider(provider, messages, model=None):
    payload = {"messages": messages}
    if model is not None:
        payload["model"] = model  # omit to use the provider's default
    response = requests.post(
        f"https://api.koveh.com/ai/{provider}/chat",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        json=payload
    )
    response.raise_for_status()
    return response.json()

def extract_text(result):
    # Claude returns Anthropic's native shape; the other providers
    # return OpenAI-style responses (see the Response sections above).
    if "choices" in result:
        return result["choices"][0]["message"]["content"]
    return result["content"][0]["text"]

# Try different providers
providers = ["openai", "claude", "mistral", "llama", "deepseek"]
messages = [{"role": "user", "content": "What is the capital of France?"}]

for provider in providers:
    try:
        result = chat_with_provider(provider, messages)
        print(f"{provider}: {extract_text(result)}")
    except Exception as e:
        print(f"{provider}: Error - {e}")

JavaScript Example - OpenAI

async function chatWithOpenAI(messages, model = 'gpt-3.5-turbo') {
    const response = await fetch('https://api.koveh.com/ai/openai/chat', {
        method: 'POST',
        headers: {
            'Authorization': 'Bearer YOUR_API_KEY',
            'Content-Type': 'application/json'
        },
        body: JSON.stringify({
            messages: messages,
            model: model
        })
    });
    return await response.json();
}

// Use the function
const messages = [
    {role: 'user', content: 'What is the capital of France?'}
];

chatWithOpenAI(messages)
    .then(result => console.log(result.choices[0].message.content));

Embeddings Example

import requests

def get_embeddings(text, model="text-embedding-ada-002"):
    response = requests.post(
        "http://api.koveh.com/ai/openai/embeddings",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        json={
            "input": text,
            "model": model
        }
    )
    return response.json()

# Get embeddings
embeddings = get_embeddings("Sample text for embedding")
print(f"Embedding dimensions: {len(embeddings['data'][0]['embedding'])}")

Error Handling

The API returns standard error responses:

{
  "error": "Invalid model specified",
  "status_code": 400,
  "timestamp": "2025-08-30T09:19:31.245295"
}

Common error codes:

  • 400: Bad Request (invalid parameters, model not found)
  • 401: Unauthorized (missing or invalid API key)
  • 404: Not Found (invalid endpoint)
  • 429: Too Many Requests (rate limit exceeded)
  • 500: Internal Server Error (provider API error)
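
A minimal client-side sketch that maps these cases to exceptions, assuming the error body shown above:

import requests

def safe_post(url, payload):
    response = requests.post(
        url,
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        json=payload
    )
    if response.status_code == 429:
        # Rate limited: back off and retry (see Rate Limiting below).
        raise RuntimeError("rate limit exceeded, retry later")
    if not response.ok:
        # Error bodies carry "error" and "status_code" fields.
        detail = response.json().get("error", response.text)
        raise RuntimeError(f"HTTP {response.status_code}: {detail}")
    return response.json()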

Rate Limiting

Each provider has its own rate limit:

  • OpenAI: 50 requests per minute
  • Claude: 30 requests per minute
  • Mistral: 40 requests per minute
  • Llama: 60 requests per minute
  • DeepSeek: 50 requests per minute
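
To stay under these limits, retry 429 responses with exponential backoff; a minimal sketch:

import time
import requests

def post_with_backoff(url, payload, max_retries=5):
    # Exponential backoff on 429: wait 1s, 2s, 4s, ... before retrying.
    for attempt in range(max_retries):
        response = requests.post(
            url,
            headers={"Authorization": "Bearer YOUR_API_KEY"},
            json=payload
        )
        if response.status_code != 429:
            return response
        time.sleep(2 ** attempt)
    return response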

Best Practices

  1. Model Selection: Choose the appropriate model for your use case
  2. Token Limits: Be aware of model-specific token limits
  3. Error Handling: Implement proper error handling for each provider
  4. Rate Limiting: Respect rate limits and implement backoff strategies
  5. Cost Optimization: Use smaller models for simple tasks
  6. Fallback Strategy: Have fallback providers in case one is unavailable
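
For point 6, a sketch that reuses the chat_with_provider and extract_text helpers from the multi-provider example above, trying providers in order of preference:

def chat_with_fallback(messages, providers=("openai", "claude", "mistral")):
    # Return the first successful answer; raise if every provider fails.
    last_error = None
    for provider in providers:
        try:
            return extract_text(chat_with_provider(provider, messages))
        except Exception as error:
            last_error = error
    raise RuntimeError(f"all providers failed: {last_error}")

print(chat_with_fallback([{"role": "user", "content": "Hello"}]))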

Use Cases

  • Content Generation: Generate articles, blog posts, and creative content
  • Code Generation: Generate and explain code snippets
  • Language Translation: Translate text between languages
  • Question Answering: Answer questions based on context
  • Text Summarization: Summarize long documents
  • Sentiment Analysis: Analyze sentiment in text
  • Text Classification: Classify text into categories