AI Services (Legacy)
OpenAI, Claude, Mistral, Llama, DeepSeek APIs
Legacy AI services providing access to various language models including OpenAI, Claude, Mistral, Llama, and DeepSeek.
Base URL: http://api.koveh.com/ai/
Endpoints
| Method | Endpoint | Description |
|---|---|---|
| GET | /health | Service health check |
| POST | /openai/chat | OpenAI chat completion |
| POST | /openai/embeddings | OpenAI embeddings |
| POST | /claude/chat | Claude chat completion |
| POST | /mistral/chat | Mistral chat completion |
| POST | /llama/chat | Llama chat completion |
| POST | /deepseek/chat | DeepSeek chat completion |
| GET | /models | Get available models |
Authentication
All endpoints require Bearer token authentication:
curl -H "Authorization: Bearer YOUR_API_KEY" \
  "http://api.koveh.com/ai/models"
OpenAI Chat Completion
Generate text completions using OpenAI models.
Endpoint: POST /openai/chat
Request Body
{
  "messages": [
    {"role": "user", "content": "What is the capital of France?"}
  ],
  "model": "gpt-3.5-turbo",
  "max_tokens": 100,
  "temperature": 0.7,
  "stream": false
}
Parameters
- messages (array, required): Array of message objects with role and content
- model (string, optional): Model to use. Default: "gpt-3.5-turbo"
- max_tokens (number, optional): Maximum tokens to generate. Default: 100
- temperature (number, optional): Sampling temperature (0-2). Default: 0.7
- stream (boolean, optional): Whether to stream the response. Default: false
Response
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "gpt-3.5-turbo",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 9,
    "total_tokens": 18
  }
}
Example Request
curl -X POST "http://api.koveh.com/ai/openai/chat" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "model": "gpt-3.5-turbo"
  }'
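When stream is true the response is delivered incrementally. A hedged Python sketch, assuming the endpoint emits OpenAI-style server-sent events (data: {...} lines terminated by data: [DONE]); the exact wire format is not documented here, so verify against the service before relying on it:

import json
import requests

# Assumes SSE-style streaming as in the upstream OpenAI API
response = requests.post(
    "http://api.koveh.com/ai/openai/chat",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "messages": [{"role": "user", "content": "What is the capital of France?"}],
        "stream": True,
    },
    stream=True,
)

for line in response.iter_lines():
    if not line:
        continue
    chunk = line.decode("utf-8")
    if chunk.startswith("data: "):
        payload = chunk[len("data: "):]
        if payload == "[DONE]":
            break
        # Each chunk carries an incremental delta of the assistant message
        delta = json.loads(payload)["choices"][0].get("delta", {})
        print(delta.get("content", ""), end="", flush=True)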
OpenAI Embeddings
Generate text embeddings using OpenAI models.
Endpoint: POST /openai/embeddings
Request Body
{
  "input": "Sample text for embedding",
  "model": "text-embedding-ada-002"
}
Parameters
- input (string/array, required): Text or array of texts to embed
- model (string, optional): Model to use. Default: "text-embedding-ada-002"
Response
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "embedding": [0.1, 0.2, 0.3, ...],
      "index": 0
    }
  ],
  "model": "text-embedding-ada-002",
  "usage": {
    "prompt_tokens": 3,
    "total_tokens": 3
  }
}
Example Request
curl -X POST "http://api.koveh.com/ai/openai/embeddings" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Sample text for embedding",
    "model": "text-embedding-ada-002"
  }'
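Embeddings are typically compared with cosine similarity. A small sketch that embeds two strings through this endpoint and compares them, using only the documented response shape and no extra dependencies:

import math
import requests

def embed(text):
    response = requests.post(
        "http://api.koveh.com/ai/openai/embeddings",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        json={"input": text},  # model omitted to use the default
    )
    response.raise_for_status()
    return response.json()["data"][0]["embedding"]

def cosine_similarity(a, b):
    # Dot product over the product of vector norms
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

print(cosine_similarity(embed("Paris is in France"), embed("The French capital")))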
Claude Chat Completion
Generate text completions using Claude models.
Endpoint: POST /claude/chat
Request Body
{
  "messages": [
    {"role": "user", "content": "What is the capital of France?"}
  ],
  "model": "claude-3-sonnet-20240229",
  "max_tokens": 100,
  "temperature": 0.7
}
Parameters
- messages (array, required): Array of message objects with role and content
- model (string, optional): Model to use. Default: "claude-3-sonnet-20240229"
- max_tokens (number, optional): Maximum tokens to generate. Default: 100
- temperature (number, optional): Sampling temperature (0-1). Default: 0.7
Response
{
  "id": "msg_123",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "The capital of France is Paris."
    }
  ],
  "model": "claude-3-sonnet-20240229",
  "usage": {
    "input_tokens": 9,
    "output_tokens": 9
  }
}
Example Request
curl -X POST "http://api.koveh.com/ai/claude/chat" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "model": "claude-3-sonnet-20240229"
  }'
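Note that Claude's response differs from the OpenAI-style choices array: the text lives in a list of content blocks. A short Python helper for extracting it, based on the response shape documented above:

def claude_text(result):
    # Claude returns content as a list of typed blocks; join the text ones
    return "".join(
        block["text"] for block in result["content"] if block["type"] == "text"
    )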
Mistral Chat Completion
Generate text completions using Mistral models.
Endpoint: POST /mistral/chat
Request Body
{
  "messages": [
    {"role": "user", "content": "What is the capital of France?"}
  ],
  "model": "mistral-medium",
  "max_tokens": 100,
  "temperature": 0.7
}
Parameters
- messages (array, required): Array of message objects with role and content
- model (string, optional): Model to use. Default: "mistral-medium"
- max_tokens (number, optional): Maximum tokens to generate. Default: 100
- temperature (number, optional): Sampling temperature (0-1). Default: 0.7
Response
{
  "id": "mistral-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "mistral-medium",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 9,
    "total_tokens": 18
  }
}
Example Request
curl -X POST "http://api.koveh.com/ai/mistral/chat" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "model": "mistral-medium"
  }'
Llama Chat Completion
Generate text completions using Llama models.
Endpoint: POST /llama/chat
Request Body
{
  "messages": [
    {"role": "user", "content": "What is the capital of France?"}
  ],
  "model": "llama-2-7b-chat",
  "max_tokens": 100,
  "temperature": 0.7
}
Parameters
- messages (array, required): Array of message objects with role and content
- model (string, optional): Model to use. Default: "llama-2-7b-chat"
- max_tokens (number, optional): Maximum tokens to generate. Default: 100
- temperature (number, optional): Sampling temperature (0-1). Default: 0.7
Response
{
  "id": "llama-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "llama-2-7b-chat",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 9,
    "total_tokens": 18
  }
}
Example Request
curl -X POST "http://api.koveh.com/ai/llama/chat" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "model": "llama-2-7b-chat"
  }'
DeepSeek Chat Completion
Generate text completions using DeepSeek models.
Endpoint: POST /deepseek/chat
Request Body
{
  "messages": [
    {"role": "user", "content": "What is the capital of France?"}
  ],
  "model": "deepseek-chat",
  "max_tokens": 100,
  "temperature": 0.7
}
Parameters
- messages (array, required): Array of message objects with role and content
- model (string, optional): Model to use. Default: "deepseek-chat"
- max_tokens (number, optional): Maximum tokens to generate. Default: 100
- temperature (number, optional): Sampling temperature (0-1). Default: 0.7
Response
{
  "id": "deepseek-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "deepseek-chat",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 9,
    "total_tokens": 18
  }
}
Example Request
curl -X POST "http://api.koveh.com/ai/deepseek/chat" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "model": "deepseek-chat"
  }'
Available Models
Get list of available AI models.
Endpoint: GET /models
Response
{
  "models": {
    "openai": [
      {
        "id": "gpt-4",
        "name": "GPT-4",
        "description": "Most capable GPT model",
        "max_tokens": 8192
      },
      {
        "id": "gpt-3.5-turbo",
        "name": "GPT-3.5 Turbo",
        "description": "Fast and efficient model",
        "max_tokens": 4096
      }
    ],
    "claude": [
      {
        "id": "claude-3-opus-20240229",
        "name": "Claude 3 Opus",
        "description": "Most capable Claude model",
        "max_tokens": 4096
      },
      {
        "id": "claude-3-sonnet-20240229",
        "name": "Claude 3 Sonnet",
        "description": "Balanced performance model",
        "max_tokens": 4096
      }
    ],
    "mistral": [
      {
        "id": "mistral-large",
        "name": "Mistral Large",
        "description": "Most capable Mistral model",
        "max_tokens": 32768
      },
      {
        "id": "mistral-medium",
        "name": "Mistral Medium",
        "description": "Balanced performance model",
        "max_tokens": 32768
      }
    ],
    "llama": [
      {
        "id": "llama-2-70b-chat",
        "name": "Llama 2 70B Chat",
        "description": "Large Llama 2 model",
        "max_tokens": 4096
      },
      {
        "id": "llama-2-7b-chat",
        "name": "Llama 2 7B Chat",
        "description": "Smaller Llama 2 model",
        "max_tokens": 4096
      }
    ],
    "deepseek": [
      {
        "id": "deepseek-chat",
        "name": "DeepSeek Chat",
        "description": "DeepSeek chat model",
        "max_tokens": 32768
      }
    ]
  }
}
Example Request
curl -X GET "http://api.koveh.com/ai/models" \
  -H "Authorization: Bearer YOUR_API_KEY"
Health Check
Check service health status.
Endpoint: GET /health
Response
{
  "status": "healthy",
  "timestamp": "2025-08-30T09:19:31.245295",
  "providers": {
    "openai": "available",
    "claude": "available",
    "mistral": "available",
    "llama": "available",
    "deepseek": "available"
  }
}
Example Request
curl -X GET "http://api.koveh.com/ai/health"
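The providers map can serve as a pre-flight check before routing a request. A minimal Python sketch based on the response above (the documented example calls /health without an API key):

import requests

def available_providers():
    response = requests.get("http://api.koveh.com/ai/health")
    response.raise_for_status()
    body = response.json()
    # Keep only providers whose status is reported as "available"
    return [name for name, state in body["providers"].items() if state == "available"]

print(available_providers())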
Integration Examples
Python Example - Multi-Provider
import requests

API_KEY = "YOUR_API_KEY"
BASE_URL = "http://api.koveh.com/ai"

def chat_with_provider(provider, messages, model=None):
    payload = {"messages": messages}
    if model is not None:
        payload["model"] = model  # omit to use the provider's default model
    response = requests.post(
        f"{BASE_URL}/{provider}/chat",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json=payload,
    )
    response.raise_for_status()
    return response.json()

def extract_text(provider, result):
    # Claude returns a list of content blocks; the others use OpenAI-style choices
    if provider == "claude":
        return result["content"][0]["text"]
    return result["choices"][0]["message"]["content"]

# Try different providers
providers = ["openai", "claude", "mistral", "llama", "deepseek"]
messages = [{"role": "user", "content": "What is the capital of France?"}]

for provider in providers:
    try:
        result = chat_with_provider(provider, messages)
        print(f"{provider}: {extract_text(provider, result)}")
    except Exception as e:
        print(f"{provider}: Error - {e}")
JavaScript Example - OpenAI
async function chatWithOpenAI(messages, model = 'gpt-3.5-turbo') {
  const response = await fetch('http://api.koveh.com/ai/openai/chat', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer YOUR_API_KEY',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      messages: messages,
      model: model
    })
  });
  return await response.json();
}

// Use the function
const messages = [
  {role: 'user', content: 'What is the capital of France?'}
];
chatWithOpenAI(messages)
  .then(result => console.log(result.choices[0].message.content));
Embeddings Example
import requests

def get_embeddings(text, model="text-embedding-ada-002"):
    response = requests.post(
        "http://api.koveh.com/ai/openai/embeddings",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        json={
            "input": text,
            "model": model
        }
    )
    return response.json()

# Get embeddings
embeddings = get_embeddings("Sample text for embedding")
print(f"Embedding dimensions: {len(embeddings['data'][0]['embedding'])}")
Error Handling
The API returns standard error responses:
{
  "error": "Invalid model specified",
  "status_code": 400,
  "timestamp": "2025-08-30T09:19:31.245295"
}
Common error codes:
- 400: Bad Request (invalid parameters, model not found)
- 401: Unauthorized (missing or invalid API key)
- 404: Not Found (invalid endpoint)
- 429: Too Many Requests (rate limit exceeded)
- 500: Internal Server Error (provider API error)
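In practice this means checking the status code and reading the documented error body on failure. A short Python sketch (field names taken from the error example above; "not-a-model" is an illustrative invalid model id):

import requests

response = requests.post(
    "http://api.koveh.com/ai/openai/chat",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={"messages": [{"role": "user", "content": "Hello"}], "model": "not-a-model"},
)

if response.status_code != 200:
    # Error responses carry "error" and "status_code" fields
    body = response.json()
    print(f"Request failed ({body['status_code']}): {body['error']}")
else:
    print(response.json()["choices"][0]["message"]["content"])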
Rate Limiting
Each provider has its own rate limiting:
- OpenAI: 50 requests per minute
- Claude: 30 requests per minute
- Mistral: 40 requests per minute
- Llama: 60 requests per minute
- DeepSeek: 50 requests per minute
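A 429 response means the per-provider budget is exhausted; exponential backoff is the usual remedy. A minimal Python sketch (retry counts and delays are illustrative, not part of the API):

import time
import requests

def post_with_backoff(url, payload, api_key, max_retries=5):
    delay = 1.0
    for attempt in range(max_retries):
        response = requests.post(
            url,
            headers={"Authorization": f"Bearer {api_key}"},
            json=payload,
        )
        if response.status_code != 429:
            return response
        time.sleep(delay)  # wait before retrying, doubling the delay each time
        delay *= 2
    raise RuntimeError("rate limited after retries")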
Best Practices
- Model Selection: Choose the appropriate model for your use case
- Token Limits: Be aware of model-specific token limits
- Error Handling: Implement proper error handling for each provider
- Rate Limiting: Respect rate limits and implement backoff strategies
- Cost Optimization: Use smaller models for simple tasks
- Fallback Strategy: Have fallback providers in case one is unavailable (see the sketch below)
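For the fallback strategy, the shared /chat interface makes provider failover straightforward. A Python sketch that walks an ordered preference list (the order shown is illustrative):

import requests

def chat_with_fallback(messages, providers=("openai", "mistral", "deepseek")):
    for provider in providers:
        try:
            response = requests.post(
                f"http://api.koveh.com/ai/{provider}/chat",
                headers={"Authorization": "Bearer YOUR_API_KEY"},
                json={"messages": messages},
                timeout=30,
            )
            response.raise_for_status()
            return provider, response.json()
        except requests.RequestException:
            continue  # this provider failed; try the next one
    raise RuntimeError("all providers failed")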
Use Cases
- Content Generation: Generate articles, blog posts, and creative content
- Code Generation: Generate and explain code snippets
- Language Translation: Translate text between languages
- Question Answering: Answer questions based on context
- Text Summarization: Summarize long documents
- Sentiment Analysis: Analyze sentiment in text
- Text Classification: Classify text into categories