AI Services (Legacy)
OpenAI, Claude, Mistral, Llama, DeepSeek APIs
Legacy AI services providing access to various language models including OpenAI, Claude, Mistral, Llama, and DeepSeek.
Base URL: http://api.koveh.com/ai/
Endpoints
| Method | Endpoint | Description |
|---|---|---|
| GET | /health | Service health check |
| POST | /openai/chat | OpenAI chat completion |
| POST | /openai/embeddings | OpenAI embeddings |
| POST | /claude/chat | Claude chat completion |
| POST | /mistral/chat | Mistral chat completion |
| POST | /llama/chat | Llama chat completion |
| POST | /deepseek/chat | DeepSeek chat completion |
| GET | /models | Get available models |
Authentication
All endpoints require Bearer token authentication:
curl -H "Authorization: Bearer YOUR_API_KEY" \
  "http://api.koveh.com/ai/models"
OpenAI Chat Completion
Generate text completions using OpenAI models.
Endpoint: POST /openai/chat
Request Body
{
  "messages": [
    {"role": "user", "content": "What is the capital of France?"}
  ],
  "model": "gpt-3.5-turbo",
  "max_tokens": 100,
  "temperature": 0.7,
  "stream": false
}
Parameters
- messages (array, required): Array of message objects with role and content
- model (string, optional): Model to use. Default: "gpt-3.5-turbo"
- max_tokens (number, optional): Maximum tokens to generate. Default: 100
- temperature (number, optional): Sampling temperature (0-2). Default: 0.7
- stream (boolean, optional): Whether to stream the response. Default: false
Response
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "gpt-3.5-turbo",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 9,
    "total_tokens": 18
  }
}
Example Request
curl -X POST "http://api.koveh.com/ai/openai/chat" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "model": "gpt-3.5-turbo"
  }'
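When stream is true the response is delivered incrementally. A hedged Python sketch, assuming the endpoint emits OpenAI-style server-sent events (data: {...} lines terminated by data: [DONE]); the exact wire format is not documented here, so verify against the service before relying on it:

import json
import requests

# Assumes SSE-style streaming as in the upstream OpenAI API
response = requests.post(
    "http://api.koveh.com/ai/openai/chat",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "messages": [{"role": "user", "content": "What is the capital of France?"}],
        "stream": True,
    },
    stream=True,
)

for line in response.iter_lines():
    if not line:
        continue
    chunk = line.decode("utf-8")
    if chunk.startswith("data: "):
        payload = chunk[len("data: "):]
        if payload == "[DONE]":
            break
        # Each chunk carries an incremental delta of the assistant message
        delta = json.loads(payload)["choices"][0].get("delta", {})
        print(delta.get("content", ""), end="", flush=True)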
OpenAI Embeddings
Generate text embeddings using OpenAI models.
Endpoint: POST /openai/embeddings
Request Body
{
  "input": "Sample text for embedding",
  "model": "text-embedding-ada-002"
}
Parameters
- input (string/array, required): Text or array of texts to embed
- model (string, optional): Model to use. Default: "text-embedding-ada-002"
Response
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "embedding": [0.1, 0.2, 0.3, ...],
      "index": 0
    }
  ],
  "model": "text-embedding-ada-002",
  "usage": {
    "prompt_tokens": 3,
    "total_tokens": 3
  }
}
Example Request
curl -X POST "http://api.koveh.com/ai/openai/embeddings" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Sample text for embedding",
    "model": "text-embedding-ada-002"
  }'
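Embeddings are typically compared with cosine similarity. A small sketch that embeds two strings through this endpoint and compares them, using only the documented response shape and no extra dependencies:

import math
import requests

def embed(text):
    response = requests.post(
        "http://api.koveh.com/ai/openai/embeddings",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        json={"input": text},  # model omitted to use the default
    )
    response.raise_for_status()
    return response.json()["data"][0]["embedding"]

def cosine_similarity(a, b):
    # Dot product over the product of vector norms
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

print(cosine_similarity(embed("Paris is in France"), embed("The French capital")))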
Claude Chat Completion
Generate text completions using Claude models.
Endpoint: POST /claude/chat
Request Body
{
  "messages": [
    {"role": "user", "content": "What is the capital of France?"}
  ],
  "model": "claude-3-sonnet-20240229",
  "max_tokens": 100,
  "temperature": 0.7
}
Parameters
- messages (array, required): Array of message objects with role and content
- model (string, optional): Model to use. Default: "claude-3-sonnet-20240229"
- max_tokens (number, optional): Maximum tokens to generate. Default: 100
- temperature (number, optional): Sampling temperature (0-1). Default: 0.7
Response
{
  "id": "msg_123",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "The capital of France is Paris."
    }
  ],
  "model": "claude-3-sonnet-20240229",
  "usage": {
    "input_tokens": 9,
    "output_tokens": 9
  }
}
Example Request
curl -X POST "http://api.koveh.com/ai/claude/chat" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "model": "claude-3-sonnet-20240229"
  }'
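Note that Claude's response differs from the OpenAI-style choices array: the text lives in a list of content blocks. A short Python helper for extracting it, based on the response shape documented above:

def claude_text(result):
    # Claude returns content as a list of typed blocks; join the text ones
    return "".join(
        block["text"] for block in result["content"] if block["type"] == "text"
    )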
Mistral Chat Completion
Generate text completions using Mistral models.
Endpoint: POST /mistral/chat
Request Body
{
  "messages": [
    {"role": "user", "content": "What is the capital of France?"}
  ],
  "model": "mistral-medium",
  "max_tokens": 100,
  "temperature": 0.7
}
Parameters
- messages (array, required): Array of message objects with role and content
- model (string, optional): Model to use. Default: "mistral-medium"
- max_tokens (number, optional): Maximum tokens to generate. Default: 100
- temperature (number, optional): Sampling temperature (0-1). Default: 0.7
Response
{
  "id": "mistral-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "mistral-medium",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 9,
    "total_tokens": 18
  }
}
Example Request
curl -X POST "http://api.koveh.com/ai/mistral/chat" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "model": "mistral-medium"
  }'
Llama Chat Completion
Generate text completions using Llama models.
Endpoint: POST /llama/chat
Request Body
{
  "messages": [
    {"role": "user", "content": "What is the capital of France?"}
  ],
  "model": "llama-2-7b-chat",
  "max_tokens": 100,
  "temperature": 0.7
}
Parameters
- messages (array, required): Array of message objects with role and content
- model (string, optional): Model to use. Default: "llama-2-7b-chat"
- max_tokens (number, optional): Maximum tokens to generate. Default: 100
- temperature (number, optional): Sampling temperature (0-1). Default: 0.7
Response
{
  "id": "llama-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "llama-2-7b-chat",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 9,
    "total_tokens": 18
  }
}
Example Request
curl -X POST "http://api.koveh.com/ai/llama/chat" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "model": "llama-2-7b-chat"
  }'
DeepSeek Chat Completion
Generate text completions using DeepSeek models.
Endpoint: POST /deepseek/chat
Request Body
{
  "messages": [
    {"role": "user", "content": "What is the capital of France?"}
  ],
  "model": "deepseek-chat",
  "max_tokens": 100,
  "temperature": 0.7
}
Parameters
- messages (array, required): Array of message objects with role and content
- model (string, optional): Model to use. Default: "deepseek-chat"
- max_tokens (number, optional): Maximum tokens to generate. Default: 100
- temperature (number, optional): Sampling temperature (0-1). Default: 0.7
Response
{
  "id": "deepseek-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "deepseek-chat",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 9,
    "total_tokens": 18
  }
}
Example Request
curl -X POST "http://api.koveh.com/ai/deepseek/chat" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "model": "deepseek-chat"
  }'
Available Models
Get list of available AI models.
Endpoint: GET /models
Response
{
  "models": {
    "openai": [
      {
        "id": "gpt-4",
        "name": "GPT-4",
        "description": "Most capable GPT model",
        "max_tokens": 8192
      },
      {
        "id": "gpt-3.5-turbo",
        "name": "GPT-3.5 Turbo",
        "description": "Fast and efficient model",
        "max_tokens": 4096
      }
    ],
    "claude": [
      {
        "id": "claude-3-opus-20240229",
        "name": "Claude 3 Opus",
        "description": "Most capable Claude model",
        "max_tokens": 4096
      },
      {
        "id": "claude-3-sonnet-20240229",
        "name": "Claude 3 Sonnet",
        "description": "Balanced performance model",
        "max_tokens": 4096
      }
    ],
    "mistral": [
      {
        "id": "mistral-large",
        "name": "Mistral Large",
        "description": "Most capable Mistral model",
        "max_tokens": 32768
      },
      {
        "id": "mistral-medium",
        "name": "Mistral Medium",
        "description": "Balanced performance model",
        "max_tokens": 32768
      }
    ],
    "llama": [
      {
        "id": "llama-2-70b-chat",
        "name": "Llama 2 70B Chat",
        "description": "Large Llama 2 model",
        "max_tokens": 4096
      },
      {
        "id": "llama-2-7b-chat",
        "name": "Llama 2 7B Chat",
        "description": "Smaller Llama 2 model",
        "max_tokens": 4096
      }
    ],
    "deepseek": [
      {
        "id": "deepseek-chat",
        "name": "DeepSeek Chat",
        "description": "DeepSeek chat model",
        "max_tokens": 32768
      }
    ]
  }
}
Example Request
curl -X GET "http://api.koveh.com/ai/models" \
  -H "Authorization: Bearer YOUR_API_KEY"
Health Check
Check service health status.
Endpoint: GET /health
Response
{
  "status": "healthy",
  "timestamp": "2025-08-30T09:19:31.245295",
  "providers": {
    "openai": "available",
    "claude": "available",
    "mistral": "available",
    "llama": "available",
    "deepseek": "available"
  }
}
Example Request
curl -X GET "http://api.koveh.com/ai/health"
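The providers map can serve as a pre-flight check before routing a request. A minimal Python sketch based on the response above (the documented example calls /health without an API key):

import requests

def available_providers():
    response = requests.get("http://api.koveh.com/ai/health")
    response.raise_for_status()
    body = response.json()
    # Keep only providers whose status is reported as "available"
    return [name for name, state in body["providers"].items() if state == "available"]

print(available_providers())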
Integration Examples
Python Example - Multi-Provider
import requests

API_KEY = "YOUR_API_KEY"
BASE_URL = "http://api.koveh.com/ai"

def chat_with_provider(provider, messages, model=None):
    payload = {"messages": messages}
    if model is not None:
        payload["model"] = model  # omit to use the provider's default model
    response = requests.post(
        f"{BASE_URL}/{provider}/chat",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json=payload,
    )
    response.raise_for_status()
    return response.json()

def extract_text(provider, result):
    # Claude returns a list of content blocks; the others use OpenAI-style choices
    if provider == "claude":
        return result["content"][0]["text"]
    return result["choices"][0]["message"]["content"]

# Try different providers
providers = ["openai", "claude", "mistral", "llama", "deepseek"]
messages = [{"role": "user", "content": "What is the capital of France?"}]

for provider in providers:
    try:
        result = chat_with_provider(provider, messages)
        print(f"{provider}: {extract_text(provider, result)}")
    except Exception as e:
        print(f"{provider}: Error - {e}")
JavaScript Example - OpenAI
async function chatWithOpenAI(messages, model = 'gpt-3.5-turbo') {
  const response = await fetch('http://api.koveh.com/ai/openai/chat', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer YOUR_API_KEY',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      messages: messages,
      model: model
    })
  });
  return await response.json();
}

// Use the function
const messages = [
  {role: 'user', content: 'What is the capital of France?'}
];
chatWithOpenAI(messages)
  .then(result => console.log(result.choices[0].message.content));
Embeddings Example
import requests

def get_embeddings(text, model="text-embedding-ada-002"):
    response = requests.post(
        "http://api.koveh.com/ai/openai/embeddings",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        json={
            "input": text,
            "model": model
        }
    )
    return response.json()

# Get embeddings
embeddings = get_embeddings("Sample text for embedding")
print(f"Embedding dimensions: {len(embeddings['data'][0]['embedding'])}")
Error Handling
The API returns standard error responses:
{
  "error": "Invalid model specified",
  "status_code": 400,
  "timestamp": "2025-08-30T09:19:31.245295"
}
Common error codes:
- 400: Bad Request (invalid parameters, model not found)
- 401: Unauthorized (missing or invalid API key)
- 404: Not Found (invalid endpoint)
- 429: Too Many Requests (rate limit exceeded)
- 500: Internal Server Error (provider API error)
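In practice this means checking the status code and reading the documented error body on failure. A short Python sketch (field names taken from the error example above; "not-a-model" is an illustrative invalid model id):

import requests

response = requests.post(
    "http://api.koveh.com/ai/openai/chat",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={"messages": [{"role": "user", "content": "Hello"}], "model": "not-a-model"},
)

if response.status_code != 200:
    # Error responses carry "error" and "status_code" fields
    body = response.json()
    print(f"Request failed ({body['status_code']}): {body['error']}")
else:
    print(response.json()["choices"][0]["message"]["content"])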
Rate Limiting
Each provider has its own rate limiting:
- OpenAI: 50 requests per minute
- Claude: 30 requests per minute
- Mistral: 40 requests per minute
- Llama: 60 requests per minute
- DeepSeek: 50 requests per minute
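A 429 response means the per-provider budget is exhausted; exponential backoff is the usual remedy. A minimal Python sketch (retry counts and delays are illustrative, not part of the API):

import time
import requests

def post_with_backoff(url, payload, api_key, max_retries=5):
    delay = 1.0
    for attempt in range(max_retries):
        response = requests.post(
            url,
            headers={"Authorization": f"Bearer {api_key}"},
            json=payload,
        )
        if response.status_code != 429:
            return response
        time.sleep(delay)  # wait before retrying, doubling the delay each time
        delay *= 2
    raise RuntimeError("rate limited after retries")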
Best Practices
- Model Selection: Choose the appropriate model for your use case
- Token Limits: Be aware of model-specific token limits
- Error Handling: Implement proper error handling for each provider
- Rate Limiting: Respect rate limits and implement backoff strategies
- Cost Optimization: Use smaller models for simple tasks
- Fallback Strategy: Have fallback providers in case one is unavailable (see the sketch below)
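For the fallback strategy, the shared /chat interface makes provider failover straightforward. A Python sketch that walks an ordered preference list (the order shown is illustrative):

import requests

def chat_with_fallback(messages, providers=("openai", "mistral", "deepseek")):
    for provider in providers:
        try:
            response = requests.post(
                f"http://api.koveh.com/ai/{provider}/chat",
                headers={"Authorization": "Bearer YOUR_API_KEY"},
                json={"messages": messages},
                timeout=30,
            )
            response.raise_for_status()
            return provider, response.json()
        except requests.RequestException:
            continue  # this provider failed; try the next one
    raise RuntimeError("all providers failed")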
Use Cases
- Content Generation: Generate articles, blog posts, and creative content
- Code Generation: Generate and explain code snippets
- Language Translation: Translate text between languages
- Question Answering: Answer questions based on context
- Text Summarization: Summarize long documents
- Sentiment Analysis: Analyze sentiment in text
- Text Classification: Classify text into categories