Qwen3 AI API
Text generation and embeddings using Qwen3 models for chat completion and semantic search.
Base URL: https://api.koveh.com/qwen3/
Endpoints
| Method | Endpoint | Description |
|---|---|---|
| GET | /health | Service health check |
| POST | /chat | Chat completion with Qwen3 |
| POST | /embeddings | Generate text embeddings |
| GET | /models | Get available AI models |
| GET | /chat/history | Get chat history |
| GET | /chat/sessions | Get chat sessions |
| GET | /chat/stats | Get chat statistics |
| POST | /chat/similar | Find similar conversations |
| GET | /usage | Get usage statistics |
Authentication
All endpoints require Bearer token authentication:
curl -H "Authorization: Bearer YOUR_API_KEY" \
"api.koveh.com/qwen3/chat"
Chat Completion
Generate text completions using Qwen3 models.
Endpoint: POST /chat
Request Body
{
"messages": [
{"role": "user", "content": "What is the capital of France?"}
],
"model": "qwen3-0.6b",
"max_tokens": 100,
"temperature": 0.7,
"top_p": 0.9,
"stream": false
}
Parameters
- messages (array, required): Array of message objects with role and content
- model (string, optional): Model to use. Default: "qwen3-0.6b"
- max_tokens (number, optional): Maximum tokens to generate. Default: 100
- temperature (number, optional): Sampling temperature (0-2). Default: 0.7
- top_p (number, optional): Nucleus sampling parameter (0-1). Default: 0.9
- stream (boolean, optional): Whether to stream the response. Default: false
Response
{
"id": "chatcmpl-123",
"object": "chat.completion",
"created": 1677652288,
"model": "qwen3-0.6b",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The capital of France is Paris."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 9,
"completion_tokens": 9,
"total_tokens": 18
}
}
Example Request
curl -X POST "https://api.koveh.com/qwen3/chat" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"messages": [
{"role": "user", "content": "What is the capital of France?"}
],
"model": "qwen3-0.6b",
"max_tokens": 100
}'
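The same request as a Python sketch (reusing the session and BASE_URL from the Authentication example; the fields read from the response follow the schema shown above):

```python
payload = {
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "model": "qwen3-0.6b",
    "max_tokens": 100,
    "temperature": 0.7,
}

resp = session.post(f"{BASE_URL}/chat", json=payload)
resp.raise_for_status()
data = resp.json()

# First choice holds the assistant message; usage reports token counts.
print(data["choices"][0]["message"]["content"])
print("total tokens:", data["usage"]["total_tokens"])
```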
Text Embeddings
Generate embeddings for text using Qwen3 embedding models.
Endpoint: POST /embeddings
Request Body
{
"text": "Sample text for embedding",
"model": "qwen3-embedding-0.6b"
}
Parameters
- text (string, required): Text to generate embeddings for
- model (string, optional): Embedding model to use. Default: "qwen3-embedding-0.6b"
Response
{
"object": "embedding",
"data": [
{
"object": "embedding",
"embedding": [0.1, 0.2, 0.3, ...],
"index": 0
}
],
"model": "qwen3-embedding-0.6b",
"usage": {
"prompt_tokens": 4,
"total_tokens": 4
}
}
Example Request
curl -X POST "https://api.koveh.com/qwen3/embeddings" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"text": "Sample text for embedding",
"model": "qwen3-embedding-0.6b"
}'
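Embeddings are typically compared with cosine similarity for semantic search. A minimal Python sketch (reusing the session from the Authentication example; the embed and cosine helpers are illustrative, not part of the API):

```python
import math

def embed(text: str) -> list[float]:
    # Request a single embedding and return the raw vector.
    resp = session.post(f"{BASE_URL}/embeddings",
                        json={"text": text, "model": "qwen3-embedding-0.6b"})
    resp.raise_for_status()
    return resp.json()["data"][0]["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product divided by the product of the vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms

query_vec = embed("Sample text for embedding")
doc_vec = embed("An example sentence to compare against")
print(f"similarity: {cosine(query_vec, doc_vec):.3f}")
```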
Get Available Models
Get the list of available AI models.
Endpoint: GET /models
Response
[
{
"id": "qwen3-0.6b",
"name": "Qwen3 0.6B",
"type": "chat",
"context_length": 8192,
"description": "Qwen3 0.6B parameter model for chat completion"
},
{
"id": "qwen3-embedding-0.6b",
"name": "Qwen3 Embedding 0.6B",
"type": "embedding",
"context_length": 8192,
"description": "Qwen3 0.6B parameter model for text embeddings"
}
]
Health Check
Check service health and model status.
Endpoint: GET /health
Response
{
"status": "healthy",
"model_loaded": true,
"embedding_model_loaded": true,
"rabbitmq_connected": true,
"timestamp": "2025-08-30T09:19:31.245295"
}
Chat History
Get chat history for the authenticated user.
Endpoint: GET /chat/history
Query Parameters
- limit (number, optional): Number of records to return. Default: 50
- offset (number, optional): Number of records to skip. Default: 0
- model (string, optional): Filter by model
- session_id (string, optional): Filter by session ID
Response
[
{
"id": 123,
"session_id": "session-123",
"model": "qwen3-0.6b",
"messages": [
{"role": "user", "content": "What is the capital of France?"},
{"role": "assistant", "content": "The capital of France is Paris."}
],
"usage": {
"prompt_tokens": 9,
"completion_tokens": 9,
"total_tokens": 18
},
"created_at": "2025-08-30T09:19:31.245295"
}
]
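A sketch of paging through the full history with the limit and offset parameters (reusing the session from the Authentication example):

```python
def iter_history(page_size: int = 50):
    # Yield history records page by page until an empty page is returned.
    offset = 0
    while True:
        resp = session.get(f"{BASE_URL}/chat/history",
                           params={"limit": page_size, "offset": offset})
        resp.raise_for_status()
        batch = resp.json()
        if not batch:
            break
        yield from batch
        offset += page_size

for record in iter_history():
    print(record["id"], record["model"], record["created_at"])
```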
Find Similar Conversations
Find conversations similar to a given query.
Endpoint: POST /chat/similar
Request Body
{
"query": "What is the capital of France?",
"limit": 5,
"threshold": 0.7
}
Parameters
- query (string, required): Query to find similar conversations for
- limit (number, optional): Number of similar conversations to return. Default: 5
- threshold (number, optional): Similarity threshold (0-1). Default: 0.7
Response
[
{
"id": 123,
"session_id": "session-123",
"similarity": 0.85,
"messages": [
{"role": "user", "content": "What is the capital of France?"},
{"role": "assistant", "content": "The capital of France is Paris."}
],
"created_at": "2025-08-30T09:19:31.245295"
}
]
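In Python, a similarity search might look like this sketch (reusing the session from the Authentication example):

```python
resp = session.post(f"{BASE_URL}/chat/similar",
                    json={"query": "What is the capital of France?",
                          "limit": 5,
                          "threshold": 0.7})
resp.raise_for_status()

# Each match carries a similarity score and the original messages.
for match in resp.json():
    print(f"{match['similarity']:.2f}  session={match['session_id']}")
```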
Usage Statistics
Get usage statistics for the authenticated user.
Endpoint: GET /usage
Response
{
"total_requests": 2500,
"requests_today": 120,
"requests_this_month": 1800,
"models_used": {
"qwen3-0.6b": 2000,
"qwen3-embedding-0.6b": 500
},
"total_tokens": 45000,
"average_tokens_per_request": 18
}
Error Responses
{
"error": "Invalid model specified",
"status_code": 400,
"timestamp": "2025-08-30T09:19:31.245295"
}
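Errors are returned as a JSON body with an error message, status code, and timestamp, as shown above. A sketch of handling them in Python (reusing the session from the Authentication example):

```python
resp = session.post(f"{BASE_URL}/chat",
                    json={"messages": [{"role": "user", "content": "Hi"}],
                          "model": "no-such-model"})
if not resp.ok:
    # Error bodies include "error", "status_code", and "timestamp" per the schema above.
    detail = resp.json().get("error", resp.text)
    raise RuntimeError(f"Qwen3 API error {resp.status_code}: {detail}")
```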
Rate Limiting
- Limit: 50 requests per minute
- Headers: X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset
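A sketch of respecting the rate-limit headers (this assumes the service answers an exhausted quota with HTTP 429 and that X-RateLimit-Reset is a Unix timestamp; adjust if it returns seconds-until-reset instead):

```python
import time

def post_with_backoff(path: str, payload: dict):
    # Retry after the rate-limit window resets when the quota is exhausted.
    while True:
        resp = session.post(f"{BASE_URL}{path}", json=payload)
        if resp.status_code != 429:
            return resp
        reset = int(resp.headers.get("X-RateLimit-Reset", "1"))
        time.sleep(max(reset - time.time(), 1))  # assumption: header is a Unix timestamp
```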
Available Models
Chat Models
- qwen3-0.6b: Qwen3 0.6B parameter model for chat completion
- qwen3-1.5b: Qwen3 1.5B parameter model for chat completion (if available)
Embedding Models
- qwen3-embedding-0.6b: Qwen3 0.6B parameter model for text embeddings