Qwen3 AI API

Text generation and embeddings using Qwen3 models for chat completion and semantic search.

Base URL: https://api.koveh.com/qwen3/

Endpoints

Method  Endpoint        Description
GET     /health         Service health check
POST    /chat           Chat completion with Qwen3
POST    /embeddings     Generate text embeddings
GET     /models         Get available AI models
GET     /chat/history   Get chat history
GET     /chat/sessions  Get chat sessions
GET     /chat/stats     Get chat statistics
POST    /chat/similar   Find similar conversations
GET     /usage          Get usage statistics

Authentication

All endpoints require Bearer token authentication:

curl -H "Authorization: Bearer YOUR_API_KEY" \
  "https://api.koveh.com/qwen3/chat"

Chat Completion

Generate text completions using Qwen3 models.

Endpoint: POST /chat

Request Body

{
  "messages": [
    {"role": "user", "content": "What is the capital of France?"}
  ],
  "model": "qwen3-0.6b",
  "max_tokens": 100,
  "temperature": 0.7,
  "top_p": 0.9,
  "stream": false
}

Parameters

  • messages (array, required): Array of message objects with role and content
  • model (string, optional): Model to use. Default: "qwen3-0.6b"
  • max_tokens (number, optional): Maximum tokens to generate. Default: 100
  • temperature (number, optional): Sampling temperature (0-2). Default: 0.7
  • top_p (number, optional): Nucleus sampling parameter (0-1). Default: 0.9
  • stream (boolean, optional): Whether to stream the response. Default: false

Response

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "qwen3-0.6b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 9,
    "total_tokens": 18
  }
}

Example Request

curl -X POST "https://api.koveh.com/qwen3/chat" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "model": "qwen3-0.6b",
    "max_tokens": 100
  }'
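
Multi-turn conversations can presumably be carried by including earlier turns in the messages array, using the same role/content format shown in the response above (a minimal sketch; multi-turn behavior is assumed rather than documented):

curl -X POST "https://api.koveh.com/qwen3/chat" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "What is the capital of France?"},
      {"role": "assistant", "content": "The capital of France is Paris."},
      {"role": "user", "content": "Roughly how many people live there?"}
    ],
    "model": "qwen3-0.6b",
    "max_tokens": 100
  }'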

Text Embeddings

Generate embeddings for text using Qwen3 embedding models.

Endpoint: POST /embeddings

Request Body

{
  "text": "Sample text for embedding",
  "model": "qwen3-embedding-0.6b"
}

Parameters

  • text (string, required): Text to generate embeddings for
  • model (string, optional): Embedding model to use. Default: "qwen3-embedding-0.6b"

Response

{
  "object": "embedding",
  "data": [
    {
      "object": "embedding",
      "embedding": [0.1, 0.2, 0.3, ...],
      "index": 0
    }
  ],
  "model": "qwen3-embedding-0.6b",
  "usage": {
    "prompt_tokens": 4,
    "total_tokens": 4
  }
}

Example Request

curl -X POST "https://api.koveh.com/qwen3/embeddings" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Sample text for embedding",
    "model": "qwen3-embedding-0.6b"
  }'
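
To work with the vector itself, the embedding array can be extracted from the documented response shape, for example with jq (this assumes jq is installed locally; the API call is unchanged):

curl -s -X POST "https://api.koveh.com/qwen3/embeddings" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "Sample text for embedding", "model": "qwen3-embedding-0.6b"}' \
  | jq '.data[0].embedding'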

Get Available Models

Get the list of available AI models.

Endpoint: GET /models

Response

[
  {
    "id": "qwen3-0.6b",
    "name": "Qwen3 0.6B",
    "type": "chat",
    "context_length": 8192,
    "description": "Qwen3 0.6B parameter model for chat completion"
  },
  {
    "id": "qwen3-embedding-0.6b",
    "name": "Qwen3 Embedding 0.6B",
    "type": "embedding",
    "context_length": 8192,
    "description": "Qwen3 0.6B parameter model for text embeddings"
  }
]
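
Example Request

A plain GET with the standard Bearer header (the https scheme is assumed, consistent with the other examples):

curl -H "Authorization: Bearer YOUR_API_KEY" \
  "https://api.koveh.com/qwen3/models"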

Health Check

Check service health and model status.

Endpoint: GET /health

Response

{
  "status": "healthy",
  "model_loaded": true,
  "embedding_model_loaded": true,
  "rabbitmq_connected": true,
  "timestamp": "2025-08-30T09:19:31.245295"
}
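
Example Request

The Bearer header is included here because all endpoints require authentication:

curl -H "Authorization: Bearer YOUR_API_KEY" \
  "https://api.koveh.com/qwen3/health"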

Chat History

Get chat history for the authenticated user.

Endpoint: GET /chat/history

Query Parameters

  • limit (number, optional): Number of records to return. Default: 50
  • offset (number, optional): Number of records to skip. Default: 0
  • model (string, optional): Filter by model
  • session_id (string, optional): Filter by session ID

Response

[
  {
    "id": 123,
    "session_id": "session-123",
    "model": "qwen3-0.6b",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"},
      {"role": "assistant", "content": "The capital of France is Paris."}
    ],
    "usage": {
      "prompt_tokens": 9,
      "completion_tokens": 9,
      "total_tokens": 18
    },
    "created_at": "2025-08-30T09:19:31.245295"
  }
]
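
Example Request

A request using the documented query parameters to fetch ten records for a single model:

curl -H "Authorization: Bearer YOUR_API_KEY" \
  "https://api.koveh.com/qwen3/chat/history?limit=10&offset=0&model=qwen3-0.6b"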

Find Similar Conversations

Find conversations similar to a given query.

Endpoint: POST /chat/similar

Request Body

{
  "query": "What is the capital of France?",
  "limit": 5,
  "threshold": 0.7
}

Parameters

  • query (string, required): Query to find similar conversations for
  • limit (number, optional): Number of similar conversations to return. Default: 5
  • threshold (number, optional): Similarity threshold (0-1). Default: 0.7

Response

[
  {
    "id": 123,
    "session_id": "session-123",
    "similarity": 0.85,
    "messages": [
      {"role": "user", "content": "What is the capital of France?"},
      {"role": "assistant", "content": "The capital of France is Paris."}
    ],
    "created_at": "2025-08-30T09:19:31.245295"
  }
]
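
Example Request

A request using the documented parameters at their default values:

curl -X POST "https://api.koveh.com/qwen3/chat/similar" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is the capital of France?",
    "limit": 5,
    "threshold": 0.7
  }'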

Usage Statistics

Get usage statistics for the authenticated user.

Endpoint: GET /usage

Response

{
  "total_requests": 2500,
  "requests_today": 120,
  "requests_this_month": 1800,
  "models_used": {
    "qwen3-0.6b": 2000,
    "qwen3-embedding-0.6b": 500
  },
  "total_tokens": 45000,
  "average_tokens_per_request": 18
}
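
Example Request

A plain GET against the usage endpoint:

curl -H "Authorization: Bearer YOUR_API_KEY" \
  "https://api.koveh.com/qwen3/usage"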

Error Responses

Errors are returned as JSON with an error message, HTTP status code, and timestamp:

{
  "error": "Invalid model specified",
  "status_code": 400,
  "timestamp": "2025-08-30T09:19:31.245295"
}
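
For example, requesting a model that is not in the /models list should produce the 400 response shown above (the message and status are taken from that example; the exact triggering behavior is assumed):

curl -X POST "https://api.koveh.com/qwen3/chat" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Hello"}],
    "model": "nonexistent-model"
  }'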

Rate Limiting

  • Limit: 50 requests per minute
  • Headers: X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset
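
The remaining quota can be checked by printing the response headers, e.g. with curl -i (any endpoint works; /models is used here for illustration):

curl -i -H "Authorization: Bearer YOUR_API_KEY" \
  "https://api.koveh.com/qwen3/models"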

Available Models

Chat Models

  • qwen3-0.6b: Qwen3 0.6B parameter model for chat completion
  • qwen3-1.5b: Qwen3 1.5B parameter model for chat completion (if available)

Embedding Models

  • qwen3-embedding-0.6b: Qwen3 0.6B parameter model for text embeddings