Koveh API

FastVLM Vision API

Apple's FastVLM-0.5B Vision Language Model API

The FastVLM API allows you to process images and generate text descriptions or answer questions about them using Apple's efficient FastVLM-0.5B model.

Base URL

https://api.koveh.com/fastvlm/

Authentication

Every request must include a bearer token in the Authorization header:

Authorization: Bearer <YOUR_API_KEY>
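
The header above can be built once and reused for every call. The following is a minimal sketch, not part of the API itself: the helper name buildAuthHeaders is hypothetical, and it only assumes the header format shown above.

```javascript
// Hypothetical helper (not provided by the API): builds the
// Authorization header in the "Bearer <YOUR_API_KEY>" format.
function buildAuthHeaders(apiKey) {
  if (typeof apiKey !== 'string' || apiKey.trim() === '') {
    throw new Error('An API key is required');
  }
  return { Authorization: `Bearer ${apiKey.trim()}` };
}
```

The returned object can be passed directly as the headers option of fetch.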

Endpoints

1. Process Vision Request

POST /vision

Processes an image with a text prompt. This endpoint accepts multipart/form-data.

Form Fields:

  • image (file): The image file to process.
  • prompt (string): Your question or instruction about the image (e.g., "Describe this image").
  • max_new_tokens (int, default: 128): Maximum number of tokens to generate.
  • temperature (float, default: 0.7): Sampling temperature. Higher values produce more varied output.
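
As an illustration of how the four fields fit together, here is a sketch that assembles the form and leaves out the two tuning fields when they are not supplied, so the documented defaults should apply. The function name buildVisionForm and the option names maxNewTokens and temperature are hypothetical, and the sketch assumes the server falls back to its defaults for absent fields.

```javascript
// Hypothetical helper: assembles the multipart form for POST /vision.
// `image` is a File or Blob; `options` may carry the two tuning fields.
function buildVisionForm(image, prompt, options = {}) {
  const form = new FormData();
  form.append('image', image);
  form.append('prompt', prompt);
  // Only send tuning fields that were explicitly set, on the assumption
  // that the server then uses its defaults (128 and 0.7).
  if (options.maxNewTokens !== undefined) {
    form.append('max_new_tokens', String(options.maxNewTokens));
  }
  if (options.temperature !== undefined) {
    form.append('temperature', String(options.temperature));
  }
  return form;
}
```

For example, buildVisionForm(file, 'Describe this image', { maxNewTokens: 64 }) sends image, prompt, and max_new_tokens, and omits temperature.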

Example Fetch:

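// imageFile is a File or Blob, e.g. taken from an <input type="file"> element.
const formData = new FormData();
formData.append('image', imageFile);
formData.append('prompt', "What color is the car in this photo?");

// Do not set Content-Type manually; fetch adds the multipart boundary itself.
const response = await fetch('https://api.koveh.com/fastvlm/vision', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY'
  },
  body: formData
});
if (!response.ok) {
  throw new Error(`Vision request failed with status ${response.status}`);
}
const data = await response.json();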

Response:

{
  "content": "The car in the photo is blue.",
  "model": "FastVLM-0.5B",
  "request_id": "vision_123_456",
  "response_time_ms": 1200,
  "tokens_used": 15
}
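
A client may want to check the payload before using it. The sketch below assumes every successful response has the shape shown above. The function name parseVisionResponse and the camelCase property names are hypothetical.

```javascript
// Hypothetical helper: validates the documented response shape and
// maps its fields to camelCase names for use in client code.
function parseVisionResponse(data) {
  if (data === null || typeof data !== 'object') {
    throw new TypeError('Expected a JSON object');
  }
  if (typeof data.content !== 'string') {
    throw new TypeError('Response is missing a "content" string');
  }
  return {
    text: data.content,
    model: data.model,
    requestId: data.request_id,
    responseTimeMs: data.response_time_ms,
    tokensUsed: data.tokens_used,
  };
}
```

Logging requestId alongside any error report may make it easier to trace a specific call.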

2. List Models

GET /models

Returns the list of available vision models.
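
A call to this endpoint might look like the sketch below. The response format is not documented here, so the function returns the parsed JSON unchanged. The name listModels and the injectable fetchImpl parameter are hypothetical, and the sketch assumes the path resolves against the base URL in the same way as /vision.

```javascript
const BASE_URL = 'https://api.koveh.com/fastvlm/';

// Hypothetical helper: requests GET /models and returns the parsed JSON.
// `fetchImpl` defaults to the global fetch and can be swapped for testing.
async function listModels(apiKey, fetchImpl = fetch) {
  const response = await fetchImpl(new URL('models', BASE_URL), {
    method: 'GET',
    headers: { Authorization: `Bearer ${apiKey}` },
  });
  if (!response.ok) {
    throw new Error(`GET /models failed with status ${response.status}`);
  }
  // The payload shape is not specified, so it is returned as-is.
  return response.json();
}
```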


Service Health

GET /health

Checks whether the model is loaded and the backend services (RabbitMQ) are connected.
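
A readiness probe could be sketched as follows. The response body of /health is not documented here, so this example only assumes that a 2xx status means healthy and treats any other status or a network failure as unhealthy. The name isServiceHealthy and the fetchImpl parameter are hypothetical.

```javascript
const BASE_URL = 'https://api.koveh.com/fastvlm/';

// Hypothetical helper: resolves to true when GET /health answers with a
// 2xx status, and to false on any other status or on a network error.
async function isServiceHealthy(apiKey, fetchImpl = fetch) {
  try {
    const response = await fetchImpl(new URL('health', BASE_URL), {
      headers: { Authorization: `Bearer ${apiKey}` },
    });
    return response.ok;
  } catch (err) {
    // The request never completed, so report the service as unhealthy.
    return false;
  }
}
```

Calling this before sending a batch of /vision requests is one way to avoid queuing work while the model is still loading.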
