FastVLM Vision API
Apple's FastVLM-0.5B Vision Language Model API
The FastVLM API allows you to process images and generate text descriptions or answer questions about them using Apple's efficient FastVLM-0.5B model.
Base URL
https://api.koveh.com/fastvlm/
Authentication
All requests require a Bearer token in the Authorization header.
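As an illustration, the header can be built with a small helper; the function name is ours, not part of the API:

```javascript
// Build the Authorization header expected by the FastVLM API.
// `apiKey` is your Bearer token; the helper name is illustrative.
function authHeaders(apiKey) {
  return { Authorization: `Bearer ${apiKey}` };
}

// Usage with fetch (not executed here):
// fetch('https://api.koveh.com/fastvlm/models', { headers: authHeaders('YOUR_API_KEY') });
```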
Authorization: Bearer <YOUR_API_KEY>

Endpoints
1. Process Vision Request
POST /vision
Processes an image with a text prompt. This endpoint accepts multipart/form-data.
Form Fields:
- image (file): The image file to process.
- prompt (string): Your question or instruction about the image (e.g., "Describe this image").
- max_new_tokens (int, default: 128): Maximum tokens to generate.
- temperature (float, default: 0.7): Sampling temperature.
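The documented defaults can be captured in a small helper that assembles the text form fields; the helper and its option names are illustrative, not part of the API:

```javascript
// Assemble the /vision text form fields, applying the documented defaults
// (max_new_tokens: 128, temperature: 0.7) when options are omitted.
// Values are stringified because multipart/form-data fields are strings.
function visionFields(prompt, { maxNewTokens = 128, temperature = 0.7 } = {}) {
  return {
    prompt,
    max_new_tokens: String(maxNewTokens),
    temperature: String(temperature),
  };
}
```

For example, `visionFields('Describe this image')` yields `{ prompt: 'Describe this image', max_new_tokens: '128', temperature: '0.7' }`; the image file itself is appended separately, as in the fetch example below.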
Example Fetch:
const formData = new FormData();
formData.append('image', imageFile);
formData.append('prompt', "What color is the car in this photo?");
const response = await fetch('https://api.koveh.com/fastvlm/vision', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY'
  },
  body: formData
});
const data = await response.json();

Response:
{
  "content": "The car in the photo is blue.",
  "model": "FastVLM-0.5B",
  "request_id": "vision_123_456",
  "response_time_ms": 1200,
  "tokens_used": 15
}

2. List Models
GET /models
Returns the list of available vision models.
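Since the /models response schema is not documented here, a sketch can only build and send the request. URL construction against the base URL is shown as a small helper (the name is ours):

```javascript
// Resolve an endpoint path against the FastVLM base URL.
// A leading slash is stripped so '/vision' and 'vision' behave the same.
function fastvlmUrl(path) {
  return new URL(path.replace(/^\//, ''), 'https://api.koveh.com/fastvlm/').toString();
}

// Usage (not executed here; response shape is not specified in this document):
// const res = await fetch(fastvlmUrl('models'), {
//   headers: { Authorization: 'Bearer YOUR_API_KEY' },
// });
// const models = await res.json();
```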
Service Health
GET /health
Checks if the model is loaded and backend services (RabbitMQ) are connected.
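A minimal readiness probe can simply inspect the HTTP status, since the health payload format is not specified here; the helper name is illustrative:

```javascript
// Interpret a /health HTTP status: 2xx is treated as healthy
// (model loaded, backend services reachable), anything else as unhealthy.
function isHealthy(status) {
  return status >= 200 && status < 300;
}

// Usage (not executed here):
// const res = await fetch('https://api.koveh.com/fastvlm/health');
// if (!isHealthy(res.status)) console.warn('FastVLM backend unavailable');
```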