Speech-to-Text API
Audio transcription using Whisper and T-One models
The Speech-to-Text API provides three endpoints for transcribing audio files into text: two backed by OpenAI's Whisper (base and medium models) and one backed by T-One, a model specialized for Russian.
Base URL
https://api.koveh.com/whisper/
Authentication
Access is currently protected at the Nginx layer (where configured) or restricted to local environments. These endpoints do not require an API key at this time.
Endpoints
1. Whisper Lite (Fast)
POST /whisper-lite
Transcribes audio using the Whisper base model. Fast but less accurate for complex audio.
Form Data:
file (audio file): The audio clip to transcribe.
2. Whisper Medium (Accurate)
POST /whisper
Transcribes audio using the Whisper medium model. More accurate but slower.
Form Data:
file (audio file): The audio clip to transcribe.
3. T-One (Russian Specialized)
POST /t-one
Transcribes Russian-language audio using the T-One model, which is optimized for Russian.
Form Data:
file (audio file): The audio clip to transcribe.
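Example request (Python)
All three endpoints take the same single form field, so one helper can serve them all. The following is a minimal sketch using only the Python standard library, not an official client. It assumes that each endpoint path is appended to the base URL (for example https://api.koveh.com/whisper/t-one), which this page does not state explicitly. The helper names endpoint_url, build_multipart, and transcribe, and the 300-second timeout, are illustrative choices.

```python
import json
import mimetypes
import os
import urllib.request
import uuid

BASE_URL = "https://api.koveh.com/whisper/"  # base URL given on this page


def endpoint_url(endpoint: str, base_url: str = BASE_URL) -> str:
    """Join base URL and endpoint path (assumption: paths are relative to the base)."""
    return base_url.rstrip("/") + "/" + endpoint.lstrip("/")


def build_multipart(filename: str, data: bytes, field: str = "file"):
    """Encode one file as multipart/form-data. Returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def transcribe(path: str, endpoint: str = "/whisper-lite", timeout: float = 300.0) -> dict:
    """POST the audio file at `path` to `endpoint` and return the decoded JSON body."""
    with open(path, "rb") as fh:
        body, content_type = build_multipart(os.path.basename(path), fh.read())
    request = urllib.request.Request(
        endpoint_url(endpoint),
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With this sketch, transcribe("meeting.wav", "/whisper") would target the medium model and transcribe("interview.ogg", "/t-one") the Russian-specialized one. The file names here are placeholders.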
Response Format
All transcription endpoints return a JSON object with the following fields:
{
  "text": "The transcribed text here.",
  "language": "ru",
  "model": "medium",
  "duration": 15.5
}
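Example response handling (Python)
A typed wrapper can make the response easier to consume downstream. The sketch below parses the sample payload shown above into a dataclass. The Transcription class and parse_transcription function are hypothetical names, and treating all four fields as always present is an assumption based solely on the sample. This page does not state the unit of duration, so the code keeps it as a plain number.

```python
import json
from dataclasses import dataclass


@dataclass
class Transcription:
    text: str        # transcribed text
    language: str    # language code as returned, e.g. "ru"
    model: str       # model name as returned, e.g. "medium"
    duration: float  # unit not documented on this page


def parse_transcription(payload: str) -> Transcription:
    """Parse a JSON response body; raises KeyError if a field is missing."""
    data = json.loads(payload)
    return Transcription(
        text=data["text"],
        language=data["language"],
        model=data["model"],
        duration=float(data["duration"]),
    )


# Sample payload copied from the response format above.
sample = (
    '{"text": "The transcribed text here.", "language": "ru", '
    '"model": "medium", "duration": 15.5}'
)
result = parse_transcription(sample)
print(result.language, result.model, result.duration)
```

Failing loudly on a missing field is a deliberate choice for this sketch. If some endpoints omit a field in practice, data.get with a default would be the more forgiving alternative.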