Speech to Text API
Audio transcription services using various models
Audio transcription services using Whisper, Whisper Lite, and T-One models for converting speech to text.
Base URL: api.koveh.com/speech-to-text/
Endpoints
| Method | Endpoint | Description | 
|---|---|---|
| GET | / | Service info | 
| POST | /whisper-lite | Transcribe with Whisper Lite | 
| POST | /whisper | Transcribe with Whisper | 
| POST | /t-one | Transcribe with T-One | 
| GET | /health | Service health check | 
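As a small illustration of the table above, the following Python sketch maps each model name to its transcription endpoint and builds the full URL. The `http://` scheme is an assumption (the base URL on this page is given without one), and `endpoint_url` and `TRANSCRIPTION_ENDPOINTS` are hypothetical names, not part of the API.

```python
from urllib.parse import urljoin

# Base URL from this page; the "http://" scheme is an assumption.
BASE_URL = "http://api.koveh.com/speech-to-text/"

# POST endpoints from the table above, keyed by model name.
TRANSCRIPTION_ENDPOINTS = {
    "whisper": "whisper",
    "whisper-lite": "whisper-lite",
    "t-one": "t-one",
}


def endpoint_url(model: str) -> str:
    """Return the full POST URL for a model, rejecting unknown names early."""
    if model not in TRANSCRIPTION_ENDPOINTS:
        known = ", ".join(sorted(TRANSCRIPTION_ENDPOINTS))
        raise ValueError(f"Unknown model {model!r}; expected one of: {known}")
    return urljoin(BASE_URL, TRANSCRIPTION_ENDPOINTS[model])
```

Rejecting an unknown model on the client avoids a round trip that would otherwise end in a 404.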
Authentication
All endpoints require Bearer token authentication:
curl -H "Authorization: Bearer YOUR_API_KEY" \
  "api.koveh.com/speech-to-text/whisper"
Whisper Transcription
Transcribe audio using OpenAI's Whisper model.
Endpoint: POST /whisper
Request Body
{
  "audio_file": "base64_encoded_audio_or_file_path",
  "language": "en",
  "model": "whisper-1",
  "response_format": "json",
  "temperature": 0.0
}
Parameters
- audio_file (string, required): Base64-encoded audio data or file path
- language (string, optional): Language code (e.g., "en", "es", "fr"). Default: auto-detect
- model (string, optional): Whisper model to use. Default: "whisper-1"
- response_format (string, optional): Response format. Default: "json"
- temperature (number, optional): Sampling temperature (0-1). Default: 0.0
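One way to assemble the request body from the parameters above is sketched below. `build_whisper_payload` is a hypothetical helper, not part of the API. It assumes that leaving `language` out of the body triggers auto-detection, and it checks the documented 0-1 temperature range on the client before anything is sent.

```python
import base64


def build_whisper_payload(audio_bytes, language=None, model="whisper-1",
                          response_format="json", temperature=0.0):
    """Build the JSON body for POST /whisper from raw audio bytes."""
    if not audio_bytes:
        raise ValueError("audio_bytes must not be empty (audio_file is required)")
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0 and 1")

    payload = {
        "audio_file": base64.b64encode(audio_bytes).decode("ascii"),
        "model": model,
        "response_format": response_format,
        "temperature": temperature,
    }
    # Assumption: omitting "language" lets the service auto-detect it.
    if language is not None:
        payload["language"] = language
    return payload
```

The returned dictionary can be passed directly as the `json=` argument of `requests.post`.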
Response
{
  "text": "Hello, this is a test transcription of audio content.",
  "language": "en",
  "duration": 5.2,
  "segments": [
    {
      "start": 0.0,
      "end": 5.2,
      "text": "Hello, this is a test transcription of audio content."
    }
  ],
  "model": "whisper-1",
  "timestamp": "2025-08-30T09:19:31.245295"
}
Example Request
curl -X POST "api.koveh.com/speech-to-text/whisper" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "audio_file": "base64_encoded_audio_data",
    "language": "en"
  }'
Whisper Lite Transcription
Transcribe audio using the Whisper Lite model, a faster, lighter version of Whisper.
Endpoint: POST /whisper-lite
Request Body
{
  "audio_file": "base64_encoded_audio_or_file_path",
  "language": "en",
  "model": "whisper-lite",
  "response_format": "json"
}
Parameters
- audio_file (string, required): Base64-encoded audio data or file path
- language (string, optional): Language code. Default: auto-detect
- model (string, optional): Model to use. Default: "whisper-lite"
- response_format (string, optional): Response format. Default: "json"
Response
{
  "text": "Hello, this is a test transcription using Whisper Lite.",
  "language": "en",
  "duration": 5.2,
  "model": "whisper-lite",
  "timestamp": "2025-08-30T09:19:31.245295"
}
Example Request
curl -X POST "api.koveh.com/speech-to-text/whisper-lite" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "audio_file": "base64_encoded_audio_data",
    "language": "en"
  }'
T-One Transcription
Transcribe audio using the T-One model (specialized for certain languages/accents).
Endpoint: POST /t-one
Request Body
{
  "audio_file": "base64_encoded_audio_or_file_path",
  "language": "en",
  "model": "t-one",
  "response_format": "json"
}
Parameters
- audio_file (string, required): Base64-encoded audio data or file path
- language (string, optional): Language code. Default: auto-detect
- model (string, optional): Model to use. Default: "t-one"
- response_format (string, optional): Response format. Default: "json"
Response
{
  "text": "Hello, this is a test transcription using T-One model.",
  "language": "en",
  "duration": 5.2,
  "model": "t-one",
  "timestamp": "2025-08-30T09:19:31.245295"
}
Example Request
curl -X POST "api.koveh.com/speech-to-text/t-one" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "audio_file": "base64_encoded_audio_data",
    "language": "en"
  }'
Service Information
Get information about the speech-to-text service.
Endpoint: GET /
Response
{
  "service": "Speech to Text API",
  "version": "1.0.0",
  "models": [
    {
      "name": "whisper",
      "description": "OpenAI Whisper model for high-quality transcription",
      "supported_languages": ["en", "es", "fr", "de", "it", "pt", "ru", "ja", "ko", "zh"],
      "max_audio_duration": 300
    },
    {
      "name": "whisper-lite",
      "description": "Lightweight Whisper model for faster transcription",
      "supported_languages": ["en", "es", "fr", "de"],
      "max_audio_duration": 300
    },
    {
      "name": "t-one",
      "description": "T-One model for specialized transcription",
      "supported_languages": ["en", "es", "fr"],
      "max_audio_duration": 300
    }
  ],
  "supported_formats": ["mp3", "wav", "m4a", "flac", "ogg"],
  "max_file_size": "25MB"
}
Example Request
curl -X GET "api.koveh.com/speech-to-text/" \
  -H "Authorization: Bearer YOUR_API_KEY"
Health Check
Check service health status.
Endpoint: GET /health
Response
{
  "status": "healthy",
  "timestamp": "2025-08-30T09:19:31.245295",
  "models_loaded": {
    "whisper": true,
    "whisper-lite": true,
    "t-one": true
  },
  "gpu_available": true
}
Example Request
curl -X GET "api.koveh.com/speech-to-text/health"
Integration Examples
Python Example
import base64

import requests


def transcribe_audio(audio_file_path, model="whisper", language="en"):
    """Send a local audio file to the given model endpoint and return the parsed JSON."""
    # Read the audio file and Base64-encode it for the JSON body
    with open(audio_file_path, "rb") as audio_file:
        audio_data = base64.b64encode(audio_file.read()).decode("utf-8")

    response = requests.post(
        f"http://api.koveh.com/speech-to-text/{model}",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        json={
            "audio_file": audio_data,
            "language": language,
        },
        timeout=60,  # the API allows up to 60 seconds per request
    )
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
    return response.json()


# Transcribe an audio file
result = transcribe_audio("audio.mp3", "whisper", "en")
print(f"Transcription: {result['text']}")
JavaScript Example
async function transcribeAudio(audioFile, model = 'whisper', language = 'en') {
    // Convert the file to Base64, dropping the "data:...;base64," prefix
    const base64 = await new Promise((resolve, reject) => {
        const reader = new FileReader();
        reader.onload = () => resolve(reader.result.split(',')[1]);
        reader.onerror = () => reject(reader.error);
        reader.readAsDataURL(audioFile);
    });

    const response = await fetch(`http://api.koveh.com/speech-to-text/${model}`, {
        method: 'POST',
        headers: {
            'Authorization': 'Bearer YOUR_API_KEY',
            'Content-Type': 'application/json'
        },
        body: JSON.stringify({
            audio_file: base64,
            language: language
        })
    });
    // fetch does not reject on HTTP errors, so check the status explicitly
    if (!response.ok) {
        throw new Error(`Transcription failed with status ${response.status}`);
    }
    return await response.json();
}

// Transcribe the file selected in the "audioFile" input element
const audioFile = document.getElementById('audioFile').files[0];
transcribeAudio(audioFile, 'whisper', 'en')
    .then(result => console.log(`Transcription: ${result.text}`))
    .catch(error => console.error(error));
Real-time Transcription with WebSocket
import asyncio
import base64
import json

import websockets


def get_audio_chunk():
    """Return the next chunk of raw audio bytes (implement your audio capture logic)."""
    raise NotImplementedError


async def real_time_transcription():
    uri = "ws://api.koveh.com/speech-to-text/ws"
    async with websockets.connect(uri) as websocket:
        while True:
            # Send the next audio chunk, Base64-encoded
            audio_chunk = get_audio_chunk()
            audio_base64 = base64.b64encode(audio_chunk).decode("utf-8")

            await websocket.send(json.dumps({
                "audio": audio_base64,
                "model": "whisper-lite"
            }))

            # Receive the transcription for that chunk
            response = await websocket.recv()
            result = json.loads(response)
            print(f"Real-time: {result['text']}")


# Run real-time transcription
asyncio.run(real_time_transcription())
Error Handling
The API returns standard error responses:
{
  "error": "Invalid audio file format",
  "status_code": 400,
  "timestamp": "2025-08-30T09:19:31.245295"
}
Common error codes:
- 400: Bad Request (invalid audio format, file too large)
- 401: Unauthorized (missing or invalid API key)
- 404: Not Found (invalid endpoint)
- 413: Payload Too Large (audio file exceeds size limit)
- 500: Internal Server Error (transcription model error)
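The error format and status codes above could be turned into exceptions on the client roughly as follows. This is a minimal sketch: `TranscriptionAPIError`, `raise_for_api_error`, and `is_retryable` are hypothetical names, and treating only 500 as worth retrying is an assumption, not documented API behavior.

```python
class TranscriptionAPIError(Exception):
    """Raised when the API returns an error body like the one shown above."""

    def __init__(self, status_code, message):
        super().__init__(f"{status_code}: {message}")
        self.status_code = status_code
        self.message = message


# Assumption: a 500 (transcription model error) may be transient,
# while 400, 401, 404 and 413 will fail again unless the request changes.
RETRYABLE_STATUS_CODES = {500}


def raise_for_api_error(status_code, body):
    """Return the body unchanged on success, otherwise raise TranscriptionAPIError."""
    if status_code < 400 and "error" not in body:
        return body
    raise TranscriptionAPIError(
        body.get("status_code", status_code),
        body.get("error", "Unknown error"),
    )


def is_retryable(error):
    """Tell the caller whether repeating the same request might succeed."""
    return error.status_code in RETRYABLE_STATUS_CODES
```

With `requests`, this would be called as `raise_for_api_error(response.status_code, response.json())`.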
Rate Limiting
- Rate Limit: 100 requests per minute
- Concurrent Requests: 5 simultaneous requests
- Timeout: 60 seconds per request (for longer audio files)
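A client can stay inside these limits by throttling itself. The sketch below is one possible approach, not something the API provides: `RequestThrottle` is a hypothetical class that allows at most 100 calls in any 60-second window and at most 5 calls in flight. How the server actually measures its window is not documented here, so the sliding window is an assumption.

```python
import threading
import time
from collections import deque


class RequestThrottle:
    """Client-side guard for the per-minute and concurrency limits."""

    def __init__(self, max_per_minute=100, max_concurrent=5,
                 clock=time.monotonic, sleep=time.sleep):
        self._max = max_per_minute
        self._window = 60.0
        self._sent = deque()  # start times of recent requests
        self._lock = threading.Lock()
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self._clock = clock
        self._sleep = sleep

    def _wait_for_window(self):
        while True:
            with self._lock:
                now = self._clock()
                # Forget requests that have left the 60-second window.
                while self._sent and now - self._sent[0] >= self._window:
                    self._sent.popleft()
                if len(self._sent) < self._max:
                    self._sent.append(now)
                    return
                delay = self._window - (now - self._sent[0])
            self._sleep(delay)  # sleep outside the lock

    def __enter__(self):
        self._slots.acquire()  # blocks while max_concurrent calls are in flight
        try:
            self._wait_for_window()
        except BaseException:
            self._slots.release()
            raise
        return self

    def __exit__(self, *exc_info):
        self._slots.release()
        return False
```

Each request is then wrapped as `with throttle: requests.post(...)`, sharing one `RequestThrottle` instance across threads.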
Best Practices
- Audio Quality: Use high-quality audio (sample rate of 16 kHz or higher, clear speech)
- File Size: Keep audio files under 25MB for optimal performance
- Language Detection: Let the model auto-detect language when possible
- Model Selection: Use Whisper Lite for faster processing, Whisper for higher accuracy
- Error Handling: Always check for error responses and handle timeouts
- Caching: Cache transcriptions for repeated audio content
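The caching advice can be sketched with a small in-memory cache keyed by a hash of the audio content. `TranscriptionCache` is a hypothetical helper, and the `transcribe` callable it wraps stands in for whatever function actually calls the API. Including the model name in the key is a design choice, since different models may return different text for the same audio.

```python
import hashlib


class TranscriptionCache:
    """Cache transcription results by SHA-256 of the audio bytes plus model name."""

    def __init__(self, transcribe):
        # transcribe(audio_bytes, model) -> dict; supplied by the caller
        self._transcribe = transcribe
        self._store = {}

    def get(self, audio_bytes, model="whisper"):
        key = (hashlib.sha256(audio_bytes).hexdigest(), model)
        if key not in self._store:
            # Only the first request for this audio/model pair reaches the API.
            self._store[key] = self._transcribe(audio_bytes, model)
        return self._store[key]
```

For long-running services the dictionary could be swapped for a persistent or size-bounded store. This version grows without limit.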
Supported Audio Formats
- MP3: Most common, good compression
- WAV: Uncompressed, high quality
- M4A: Apple format, good compression
- FLAC: Lossless compression
- OGG: Open container format
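Checking a file against this list and the 25MB limit before uploading saves a failed request. The sketch below uses hypothetical names (`check_audio_file`, `SUPPORTED_EXTENSIONS`) and makes two assumptions: that "25MB" means 25 × 1024 × 1024 bytes, and that the limit applies to the raw file, not to its Base64 encoding, which is about a third larger.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg"}
MAX_FILE_SIZE_BYTES = 25 * 1024 * 1024  # assumption: 25MB read as 25 MiB


def check_audio_file(path, size_bytes):
    """Raise ValueError if the file would likely be rejected by the API."""
    extension = Path(path).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported audio format: {extension or '(none)'}")
    if size_bytes > MAX_FILE_SIZE_BYTES:
        raise ValueError(
            f"File is {size_bytes} bytes; the limit is {MAX_FILE_SIZE_BYTES} bytes"
        )
```

For a file on disk, the size can be taken from `os.path.getsize(path)`. The check looks only at the file extension and does not inspect the audio data itself.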
Use Cases
- Meeting Transcription: Convert meeting recordings to text
- Podcast Processing: Transcribe podcast episodes
- Voice Notes: Convert voice memos to text
- Accessibility: Provide captions for video content
- Language Learning: Transcribe speech for language practice
- Content Creation: Generate text from audio interviews
- Customer Service: Transcribe customer calls for analysis