Speech to Text API
Audio transcription services using Whisper, Whisper Lite, and T-One models for converting speech to text.
Base URL: https://api.koveh.com/speech-to-text/
Endpoints
| Method | Endpoint | Description |
|---|---|---|
| GET | / | Service info |
| POST | /whisper-lite | Transcribe with Whisper Lite |
| POST | /whisper | Transcribe with Whisper |
| POST | /t-one | Transcribe with T-One |
| GET | /health | Service health check |
Authentication
All endpoints require Bearer token authentication:
```bash
curl -H "Authorization: Bearer YOUR_API_KEY" \
  "https://api.koveh.com/speech-to-text/whisper"
```
Whisper Transcription
Transcribe audio using OpenAI's Whisper model.
Endpoint: POST /whisper
Request Body
```json
{
  "audio_file": "base64_encoded_audio_or_file_path",
  "language": "en",
  "model": "whisper-1",
  "response_format": "json",
  "temperature": 0.0
}
```
Parameters
- audio_file (string, required): Base64-encoded audio data or file path
- language (string, optional): Language code (e.g., "en", "es", "fr"). Default: auto-detect
- model (string, optional): Whisper model to use. Default: "whisper-1"
- response_format (string, optional): Response format. Default: "json"
- temperature (number, optional): Sampling temperature (0-1). Default: 0.0
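The request body above can be assembled from raw audio bytes with a small helper. This is a sketch; the helper name and the convention of omitting "language" to trigger auto-detection are assumptions on top of the documented fields:

```python
import base64

def build_whisper_payload(audio_bytes, language=None, model="whisper-1",
                          response_format="json", temperature=0.0):
    """Assemble the JSON body for POST /whisper from raw audio bytes.

    Leaves out "language" when None so the service can auto-detect it.
    """
    payload = {
        "audio_file": base64.b64encode(audio_bytes).decode("ascii"),
        "model": model,
        "response_format": response_format,
        "temperature": temperature,
    }
    if language is not None:
        payload["language"] = language
    return payload
```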
Response
```json
{
  "text": "Hello, this is a test transcription of audio content.",
  "language": "en",
  "duration": 5.2,
  "segments": [
    {
      "start": 0.0,
      "end": 5.2,
      "text": "Hello, this is a test transcription of audio content."
    }
  ],
  "model": "whisper-1",
  "timestamp": "2025-08-30T09:19:31.245295"
}
```
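The `segments` array, with its start/end times in seconds, is enough to build caption files. A minimal sketch converting segments to SRT subtitles (the function names are illustrative, not part of the API):

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments):
    """Render a list of {start, end, text} segments as an SRT document."""
    lines = []
    for i, seg in enumerate(segments, start=1):
        lines.append(str(i))
        lines.append(f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}")
        lines.append(seg["text"].strip())
        lines.append("")  # blank line separates cues
    return "\n".join(lines)
```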
Example Request
```bash
curl -X POST "https://api.koveh.com/speech-to-text/whisper" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "audio_file": "base64_encoded_audio_data",
    "language": "en"
  }'
```
Whisper Lite Transcription
Transcribe audio using the Whisper Lite model, a faster, lighter variant of Whisper.
Endpoint: POST /whisper-lite
Request Body
```json
{
  "audio_file": "base64_encoded_audio_or_file_path",
  "language": "en",
  "model": "whisper-lite",
  "response_format": "json"
}
```
Parameters
- audio_file (string, required): Base64-encoded audio data or file path
- language (string, optional): Language code. Default: auto-detect
- model (string, optional): Model to use. Default: "whisper-lite"
- response_format (string, optional): Response format. Default: "json"
Response
```json
{
  "text": "Hello, this is a test transcription using Whisper Lite.",
  "language": "en",
  "duration": 5.2,
  "model": "whisper-lite",
  "timestamp": "2025-08-30T09:19:31.245295"
}
```
Example Request
```bash
curl -X POST "https://api.koveh.com/speech-to-text/whisper-lite" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "audio_file": "base64_encoded_audio_data",
    "language": "en"
  }'
```
T-One Transcription
Transcribe audio using T-One model (specialized for certain languages/accents).
Endpoint: POST /t-one
Request Body
```json
{
  "audio_file": "base64_encoded_audio_or_file_path",
  "language": "en",
  "model": "t-one",
  "response_format": "json"
}
```
Parameters
- audio_file (string, required): Base64-encoded audio data or file path
- language (string, optional): Language code. Default: auto-detect
- model (string, optional): Model to use. Default: "t-one"
- response_format (string, optional): Response format. Default: "json"
Response
```json
{
  "text": "Hello, this is a test transcription using T-One model.",
  "language": "en",
  "duration": 5.2,
  "model": "t-one",
  "timestamp": "2025-08-30T09:19:31.245295"
}
```
Example Request
```bash
curl -X POST "https://api.koveh.com/speech-to-text/t-one" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "audio_file": "base64_encoded_audio_data",
    "language": "en"
  }'
```
Service Information
Get information about the speech-to-text service.
Endpoint: GET /
Response
```json
{
  "service": "Speech to Text API",
  "version": "1.0.0",
  "models": [
    {
      "name": "whisper",
      "description": "OpenAI Whisper model for high-quality transcription",
      "supported_languages": ["en", "es", "fr", "de", "it", "pt", "ru", "ja", "ko", "zh"],
      "max_audio_duration": 300
    },
    {
      "name": "whisper-lite",
      "description": "Lightweight Whisper model for faster transcription",
      "supported_languages": ["en", "es", "fr", "de"],
      "max_audio_duration": 300
    },
    {
      "name": "t-one",
      "description": "T-One model for specialized transcription",
      "supported_languages": ["en", "es", "fr"],
      "max_audio_duration": 300
    }
  ],
  "supported_formats": ["mp3", "wav", "m4a", "flac", "ogg"],
  "max_file_size": "25MB"
}
```
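Since each entry in `models` lists its `supported_languages`, the service-info response can drive model selection at runtime. A sketch (the helper name is illustrative; it assumes the response shape shown above):

```python
def pick_model(service_info, language):
    """Return the name of the first listed model that supports `language`, or None."""
    for model in service_info["models"]:
        if language in model["supported_languages"]:
            return model["name"]
    return None
```

For example, with the response above, asking for "ru" would select "whisper", since neither lighter model lists Russian.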
Example Request
```bash
curl -X GET "https://api.koveh.com/speech-to-text/" \
  -H "Authorization: Bearer YOUR_API_KEY"
```
Health Check
Check service health status.
Endpoint: GET /health
Response
```json
{
  "status": "healthy",
  "timestamp": "2025-08-30T09:19:31.245295",
  "models_loaded": {
    "whisper": true,
    "whisper-lite": true,
    "t-one": true
  },
  "gpu_available": true
}
```
Example Request
```bash
curl -X GET "https://api.koveh.com/speech-to-text/health"
```
Integration Examples
Python Example
```python
import requests
import base64

def transcribe_audio(audio_file_path, model="whisper", language="en"):
    # Read and encode audio file
    with open(audio_file_path, "rb") as audio_file:
        audio_data = base64.b64encode(audio_file.read()).decode("utf-8")

    response = requests.post(
        f"https://api.koveh.com/speech-to-text/{model}",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        json={
            "audio_file": audio_data,
            "language": language
        },
        timeout=60  # matches the API's 60-second request timeout
    )
    response.raise_for_status()
    return response.json()

# Transcribe audio file
result = transcribe_audio("audio.mp3", "whisper", "en")
print(f"Transcription: {result['text']}")
```
JavaScript Example
```javascript
async function transcribeAudio(audioFile, model = 'whisper', language = 'en') {
  // Convert file to base64
  const base64 = await new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = () => resolve(reader.result.split(',')[1]);
    reader.onerror = () => reject(reader.error);
    reader.readAsDataURL(audioFile);
  });

  const response = await fetch(`https://api.koveh.com/speech-to-text/${model}`, {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer YOUR_API_KEY',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      audio_file: base64,
      language: language
    })
  });
  if (!response.ok) {
    throw new Error(`Transcription failed with status ${response.status}`);
  }
  return await response.json();
}

// Transcribe audio file
const audioFile = document.getElementById('audioFile').files[0];
transcribeAudio(audioFile, 'whisper', 'en')
  .then(result => console.log(`Transcription: ${result.text}`));
```
Real-time Transcription with WebSocket
```python
import asyncio
import websockets
import json
import base64

async def real_time_transcription():
    uri = "wss://api.koveh.com/speech-to-text/ws"
    async with websockets.connect(uri) as websocket:
        # Send audio chunks
        while True:
            # Get audio chunk (implement your audio capture logic)
            audio_chunk = get_audio_chunk()
            audio_base64 = base64.b64encode(audio_chunk).decode("utf-8")
            await websocket.send(json.dumps({
                "audio": audio_base64,
                "model": "whisper-lite"
            }))

            # Receive transcription
            response = await websocket.recv()
            result = json.loads(response)
            print(f"Real-time: {result['text']}")

# Run real-time transcription
asyncio.run(real_time_transcription())
```
Error Handling
The API returns standard error responses:
```json
{
  "error": "Invalid audio file format",
  "status_code": 400,
  "timestamp": "2025-08-30T09:19:31.245295"
}
```
Common error codes:
- 400: Bad Request (invalid audio format, file too large)
- 401: Unauthorized (missing or invalid API key)
- 404: Not Found (invalid endpoint)
- 413: Payload Too Large (audio file exceeds size limit)
- 500: Internal Server Error (transcription model error)
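Since error payloads carry an "error" message alongside the status code, a thin client-side check can turn them into typed exceptions. A sketch (the class and function names are illustrative, not part of the API):

```python
class TranscriptionError(Exception):
    """Raised when the API returns an error payload."""
    def __init__(self, status_code, message):
        super().__init__(f"{status_code}: {message}")
        self.status_code = status_code
        self.message = message

def check_response(status_code, body):
    """Return `body` on success; raise TranscriptionError on an error payload."""
    if 200 <= status_code < 300:
        return body
    raise TranscriptionError(status_code, body.get("error", "unknown error"))
```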
Rate Limiting
- Rate Limit: 100 requests per minute
- Concurrent Requests: 5 simultaneous requests
- Timeout: 60 seconds per request (for longer audio files)
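When the 100-requests-per-minute limit is hit, a capped exponential backoff between retries keeps clients well behaved. A minimal sketch of the delay schedule (the function name and defaults are assumptions, not API behavior):

```python
def retry_delay(attempt, base=1.0, cap=30.0):
    """Seconds to wait before retry number `attempt` (0-based).

    Doubles the base delay each attempt, capped so a long outage
    never produces unbounded waits: 1s, 2s, 4s, 8s, ... up to `cap`.
    """
    return min(cap, base * (2 ** attempt))
```

A client would sleep for `retry_delay(attempt)` after each rate-limited response before retrying the request.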
Best Practices
- Audio Quality: Use high-quality audio (16kHz+ sample rate, clear speech)
- File Size: Keep audio files under 25MB for optimal performance
- Language Detection: Let the model auto-detect language when possible
- Model Selection: Use Whisper Lite for faster processing, Whisper for higher accuracy
- Error Handling: Always check for error responses and handle timeouts
- Caching: Cache transcriptions for repeated audio content
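The caching advice above can be implemented by keying on a hash of the audio content plus the model and language, so identical audio is never transcribed twice. A sketch with an in-memory store (the class is illustrative; a real deployment might back it with Redis or disk):

```python
import hashlib

class TranscriptionCache:
    """In-memory cache keyed by audio content, model, and language."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def key(audio_bytes, model, language):
        # SHA-256 of the audio so identical content maps to the same entry
        digest = hashlib.sha256(audio_bytes).hexdigest()
        return f"{model}:{language}:{digest}"

    def get_or_transcribe(self, audio_bytes, model, language, transcribe):
        """Return a cached result, calling `transcribe` only on a miss."""
        k = self.key(audio_bytes, model, language)
        if k not in self._store:
            self._store[k] = transcribe(audio_bytes, model, language)
        return self._store[k]
```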
Supported Audio Formats
- MP3: Most common, good compression
- WAV: Uncompressed, high quality
- M4A: Apple format, good compression
- FLAC: Lossless compression
- OGG: Open source format
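Audio in a format outside this list can be converted to 16 kHz mono WAV first, matching the sample-rate guidance above. A sketch that builds the ffmpeg command line (assumes ffmpeg is installed and on PATH; run the returned list with `subprocess.run(cmd, check=True)`):

```python
def ffmpeg_to_wav_cmd(src_path, dst_path):
    """ffmpeg invocation converting any supported input to 16 kHz mono WAV."""
    return [
        "ffmpeg",
        "-y",              # overwrite the output file if it exists
        "-i", src_path,    # input audio in any format ffmpeg understands
        "-ar", "16000",    # resample to 16 kHz
        "-ac", "1",        # downmix to mono
        dst_path,
    ]
```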
Use Cases
- Meeting Transcription: Convert meeting recordings to text
- Podcast Processing: Transcribe podcast episodes
- Voice Notes: Convert voice memos to text
- Accessibility: Provide captions for video content
- Language Learning: Transcribe speech for language practice
- Content Creation: Generate text from audio interviews
- Customer Service: Transcribe customer calls for analysis