Understanding limits, optimizing performance, and building reliable SM-AI-MODELS applications.
API Limits
Text-to-Speech (TTS) Limits
Input Text
| Limit | Value | Error Code |
|---|---|---|
| Maximum text length | 51,200 characters | 400 http_error — "Text exceeds maximum length of 51,200 characters" |
| Maximum text length (streaming) | 10,000 characters | text_too_long |
| Minimum text length | 1 character | invalid_request |
| Empty input | Not allowed | invalid_request |
Handling long text:
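A minimal Python sketch of client-side splitting, assuming each chunk is sent as a separate request. It packs whole sentences (split after `.`, `!`, `?`, or the Arabic question mark `؟`) into chunks no longer than the limit, and hard-splits any single sentence that is still too long. `split_text` is a hypothetical helper, not part of an SM-AI-MODELS SDK; pass `max_chars=10_000` for streaming requests.

```python
import re

# Limits from the TTS table above; deployments may differ.
MAX_CHARS = 51_200            # non-streaming requests
MAX_CHARS_STREAMING = 10_000  # streaming requests

# Split after sentence-ending punctuation, including the Arabic question mark (U+061F).
_SENTENCE_END = re.compile(r"(?<=[.!?؟])\s+")


def split_text(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    text = text.strip()
    if not text:
        raise ValueError("Empty input is not allowed")
    chunks: list[str] = []
    current = ""
    for sentence in _SENTENCE_END.split(text):
        # A single sentence longer than the limit is cut into fixed-size pieces.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # Adding this sentence would overflow: close the current chunk.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Splitting on sentence boundaries keeps prosody natural at chunk joins; the hard split is only a fallback for unusually long sentences.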
Voice Parameters
| Parameter | Allowed Values | Default |
|---|---|---|
| Voices | Yara, Nouf, Atheer, Yara_en | Yara |
| Invalid voice | Returns invalid_voice error | — |
Audio Format
| Format | Extension | Content-Type | Use Case |
|---|---|---|---|
| mp3 | .mp3 | audio/mpeg | Web playback, smallest size |
| wav | .wav | audio/wav | High quality, editing |
| opus | .opus | audio/opus | Streaming, low latency |
| flac | .flac | audio/flac | Lossless compression |
Speed Range
| Limit | Value | Out-of-Range Behaviour |
|---|---|---|
| Minimum speed | 0.5 (2x slower) | clamped silently — no error |
| Maximum speed | 2.0 (2x faster) | clamped silently — no error |
| Default speed | 1.0 (normal) | — |
Values outside 0.5–2.0 are clamped to the nearest bound. The request does not fail; the response is simply rendered at the clamped speed. If you need exact-speed behaviour, validate client-side before sending.
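A minimal sketch of that client-side check. `server_speed` only mirrors the documented clamping so you can see what would be rendered; `validate_speed` rejects the value instead. Both function names are hypothetical.

```python
MIN_SPEED, MAX_SPEED = 0.5, 2.0  # documented speed range


def server_speed(speed: float) -> float:
    """What the service renders, per the docs: out-of-range values are clamped."""
    return min(max(speed, MIN_SPEED), MAX_SPEED)


def validate_speed(speed: float) -> float:
    """Fail loudly on out-of-range speeds instead of relying on silent clamping."""
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be within {MIN_SPEED}-{MAX_SPEED}, got {speed}")
    return speed
```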
Speech Recognition (ASR) Limits
File Size
| Limit | Value | Error Code |
|---|---|---|
| Maximum file size | 100 MB | 400 http_error — "Invalid file type" / multipart upload rejected |
| Empty file | Not allowed | invalid_file |
Audio Duration
| Limit | Value | Notes |
|---|---|---|
| Maximum duration | 300 seconds (5 minutes) | Per single request |
| Maximum duration (streaming) | 3,600 seconds (1 hour) | Per gRPC streaming session |
| Optimal duration | Under 30 seconds | Best performance |
Tip: For long recordings, split into smaller segments:
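One way to do this for WAV input, using only Python's standard `wave` module. The sketch cuts at fixed boundaries (30 seconds by default, the "Optimal duration" above), so a word may be split across two segments; silence-based splitting avoids that but takes more work. Other formats (MP3, OGG, WebM) would first need decoding with a tool such as ffmpeg. `split_wav` is a hypothetical helper.

```python
import io
import wave


def split_wav(data: bytes, max_seconds: float = 30.0) -> list[bytes]:
    """Split a WAV file (as bytes) into WAV segments of at most max_seconds each."""
    segments: list[bytes] = []
    with wave.open(io.BytesIO(data), "rb") as src:
        params = src.getparams()
        frames_per_segment = max(1, int(params.framerate * max_seconds))
        while True:
            frames = src.readframes(frames_per_segment)
            if not frames:
                break
            out = io.BytesIO()
            # Each segment keeps the source's channels, sample width, and rate.
            with wave.open(out, "wb") as dst:
                dst.setnchannels(params.nchannels)
                dst.setsampwidth(params.sampwidth)
                dst.setframerate(params.framerate)
                dst.writeframes(frames)
            segments.append(out.getvalue())
    return segments
```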
Supported Formats
| Format | Extension | Recommended |
|---|---|---|
| FLAC | .flac | ✓ Best quality |
| MP3 | .mp3 | Common format |
| WAV | .wav | Uncompressed |
| OGG | .ogg | Open format |
| WebM | .webm | Web recordings |
Unsupported formats return an unsupported_format error.
Audio Quality
| Parameter | Recommendation | Notes |
|---|---|---|
| Sample rate | 16kHz or higher | Lower rates may reduce accuracy |
| Channels | Mono preferred | Stereo supported |
| Bit depth | 16-bit minimum | Higher is better |
Language Support
| Language | Support Level |
|---|---|
| Arabic | ✓ Full support |
| English | ✓ Full support |
| Other languages | ✓ Full support |
All languages receive equal support with high accuracy.
Rate Limits
SM-AI-MODELS enforces rate limits to ensure fair usage and system stability.
API Rate Limits
| Limit | Default | Description |
|---|---|---|
| Requests per minute (RPM) | 60 | Maximum API requests per minute per API key |
| Requests per second (RPS) | 10 | Maximum burst rate per API key |
| Concurrent requests | 5 | Maximum simultaneous in-flight requests per API key |
| Concurrent gRPC streams | 10 | Maximum open gRPC streaming sessions per API key |
Custom limits: Enterprise deployments can have custom rate limits. Contact sales.
Rate Limit Headers
Every API response includes rate limit headers:
| Header | Description |
|---|---|
| X-RateLimit-Limit | Maximum requests allowed per minute |
| X-RateLimit-Remaining | Requests remaining in the current window |
| X-RateLimit-Reset | Unix timestamp when the rate limit window resets |
| X-Request-ID | Unique request identifier (use for support tickets) |
Rate Limit Errors
When you exceed a rate limit, the API returns a 429 Too Many Requests response. The response includes a Retry-After header indicating how many seconds to wait before retrying.
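A small sketch for turning the header into a wait time. The text above describes a number of seconds; the HTTP specification also allows an HTTP-date, so the sketch accepts both and falls back to a default when the header is missing or unparsable. The function name and the 1-second default are illustrative assumptions.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(value: Optional[str], default: float = 1.0) -> float:
    """Return how many seconds to wait, based on a Retry-After header value."""
    if not value:
        return default
    value = value.strip()
    if value.isdigit():  # delta-seconds form, as described above
        return float(value)
    try:  # HTTP-date form, also permitted by the HTTP specification
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):  # older Pythons raise TypeError on bad input
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    # A date in the past means "retry now".
    return max(0.0, (when - datetime.now(timezone.utc)).total_seconds())
```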
Handling Rate Limits
Python — Rate Limit Handler with Exponential Backoff
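A minimal sketch, not an official client. It assumes a hypothetical `send` callable that performs one request and returns `(status_code, headers, body)`, so it works with whichever HTTP library you use. On a 429 it waits for `Retry-After` when the header holds a number of seconds; otherwise it backs off exponentially (1 s, 2 s, 4 s, ... capped at 60 s) with random jitter so that many clients do not retry in lockstep.

```python
import random
import time
from typing import Callable, Mapping, Tuple

# (status_code, headers, body); adapt from whatever your HTTP client returns.
Response = Tuple[int, Mapping[str, str], bytes]


def call_with_backoff(
    send: Callable[[], Response],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send(), retrying HTTP 429 responses with exponential backoff."""
    for attempt in range(max_retries + 1):
        status, headers, body = send()
        if status != 429 or attempt == max_retries:
            return status, headers, body  # success, other error, or out of retries
        retry_after = (headers.get("Retry-After") or "").strip()
        if retry_after.isdigit():
            delay = float(retry_after)  # the server's hint takes priority
        else:
            # 1 s, 2 s, 4 s, ... capped at max_delay, then "equal jitter":
            # a random wait between half and all of that value.
            capped = min(max_delay, base_delay * 2 ** attempt)
            delay = random.uniform(capped / 2, capped)
        sleep(delay)
    raise RuntimeError("unreachable: the loop always returns")
```

Passing `sleep` as a parameter keeps the function testable; in production the default `time.sleep` is used. The header is looked up as `Retry-After`; if your client returns a plain dict with different key casing, normalize the keys first.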
Batch Processing
For bulk TTS/ASR processing, respect rate limits with a queue:
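A sketch of a job queue drained under a client-side sliding-window limiter, assuming the default budget of 60 requests per minute. The limiter records the time of each call and, when the budget is used up, sleeps until the oldest call leaves the 60-second window. `handler` stands in for your own TTS or ASR call; all names here are hypothetical. To also respect the 10 RPS burst limit, a second limiter with `limit=10, window=1.0` could be acquired before each call.

```python
import collections
import queue
import time
from typing import Callable, Deque, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


class SlidingWindowLimiter:
    """Allow at most `limit` calls in any `window`-second period."""

    def __init__(self, limit: int = 60, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.limit, self.window = limit, window
        self.clock, self.sleep = clock, sleep
        self.calls: Deque[float] = collections.deque()

    def acquire(self) -> None:
        while True:
            now = self.clock()
            # Forget calls that have left the window.
            while self.calls and now - self.calls[0] >= self.window:
                self.calls.popleft()
            if len(self.calls) < self.limit:
                self.calls.append(now)
                return
            # Budget exhausted: wait until the oldest call expires.
            self.sleep(self.window - (now - self.calls[0]))


def process_batch(items: Iterable[T], handler: Callable[[T], R],
                  limiter: SlidingWindowLimiter) -> List[R]:
    """Drain a FIFO queue of jobs, calling handler on each within the rate limit."""
    jobs: "queue.Queue[T]" = queue.Queue()
    for item in items:
        jobs.put(item)
    results: List[R] = []
    while not jobs.empty():
        limiter.acquire()
        results.append(handler(jobs.get()))
    return results
```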
Monitoring Your Usage
Check your current rate limit status:
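The rate limit headers can be read from any response. A sketch that parses them into a small status object; `RateLimitStatus` and `parse_rate_limit_headers` are illustrative names, and the lookup is case-insensitive because HTTP header names are.

```python
import time
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitStatus:
    limit: int
    remaining: int
    reset_at: float            # Unix timestamp from X-RateLimit-Reset
    request_id: Optional[str]  # X-Request-ID, for support tickets

    def seconds_until_reset(self, now: Optional[float] = None) -> float:
        now = time.time() if now is None else now
        return max(0.0, self.reset_at - now)


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitStatus:
    """Read the rate limit headers listed above from a response."""
    lowered = {k.lower(): v for k, v in headers.items()}
    return RateLimitStatus(
        limit=int(lowered["x-ratelimit-limit"]),
        remaining=int(lowered["x-ratelimit-remaining"]),
        reset_at=float(lowered["x-ratelimit-reset"]),
        request_id=lowered.get("x-request-id"),
    )
```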
Concurrent Requests
| Limit | Value | Notes |
|---|---|---|
| Concurrent requests | 5 per API key by default | Varies by deployment; contact your administrator for the exact value |
| Recommended | Stay at or below your concurrent-request limit | Requests beyond the limit are rate-limited (429) |
Example with connection pooling:
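A standard-library sketch, assuming plain HTTPS rather than gRPC. The pool holds a fixed number of connection objects and hands them out one at a time, so keep-alive connections are reused and no more than `size` requests (5 by default, matching the default concurrent-request limit) are in flight at once. `ConnectionPool` is a hypothetical helper.

```python
import queue
from contextlib import contextmanager
from typing import Callable, Iterator


class ConnectionPool:
    """A fixed-size pool of reusable connections; it also caps concurrency."""

    def __init__(self, factory: Callable[[], object], size: int = 5) -> None:
        self._idle: "queue.Queue[object]" = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())

    @contextmanager
    def connection(self, timeout: float = 30.0) -> Iterator[object]:
        # Blocks (up to timeout, then raises queue.Empty) until a connection
        # is free, so at most `size` requests run at once.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)
```

Usage, with a placeholder host: `pool = ConnectionPool(lambda: http.client.HTTPSConnection("api.example.com", timeout=30))`. Inside `with pool.connection() as conn:`, call `conn.request(...)` and read the body fully with `conn.getresponse().read()` before leaving the block so the connection is ready for reuse. If a request fails midway, call `conn.close()`; `http.client` reopens a closed connection on its next request.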
Request/Response Size
TTS Response Size
Response size depends on:
- Input text length — Longer text = larger audio file
- Audio format — mp3 < opus < flac < wav
- Duration — Approximately 1 second of audio per 10 characters (varies by language)
Estimated sizes (for 100 characters of Arabic text):
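These estimates follow from the ~10 characters per second rule above: 100 characters is roughly 10 seconds of audio. This sketch turns that into byte counts, assuming 16-bit mono PCM at the 22,050 Hz default sample rate for WAV (plus a standard 44-byte header) and a constant bitrate for compressed formats. The 128 kbps figure in the usage note is an example, not a documented server setting, so real sizes depend on the encoder.

```python
CHARS_PER_SECOND = 10  # "approximately 1 second of audio per 10 characters"


def estimate_duration_s(num_chars: int) -> float:
    return num_chars / CHARS_PER_SECOND


def estimate_pcm_wav_bytes(num_chars: int, sample_rate: int = 22_050,
                           bits: int = 16, channels: int = 1) -> int:
    """Uncompressed WAV: rate * bytes per sample * channels per second, plus a 44-byte header."""
    seconds = estimate_duration_s(num_chars)
    return int(seconds * sample_rate * (bits // 8) * channels) + 44


def estimate_compressed_bytes(num_chars: int, bitrate_kbps: float) -> int:
    """Constant-bitrate formats (e.g. mp3 at a fixed bitrate): bits per second * duration / 8."""
    return int(estimate_duration_s(num_chars) * bitrate_kbps * 1000 / 8)
```

Under those assumptions, `estimate_pcm_wav_bytes(100)` gives 441,044 bytes (about 441 KB) and `estimate_compressed_bytes(100, 128)` gives 160,000 bytes.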
ASR Request Size
Maximum file size: 100 MB (may vary by deployment; contact your administrator for the exact limit)
Optimization tips:
- Use FLAC for best compression without quality loss
- Compress audio before uploading if using WAV
- Remove silence from beginning/end of recordings
Timeout Recommendations
| Operation | Recommended Timeout | Notes |
|---|---|---|
| TTS | 30 seconds | Longer for large text |
| ASR | 60 seconds | Depends on file size |
| Health check | 5 seconds | Quick response expected |
Example with timeout:
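A standard-library sketch using `urllib.request`. The base values come from the table; the extra 10 seconds per MB of payload for large uploads is an assumption to tune against your own measurements. `timeout_for`, `post_json`, and the URL you pass in are hypothetical. Note that `urlopen`'s `timeout` applies to each blocking socket operation (connecting, each read), not to the whole transfer.

```python
import json
import urllib.request

# Recommended timeouts from the table above, in seconds.
TIMEOUTS = {"tts": 30.0, "asr": 60.0, "health": 5.0}


def timeout_for(operation: str, payload_bytes: int = 0, per_mb: float = 10.0) -> float:
    """Base timeout plus extra time for large payloads (per_mb is an assumption)."""
    return TIMEOUTS[operation] + per_mb * payload_bytes / 1_000_000


def post_json(url: str, payload: dict, operation: str) -> bytes:
    """POST a JSON body with an operation-specific timeout."""
    body = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    # urlopen raises (for example TimeoutError or URLError) if the server is too slow.
    with urllib.request.urlopen(req, timeout=timeout_for(operation, len(body))) as resp:
        return resp.read()
```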
Performance & Latency
Optimize SM-AI-MODELS for real-time voice applications, IVR systems, and production workloads.
Latency Benchmarks
TTS Latency (SM-TTS-V1)
Measured under optimal conditions:
| Input Length | TTFC (gRPC) | TTFC (HTTP) | Total Generation | Notes |
|---|---|---|---|---|
| Short (under 50 chars) | ~150ms | ~200ms | ~400ms | Single sentence |
| Medium (50-200 chars) | ~200ms | ~300ms | ~800ms | Short paragraph |
| Long (200-500 chars) | ~250ms | ~350ms | ~1.5s | Full paragraph |
| Very long (500-2000 chars) | ~300ms | ~400ms | ~3-5s | Multiple paragraphs |
TTFC = Time to First Chunk (streaming mode). This is the most critical metric for real-time applications — it determines how quickly the user hears the first audio.
ASR Latency (SM-STT-V1)
| Audio Duration | Processing Time (REST) | Real-time Factor | Notes |
|---|---|---|---|
| 1 second | ~300ms | 0.3x | Near real-time |
| 5 seconds | ~800ms | 0.16x | Fast |
| 30 seconds | ~3s | 0.1x | Optimal segment length |
| 60 seconds | ~6s | 0.1x | Acceptable |
| 300 seconds (max) | ~25-30s | 0.08-0.1x | Use streaming for long audio |
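The real-time factor (RTF) in this table is processing time divided by audio duration, so 5 seconds of audio processed in ~0.8 s gives 0.16x. A tiny sketch for budgeting with it; the 0.1 default is taken from the 30- and 60-second rows and will differ on your hardware.

```python
def real_time_factor(processing_s: float, audio_s: float) -> float:
    """RTF = processing time / audio duration; below 1.0 means faster than real time."""
    if audio_s <= 0:
        raise ValueError("audio duration must be positive")
    return processing_s / audio_s


def estimate_processing_s(audio_s: float, rtf: float = 0.1) -> float:
    """Expected REST processing time for a clip, given an assumed RTF."""
    return audio_s * rtf
```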
gRPC ASR Streaming Latency
| Metric | Value | Notes |
|---|---|---|
| Partial result latency | ~100-200ms | From speech to first partial transcript |
| Final result latency | ~300-500ms | From end of speech to confirmed transcript |
| End-of-utterance detection | ~500-800ms | Silence-based endpoint detection |
Optimization Guide
1. Choose the Right Protocol
| Protocol | TTFC | Throughput | Best For |
|---|---|---|---|
| gRPC Streaming | ⚡ ~150ms | High | MRCP, voice bots, real-time apps |
| HTTP Streaming | 🔶 ~200-300ms | Medium | Web apps, simple integrations |
| HTTP (non-streaming) | 🔴 Full wait | Medium | Batch processing, file generation |
2. Choose the Right Audio Format
| Format | Encoding Overhead | File Size | Best For |
|---|---|---|---|
| pcm | ⚡ None | Large | Lowest latency, telephony (8kHz/16kHz) |
| opus | ⚡ Minimal | Small | WebRTC, streaming, bandwidth-constrained |
| mp3 | 🔶 ~20-50ms | Medium | Web playback, downloads |
| wav | 🔶 Minimal | Large | Editing, archival, high quality |
| flac | 🔴 ~30-60ms | Medium | Lossless archival |
3. Optimize Sample Rate
| Sample Rate | Use Case | Quality | Latency Impact |
|---|---|---|---|
| 8,000 Hz | Telephony (G.711) | Acceptable | ⚡ Fastest |
| 16,000 Hz | Telephony (wideband), voice bots | Good | ⚡ Fast |
| 22,050 Hz | General playback (default) | High | 🔶 Default |
| 24,000 Hz | High-quality applications | Highest | 🔶 Slightly slower |
4. Optimize Text Input
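A sketch of two input-side optimizations, assuming your client can send the remainder as a second request or stream chunk: collapse stray whitespace, and send a short first piece so the first audio arrives sooner, since TTFC grows with input length in the latency table above. The 50-character target matches that table's "Short" row; `prepare_for_tts` is a hypothetical helper.

```python
import re

# Split after sentence-ending punctuation, including the Arabic question mark (U+061F).
_SENTENCE_END = re.compile(r"(?<=[.!?؟])\s+")


def prepare_for_tts(text: str, first_max: int = 50) -> list[str]:
    """Normalize whitespace; return [short first piece, remainder]."""
    cleaned = re.sub(r"\s+", " ", text).strip()
    if not cleaned:
        return []
    first = _SENTENCE_END.split(cleaned, maxsplit=1)[0]
    if len(first) > first_max:
        # Cut a long first sentence at the last space within the target length.
        cut = first.rfind(" ", 0, first_max + 1)
        first = first[:cut] if cut > 0 else first[:first_max]
    rest = cleaned[len(first):].strip()
    return [first, rest] if rest else [first]
```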
5. Connection Management
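For HTTP clients, the main lever is reusing one keep-alive connection instead of opening a new TCP/TLS connection per request. This standard-library sketch reopens the connection once if the server has dropped it. It assumes your requests are safe to resend (a TTS or ASR call that produces the same result), since a retried POST may reach the server twice. `PersistentClient` is a hypothetical wrapper; for gRPC, the equivalent practice is to create one channel and reuse it.

```python
import http.client
from typing import Callable, Mapping, Optional, Tuple


class PersistentClient:
    """Reuse one keep-alive connection; reopen it once if the server dropped it."""

    def __init__(self, factory: Callable[[], http.client.HTTPConnection]) -> None:
        self._factory = factory
        self._conn: Optional[http.client.HTTPConnection] = None

    def request(self, method: str, path: str, body: Optional[bytes] = None,
                headers: Optional[Mapping[str, str]] = None) -> Tuple[int, bytes]:
        for attempt in range(2):
            if self._conn is None:
                self._conn = self._factory()
            try:
                self._conn.request(method, path, body=body, headers=dict(headers or {}))
                resp = self._conn.getresponse()
                # Read the body fully so the socket can carry the next request.
                return resp.status, resp.read()
            except (http.client.RemoteDisconnected, ConnectionResetError, BrokenPipeError):
                self._conn.close()
                self._conn = None
                if attempt == 1:
                    raise
        raise RuntimeError("unreachable: the loop returns or raises")

    def close(self) -> None:
        if self._conn is not None:
            self._conn.close()
            self._conn = None
```

Usage, with a placeholder host: `client = PersistentClient(lambda: http.client.HTTPSConnection("api.example.com", timeout=30))`.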
Load Testing
Use standard load testing tools like k6, Locust, or simple shell scripts to measure your deployment's performance:
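If you prefer to stay in Python, a small latency harness can stand in for k6 or Locust. This sketch runs a hypothetical `send` callable (one TTS or ASR request) from a thread pool and reports mean and percentile latencies in milliseconds. Keep `concurrency` and `total` within your rate limits (5 concurrent requests and 60 RPM by default), or the results will include throttled (429) responses; any exception raised by `send` aborts the run.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def load_test(send: Callable[[], None], total: int = 50, concurrency: int = 5) -> Dict[str, float]:
    """Fire `total` calls with `concurrency` workers and summarize latency in ms."""

    def timed_call(_: int) -> float:
        start = time.perf_counter()
        send()
        return (time.perf_counter() - start) * 1000

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies: List[float] = sorted(pool.map(timed_call, range(total)))
    # 99 cut points (p1..p99); "inclusive" never extrapolates past the max.
    cuts = statistics.quantiles(latencies, n=100, method="inclusive")
    return {
        "count": float(len(latencies)),
        "mean_ms": statistics.fmean(latencies),
        "p50_ms": cuts[49],
        "p95_ms": cuts[94],
        "p99_ms": cuts[98],
        "max_ms": latencies[-1],
    }
```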
Troubleshooting Latency
| Symptom | Likely Cause | Fix |
|---|---|---|
| TTFC > 1 second | GPU memory full | Check GPU utilization, reduce concurrent load |
| Latency spikes | Thermal throttling | Check GPU temperature with monitoring tools |
| Slow first request | Model cold start | Add warm-up request to deployment script |
| Degrading over time | Memory fragmentation | Schedule periodic service restarts |
| High latency on long text | Single-threaded processing | Split text into shorter segments |
Best Practices
1. Validate Before Sending
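A sketch of pre-flight validation against the TTS limits listed on this page. It collects every problem rather than stopping at the first, so a caller can report them together. `TTSRequest` and its field names are hypothetical and do not necessarily match the REST API's parameter names; treating whitespace-only text as empty is also an assumption.

```python
from dataclasses import dataclass
from typing import List

# Values from the TTS limits tables above.
MAX_TEXT = 51_200
MAX_TEXT_STREAMING = 10_000
VOICES = {"Yara", "Nouf", "Atheer", "Yara_en"}
FORMATS = {"mp3", "wav", "opus", "flac"}


@dataclass
class TTSRequest:
    text: str
    audio_format: str
    voice: str = "Yara"   # documented default voice
    speed: float = 1.0    # documented default speed
    stream: bool = False


def validate(req: TTSRequest) -> List[str]:
    """Return a list of problems; an empty list means the request looks valid."""
    errors: List[str] = []
    limit = MAX_TEXT_STREAMING if req.stream else MAX_TEXT
    if not req.text.strip():  # whitespace-only treated as empty (assumption)
        errors.append("text must not be empty")
    elif len(req.text) > limit:
        errors.append(f"text is {len(req.text)} chars; limit is {limit}")
    if req.voice not in VOICES:
        errors.append(f"unknown voice {req.voice!r}")
    if req.audio_format not in FORMATS:
        errors.append(f"unsupported format {req.audio_format!r}")
    if not 0.5 <= req.speed <= 2.0:
        errors.append(f"speed {req.speed} is outside 0.5-2.0 and would be clamped")
    return errors
```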
2. Implement Retry Logic
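Retrying only makes sense for transient failures. A sketch that retries 429s, a set of 5xx statuses, and network errors with exponential backoff, but re-raises other errors (such as a 400 invalid_request) immediately, because resending the same request would fail the same way. `APIError` is a hypothetical exception type, and treating 500, 502, 503, and 504 as transient is an assumption, not documented behaviour.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# 429 is documented; the 5xx entries are an assumption about transient errors.
RETRYABLE_STATUS = frozenset({429, 500, 502, 503, 504})


class APIError(Exception):
    """Hypothetical error type carrying the HTTP status of a failed call."""

    def __init__(self, status: int, message: str = "") -> None:
        super().__init__(f"{status}: {message}")
        self.status = status


def with_retries(fn: Callable[[], T], max_retries: int = 3, base: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry transient failures only; client errors are re-raised at once."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except APIError as err:
            if err.status not in RETRYABLE_STATUS or attempt == max_retries:
                raise
        except (TimeoutError, ConnectionError):
            if attempt == max_retries:
                raise
        sleep(base * 2 ** attempt)  # 1 s, 2 s, 4 s, ...
    raise RuntimeError("unreachable: the loop returns or raises")
```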
3. Monitor Usage
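A sketch that logs each response's `X-Request-ID` (useful for support tickets) together with the remaining quota, and warns once the remaining requests fall to 10% of the limit. The logger name and threshold are illustrative assumptions.

```python
import logging
from typing import Mapping

log = logging.getLogger("sm_ai_models.usage")  # hypothetical logger name


def record_usage(headers: Mapping[str, str], warn_fraction: float = 0.1) -> bool:
    """Log the request ID and remaining quota; return True when nearly exhausted."""
    h = {k.lower(): v for k, v in headers.items()}
    limit = int(h.get("x-ratelimit-limit", "0"))
    remaining = int(h.get("x-ratelimit-remaining", "0"))
    request_id = h.get("x-request-id", "-")
    log.info("request_id=%s remaining=%d/%d", request_id, remaining, limit)
    low = limit > 0 and remaining <= limit * warn_fraction
    if low:
        log.warning("rate limit nearly exhausted: %d of %d left (request_id=%s)",
                    remaining, limit, request_id)
    return low
```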
4. Handle Limits Gracefully
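One way to handle the hard limits gracefully is to route work to a mode that can accept it instead of letting a request fail. This sketch prefers streaming TTS when the text fits the 10,000-character streaming limit (for the lower TTFC), falls back to a non-streaming request up to 51,200 characters, and signals that the text must be split beyond that. For ASR it uses REST up to 300 seconds and gRPC streaming up to 3,600 seconds. Reading 100 MB as 100 * 1024 * 1024 bytes, and assuming the size limit applies only to file uploads, are assumptions; the function names are hypothetical.

```python
# Documented per-request limits.
TTS_MAX_STREAMING = 10_000         # characters
TTS_MAX = 51_200                   # characters
ASR_MAX_REST_S = 300               # seconds per REST request
ASR_MAX_STREAM_S = 3_600           # seconds per gRPC streaming session
ASR_MAX_BYTES = 100 * 1024 * 1024  # 100 MB (MiB interpretation is an assumption)


def route_tts(num_chars: int) -> str:
    if num_chars == 0:
        raise ValueError("empty input is not allowed")
    if num_chars <= TTS_MAX_STREAMING:
        return "streaming"
    if num_chars <= TTS_MAX:
        return "non-streaming"
    return "split"  # chunk the text and send several requests


def route_asr(duration_s: float, size_bytes: int) -> str:
    if size_bytes == 0:
        raise ValueError("empty file is not allowed")
    if duration_s <= ASR_MAX_REST_S and size_bytes <= ASR_MAX_BYTES:
        return "rest"
    if duration_s <= ASR_MAX_STREAM_S:
        return "grpc-streaming"
    return "split"  # segment the recording, then upload or stream each part
```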
Summary of Key Limits
| Service | Limit Type | Value |
|---|---|---|
| TTS | Max text length | 51,200 characters per request |
| TTS | Speed range | 0.5 - 2.0 |
| ASR | Max file size | 100 MB |
| ASR | Max duration | 300 seconds (3,600 for streaming) |
| API | Requests per minute | 60 RPM |
| API | Requests per second | 10 RPS |
| API | Concurrent requests | 5 |
Note: Limits may vary for Enterprise deployments. Contact your administrator for deployment-specific values.
Next Steps
- Error Handling — Handle API errors properly
- REST API Documentation — TTS and ASR endpoint details
- Streaming — Real-time audio streaming
- Specifications — Engine specs and capabilities
