Track all changes, improvements, and fixes to SM-AI-MODELS.
Versioning: SM-AI-MODELS follows Semantic Versioning. API changes are documented with migration guides when needed.
v2.0.0 — Current Release
Released: 2024
New Features
- 🚀 SM-TTS-V1 — Next-generation neural TTS engine with improved Arabic pronunciation
- 🚀 SM-STT-V1 — Upgraded ASR model with lower word error rate
- 🎙️ Nouf voice — New Arabic female voice with warm, conversational tone
- 🇬🇧 Yara_en voice — English support via dedicated English voice
- ⚡ gRPC streaming — Real-time audio streaming via gRPC (TTS port 50051, ASR port 50052)
- 🔊 OPUS format — Added OPUS audio output for low-latency streaming
- 🔊 FLAC format — Added lossless FLAC output
- 🎚️ Speed control — Adjustable speech speed from 0.25x to 4.0x
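As a rough illustration of the speed range listed above, a client could check the value before sending a request. This is only a sketch: `validate_speed` is a hypothetical helper, the range is assumed to be inclusive at both ends, and how the service itself treats out-of-range values is not stated in this changelog.

```python
MIN_SPEED = 0.25  # lower bound quoted in the v2.0.0 notes
MAX_SPEED = 4.0   # upper bound quoted in the v2.0.0 notes


def validate_speed(speed: float) -> float:
    """Return the speed as a float if it lies in [0.25, 4.0], else raise.

    Hypothetical client-side check; the bounds are assumed inclusive.
    """
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(
            f"speed must be between {MIN_SPEED}x and {MAX_SPEED}x, got {speed}"
        )
    return float(speed)
```

Raising early keeps an invalid value from costing a network round trip; a client that prefers clamping could swap the `raise` for `min`/`max`.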
Improvements
- Reduced TTS latency by ~40% compared to v1
- Improved Arabic diacritics handling
- Better mixed Arabic/English text processing
- Enhanced number normalization (Arabic and English)
- Improved audio quality at 16 kHz sample rate
API Changes
- New endpoint path: /v1/audio/speech (was /synthesize in v1)
- New endpoint path: /v1/audio/transcriptions (was /transcribe in v1)
- New response format for errors (structured JSON with error codes)
- Added /health and /ready endpoints
- gRPC endpoints on new ports (50051, 50052)
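The /health and /ready endpoints can be probed before sending traffic. The sketch below assumes that an HTTP 200 status means the check passed; the response body is not inspected because its format is not described here. The helper name `endpoint_ok` and the base URL in the usage note are placeholders, not part of the documented API.

```python
import urllib.error
import urllib.request


def endpoint_ok(base_url, path, opener=urllib.request.urlopen, timeout=5.0):
    """Return True if a GET to base_url + path answers with HTTP 200.

    Assumption: status 200 signals success. Connection failures and
    HTTP error statuses are both reported as False.
    """
    url = base_url.rstrip("/") + path
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

For example, `endpoint_ok("http://localhost:8000", "/ready")` would check readiness of a service at that placeholder address. The `opener` argument exists only so the function can be exercised without a live server.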
Breaking Changes
- ⚠️ v1 endpoints deprecated: /synthesize and /transcribe still work but will be removed
- ⚠️ Response format changed: Error responses now use {"error": {"code": ..., "message": ...}}
- ⚠️ Default audio format: Changed from WAV to MP3
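A client moving to v2 may want to read the structured error shape shown above. The following is a minimal sketch under assumptions: `ApiError` and `parse_error` are hypothetical names, and because the type of `code` is not specified here, it is converted to a string.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ApiError:
    code: str
    message: str


def parse_error(body: str) -> Optional[ApiError]:
    """Parse a body shaped like {"error": {"code": ..., "message": ...}}.

    Returns None when the body is not JSON or lacks an "error" object.
    """
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return None
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        return None
    # The type of "code" is an assumption; normalise to str for comparison.
    return ApiError(code=str(error.get("code", "")),
                    message=str(error.get("message", "")))
```

Returning `None` for unrecognised bodies lets callers fall back to whatever v1 error handling they already have during the migration window.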
Migration from v1
| v1 | v2 | Notes |
|---|---|---|
| POST /synthesize | POST /v1/audio/speech | New path |
| POST /transcribe | POST /v1/audio/transcriptions | New path, multipart form |
| text parameter | input parameter | Renamed in TTS |
| speaker parameter | voice parameter | Renamed, new voice names |
| WAV default output | MP3 default output | Set response_format: "wav" for old behavior |
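The parameter renames in the table can be sketched as a small translation step. This assumes the TTS request body is a flat JSON object, which this changelog does not confirm, and `migrate_tts_request` is a hypothetical helper. It does not translate voice names, since no mapping from v1 to v2 voices is given here.

```python
def migrate_tts_request(v1_body: dict) -> dict:
    """Translate a v1 /synthesize body into a v2 /v1/audio/speech body.

    Renames text -> input and speaker -> voice; other keys pass through.
    Voice *values* are left unchanged (no v1-to-v2 name mapping is known).
    """
    renames = {"text": "input", "speaker": "voice"}
    v2_body = {renames.get(key, key): value for key, value in v1_body.items()}
    # v1 produced WAV; v2 defaults to MP3, so request WAV unless the
    # caller already chose a format.
    v2_body.setdefault("response_format", "wav")
    return v2_body
```

Setting `response_format` explicitly preserves the old WAV behavior for callers that have not yet adapted to MP3 output.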
v1.0.0 — Legacy
Released: 2023 — ⚠️ Deprecated
- Initial release with basic Arabic TTS
- Single voice (Yara)
- WAV output only
- REST API only
End of Life: v1 endpoints will be removed in a future release. Please migrate to v2.
Deprecation Policy
| Stage | Timeline | Action Required |
|---|---|---|
| Deprecated | Feature announced as deprecated | Begin migration planning |
| Sunset | 6 months after deprecation | Complete migration |
| Removed | After sunset date | Feature no longer available |
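To plan against the table above, the sunset date can be estimated from a deprecation date. The sketch below reads "6 months" as calendar months, which is an assumption, and `sunset_date` is a hypothetical helper; the dates passed to it are whatever the announcement states.

```python
import calendar
from datetime import date


def sunset_date(deprecated_on: date, months: int = 6) -> date:
    """Return the date `months` calendar months after `deprecated_on`.

    If the target month is shorter than the starting day allows
    (for example 31 August plus 6 months), the day is clamped to the
    last day of the target month.
    """
    month_index = deprecated_on.month - 1 + months
    year = deprecated_on.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(deprecated_on.day, last_day))
```

For instance, a feature deprecated on 15 January 2024 would reach sunset on 15 July 2024 under this reading.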
Deprecated features are marked with ⚠️ in the documentation. Breaking changes are announced at least 30 days in advance via:
- This changelog
- API response headers (X-Deprecation-Warning)
- Email notification to registered API key owners
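Clients can watch for the X-Deprecation-Warning header on every response and log it. The sketch below takes any mapping of header names to values; the function name is hypothetical, and the format of the header's value is not specified here, so it is returned as-is.

```python
from typing import Mapping, Optional


def deprecation_warning(headers: Mapping[str, str]) -> Optional[str]:
    """Return the X-Deprecation-Warning value if present, else None.

    HTTP header names are case-insensitive, so names are compared in
    lower case rather than relying on the mapping's own lookup.
    """
    for name, value in headers.items():
        if name.lower() == "x-deprecation-warning":
            return value
    return None
```

Logging the returned value once per endpoint, rather than on every call, keeps the warning visible without flooding logs.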
Upcoming
| Feature | Status | Expected |
|---|---|---|
| Speaker diarization (ASR) | In development | Q2 2025 |
| Additional Arabic male voice | In development | Q2 2025 |
| SSML full support | Planned | Q3 2025 |
| Custom voice cloning | Research | TBD |
