Do you like the sound of Tesla? xAI officially opens the Grok voice API: TTS at $4.20 per million characters, with recognition accuracy surpassing ElevenLabs

xAI officially launched standalone Grok speech-to-text (STT) and text-to-speech (TTS) APIs this week. The same tech stack is already running in Grok Voice, Tesla vehicles, and Starlink customer service systems. STT is priced at $0.10 per hour for batch processing and $0.20 per hour for streaming, and supports more than 25 languages.

Table of Contents

  • STT: word-level timestamps + speaker diarization, batch transcription only $0.10 per hour
  • TTS: 5 voice personalities + voice tags, $4.20 per million characters
  • The same tech stack powers Tesla and Starlink

The same set of voice technologies that makes Tesla vehicles speak and Starlink customer service respond to users is now available via API. xAI announced on the 17th the launch of independent Grok speech-to-text (STT) and text-to-speech (TTS) APIs, allowing external developers to directly call this speech infrastructure already in use within xAI products.

STT: word-level timestamps + speaker diarization, batch transcription only $0.10 per hour

According to official details, the Grok STT API offers two access modes: batch processing via a REST API and low-latency real-time streaming via a WebSocket API. Batch processing costs $0.10 per hour of audio and streaming costs $0.20 per hour. xAI claims these prices significantly undercut mainstream competitors such as ElevenLabs and Deepgram.
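To make the hourly rates concrete, here is a minimal cost-estimation sketch. It uses only the two prices quoted above; the function name `stt_cost_usd` is hypothetical, and it assumes billing is strictly proportional to audio duration, with no minimum charge or rounding, which the announcement does not specify.

```python
# USD per hour of audio, as quoted in the article.
STT_PRICE_PER_HOUR = {"batch": 0.10, "streaming": 0.20}


def stt_cost_usd(audio_seconds: float, mode: str = "batch") -> float:
    """Estimate STT cost for a given audio duration.

    Assumption: cost scales linearly with duration (no minimums or rounding).
    """
    if mode not in STT_PRICE_PER_HOUR:
        raise ValueError(f"unknown mode: {mode!r}")
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    return audio_seconds / 3600 * STT_PRICE_PER_HOUR[mode]


if __name__ == "__main__":
    # Example: 1,000 hours of recorded calls, in each mode.
    for mode in ("batch", "streaming"):
        cost = stt_cost_usd(1000 * 3600, mode)
        print(f"{mode}: ${cost:.2f}")
```

Under these assumptions, streaming the same audio costs exactly twice as much as batch transcription, so batch is the natural choice whenever real-time results are not needed.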

Functionally, Grok STT supports more than 25 languages and provides word-level timestamps, speaker diarization, multi-channel audio, and intelligent inverse text normalization. This makes it suitable for enterprise scenarios that require high accuracy, such as meeting transcription, legal and medical records, and customer call logs.

In entity recognition benchmarks, Grok STT shows an advantage. When identifying key entities like names, accounts, and dates in phone calls, Grok STT’s error rate is 5.0%, compared to 12.0% for ElevenLabs, 13.5% for Deepgram, and 21.3% for AssemblyAI.
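One way to read these figures is as a relative error reduction against each competitor. The sketch below uses only the four error rates quoted above; the helper name `relative_reduction` is illustrative, and the benchmark methodology itself is not described in the article.

```python
# Entity error rates (percent) as quoted in the article.
ENTITY_ERROR_RATE = {
    "Grok STT": 5.0,
    "ElevenLabs": 12.0,
    "Deepgram": 13.5,
    "AssemblyAI": 21.3,
}


def relative_reduction(baseline: float, candidate: float) -> float:
    """Fraction by which `candidate` lowers the error rate versus `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - candidate) / baseline


if __name__ == "__main__":
    grok = ENTITY_ERROR_RATE["Grok STT"]
    for name, rate in ENTITY_ERROR_RATE.items():
        if name == "Grok STT":
            continue
        print(f"vs {name}: {relative_reduction(rate, grok):.1%} fewer entity errors")
```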

TTS: 5 voice personalities + voice tags, $4.20 per million characters

Grok TTS API offers five distinct voice styles: Ara (female, warm and friendly), Eve (female, lively and proactive), Leo (male, authoritative and powerful), Rex (male, confident and clear), Sal (neutral, smooth and balanced).

The API automatically detects the input language and natively supports more than 20 languages. It uses BCP-47 language codes to control pronunciation.

Audio output formats include MP3, WAV, PCM (Linear16), G.711 μ-law, and G.711 A-law. The latter two are common telephony codecs, which suggests xAI is positioning the API for telecom integration.
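For readers unfamiliar with the telephony formats, the sketch below illustrates the idea behind μ-law: a logarithmic companding curve that gives quiet samples more resolution than loud ones. This is the continuous textbook formula with μ = 255, not the exact segmented 8-bit encoding that G.711 specifies, and it says nothing about how xAI produces its output. The function names are illustrative.

```python
import math

MU = 255  # companding parameter used by mu-law


def mu_law_compress(x: float) -> float:
    """Map a sample in [-1, 1] through the continuous mu-law curve."""
    x = max(-1.0, min(1.0, x))  # clamp out-of-range input
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y: float) -> float:
    """Inverse of mu_law_compress for values in [-1, 1]."""
    y = max(-1.0, min(1.0, y))
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)


def pcm16_to_unit(sample: int) -> float:
    """Scale a signed 16-bit PCM (Linear16) sample to the range [-1, 1)."""
    return sample / 32768.0


if __name__ == "__main__":
    for pcm in (0, 100, 1000, 10000, 32767):
        x = pcm16_to_unit(pcm)
        y = mu_law_compress(x)
        print(f"pcm={pcm:6d}  linear={x:.5f}  companded={y:.5f}")
```

Because the curve is steep near zero, low-amplitude speech keeps more detail after quantization to 8 bits, which is why these codecs remain standard on phone networks.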

A key feature of the TTS API is “voice tags,” allowing developers to embed commands within text to finely control pauses, laughter, whispers, intonation emphasis, speech rate, and pitch, making synthesized speech more natural and human-like. Pricing is $4.20 per million characters.
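The per-character price translates into a simple estimate. The sketch below assumes every character in the submitted text is billed, including any embedded voice tags; the article does not say how tags are counted, so treat that as an assumption. The function name `tts_cost_usd` is hypothetical.

```python
# USD per one million input characters, as quoted in the article.
TTS_PRICE_PER_MILLION_CHARS = 4.20


def tts_cost_usd(text: str) -> float:
    """Estimate TTS cost for a piece of text.

    Assumption: all characters are billed, including embedded voice tags.
    """
    return len(text) / 1_000_000 * TTS_PRICE_PER_MILLION_CHARS


if __name__ == "__main__":
    script = "Welcome back. Your order has shipped and should arrive on Friday."
    print(f"{len(script)} characters cost about ${tts_cost_usd(script):.6f}")
```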

The same tech stack powers Tesla and Starlink

xAI emphasizes that these two APIs are not entirely new technologies but are based on the same infrastructure already deployed in Grok Voice, Tesla vehicle voice interactions, and Starlink customer support systems.

This infrastructure first appeared at the end of 2025 as the Grok Voice Agent API, which provides real-time voice dialogue capabilities. It ranked first in the Big Bench Audio benchmark, with a time to first audio response of under 1 second, about five times faster than recent competitors.

The release of these independent STT and TTS endpoints effectively splits the integrated voice pipeline into modular components, allowing developers to assemble them as needed.
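The modular idea can be sketched as three interchangeable stages wired together. Everything below is illustrative: `VoicePipeline` and the stub functions are hypothetical stand-ins, not xAI's SDK, and a real integration would replace the stubs with calls to the actual STT, language model, and TTS services.

```python
from dataclasses import dataclass
from typing import Callable

# Each stage is just a callable, so any one of them can be swapped independently.
Transcribe = Callable[[bytes], str]   # audio in  -> text
Respond = Callable[[str], str]        # user text -> reply text
Synthesize = Callable[[str], bytes]   # reply text -> audio out


@dataclass
class VoicePipeline:
    transcribe: Transcribe
    respond: Respond
    synthesize: Synthesize

    def run(self, audio_in: bytes) -> bytes:
        """Run one turn: speech-to-text, then reply generation, then text-to-speech."""
        text = self.transcribe(audio_in)
        reply = self.respond(text)
        return self.synthesize(reply)


# Stub stages for demonstration only; they treat "audio" as UTF-8 text.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def echo_reply(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = VoicePipeline(fake_stt, echo_reply, fake_tts)

if __name__ == "__main__":
    print(pipeline.run(b"hello"))
```

With separate endpoints, a developer could use only the transcription stage for call logging, or only the synthesis stage for voiceovers, without adopting the full voice agent.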
