Scribe v1 is our Speech to Text model. It transcribes speech in 99 languages and provides word-level timestamps, speaker diarization, and audio-event tagging. It can distinguish up to 32 speakers with a high level of accuracy.
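To illustrate how word-level timestamps and speaker diarization can be combined, here is a minimal sketch in Python that merges consecutive words from the same speaker into timed turns. It does not call the API. The sample data and the field names (`text`, `start`, `end`, `type`, `speaker_id`) are assumptions made for illustration, and `group_by_speaker` is a hypothetical helper, so please check the Speech to Text documentation for the actual response format.

```python
from typing import Any, Dict, List

# Hypothetical sample shaped like a diarized, word-level transcript.
# The field names below are assumptions for illustration only and may
# not match the real response format.
sample_words: List[Dict[str, Any]] = [
    {"text": "Hello", "start": 0.0, "end": 0.4, "type": "word", "speaker_id": "speaker_0"},
    {"text": "there", "start": 0.5, "end": 0.9, "type": "word", "speaker_id": "speaker_0"},
    {"text": "(laughter)", "start": 1.0, "end": 1.6, "type": "audio_event", "speaker_id": "speaker_1"},
    {"text": "Hi", "start": 1.7, "end": 1.9, "type": "word", "speaker_id": "speaker_1"},
    {"text": "again", "start": 2.5, "end": 2.9, "type": "word", "speaker_id": "speaker_0"},
]


def group_by_speaker(words: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Merge consecutive words from the same speaker into one timed turn."""
    turns: List[Dict[str, Any]] = []
    for word in words:
        # Keep spoken words only; skip audio events and anything else.
        if word.get("type") != "word":
            continue
        speaker = word.get("speaker_id", "unknown")
        if turns and turns[-1]["speaker"] == speaker:
            # Same speaker as the previous word: extend the current turn.
            turns[-1]["text"] += " " + word["text"]
            turns[-1]["end"] = word["end"]
        else:
            # Speaker changed (or first word): start a new turn.
            turns.append(
                {
                    "speaker": speaker,
                    "start": word["start"],
                    "end": word["end"],
                    "text": word["text"],
                }
            )
    return turns


turns = group_by_speaker(sample_words)
for turn in turns:
    print(f'[{turn["start"]:.1f}s to {turn["end"]:.1f}s] {turn["speaker"]}: {turn["text"]}')
```

Skipping non-word entries keeps tagged audio events out of the spoken text. If you want events such as laughter to appear in the output, you could handle that entry type separately instead of dropping it.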
We currently recommend Scribe v1 when high-accuracy transcription is required. A low-latency version for real-time applications is in development and will be released soon.
Speech to Text is available via the website and the API.
For more information, please see the Speech to Text documentation.