Transcript
The Transcript class is designed to handle speech-to-text output generated by machine learning models such as OpenAI's Whisper, including output with word-level timestamps.
Constructing a Transcript
You typically create a Transcript instance from JSON data. The JSON should adhere to the following structure: a nested array in which the outer level represents sentences and each sentence contains a list of words or tokens. This structure preserves the semantic grouping of words.
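For illustration, here is data in that sentence-to-word shape written as a TypeScript literal. The word fields `token`, `start`, and `stop`, and the use of milliseconds, are assumptions made for this sketch rather than a confirmed schema; check the library's reference for the exact field names.

```typescript
// Assumed shape of one word/token; field names and ms units are illustrative.
interface WordJSON {
  token: string;
  start: number;
  stop: number;
}

// Outer array: sentences. Inner array: the words of one sentence.
type TranscriptJSON = WordJSON[][];

const data: TranscriptJSON = [
  [
    { token: "Hello", start: 0, stop: 400 },
    { token: "world.", start: 450, stop: 900 },
  ],
  [
    { token: "How", start: 1200, stop: 1400 },
    { token: "are", start: 1400, stop: 1550 },
    { token: "you?", start: 1550, stop: 1900 },
  ],
];

// Flattening the sentence level recovers the plain word sequence.
const words = data.reduce<WordJSON[]>((all, sentence) => all.concat(sentence), []);
console.log(words.map((w) => w.token).join(" ")); // Hello world. How are you?
```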
To create a Transcript from this JSON, use the following:
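The snippet below is not the library's loader. It is a hypothetical sketch (`parseTranscriptJSON` and `isWordJSON` are made-up names) showing how JSON text in the assumed sentence-to-word shape could be parsed and validated before it is handed to a Transcript. The `token`/`start`/`stop` field names remain an assumption.

```typescript
// Assumed word shape; field names and millisecond units are illustrative.
interface WordJSON {
  token: string;
  start: number;
  stop: number;
}

// Runtime check for a single word object.
function isWordJSON(value: unknown): value is WordJSON {
  if (typeof value !== "object" || value === null) return false;
  const { token, start, stop } = value as Record<string, unknown>;
  return (
    typeof token === "string" &&
    typeof start === "number" &&
    typeof stop === "number" &&
    start <= stop
  );
}

// Parse JSON text and verify the sentence -> word nesting before use.
function parseTranscriptJSON(text: string): WordJSON[][] {
  const parsed: unknown = JSON.parse(text);
  if (!Array.isArray(parsed)) {
    throw new Error("Expected an array of sentences");
  }
  return parsed.map((sentence: unknown, i: number) => {
    if (!Array.isArray(sentence) || !sentence.every(isWordJSON)) {
      throw new Error(`Sentence ${i} is not an array of words`);
    }
    return sentence;
  });
}

const json =
  '[[{"token":"Hello","start":0,"stop":400},{"token":"world.","start":450,"stop":900}],' +
  '[{"token":"Bye.","start":1200,"stop":1500}]]';

const sentences = parseTranscriptJSON(json);
console.log(sentences.length, sentences[1][0].token); // 2 Bye.
```

Validating up front means a malformed word (for example, one whose `stop` precedes its `start`) fails loudly at load time rather than producing broken captions later.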
Manual Construction
You can also manually create a Transcript instance:
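As a rough picture of what manual construction involves, the following sketch builds a transcript from individual timed words. `Word` and `MiniTranscript` are hypothetical stand-ins, not the library's classes, and times are assumed to be in milliseconds.

```typescript
// Hypothetical stand-in for a single timed word (times assumed in ms).
class Word {
  readonly text: string;
  readonly start: number;
  readonly stop: number;

  constructor(text: string, start: number, stop: number) {
    if (stop < start) throw new Error(`"${text}" ends before it starts`);
    this.text = text;
    this.start = start;
    this.stop = stop;
  }

  get duration(): number {
    return this.stop - this.start;
  }
}

// Hypothetical minimal transcript: a list of sentences, each a list of words.
class MiniTranscript {
  readonly sentences: Word[][];

  constructor(sentences: Word[][] = []) {
    this.sentences = sentences;
  }

  // All words in order, with the sentence grouping flattened away.
  get words(): Word[] {
    return this.sentences.reduce<Word[]>((all, s) => all.concat(s), []);
  }

  get text(): string {
    return this.words.map((w) => w.text).join(" ");
  }
}

const transcript = new MiniTranscript([
  [new Word("Hello", 0, 400), new Word("world.", 450, 900)],
  [new Word("Goodbye.", 1200, 1600)],
]);

console.log(transcript.text); // Hello world. Goodbye.
console.log(transcript.words.length); // 3
```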
Utility Methods
The Transcript class provides several utility methods:
optimize(): Adjusts the timestamps of words to improve readability when aligned on a timeline.
toSRT(): Converts the transcript to an SRT format blob, which can be downloaded and used with most video editing applications.
slice(wordCount: number): Creates a new Transcript containing only the specified number of words. This is useful for generating preview captions.
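The sketch below approximates what slice and toSRT might do, using hypothetical functions `sliceWords` and `toSRT` over plain arrays. It assumes that slicing keeps the first words and that each sentence group becomes one subtitle cue; the library may split cues differently. SRT itself is a plain-text format: each cue is a sequence number, a time range written as `HH:MM:SS,mmm --> HH:MM:SS,mmm`, the caption text, and a blank line.

```typescript
// Hypothetical word shape; times are assumed to be in milliseconds.
interface Word {
  text: string;
  start: number;
  stop: number;
}

// Zero-pad a non-negative integer to the given width.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// Format milliseconds as an SRT timestamp: HH:MM:SS,mmm.
function srtTime(ms: number): string {
  const total = Math.max(0, Math.round(ms));
  const h = Math.floor(total / 3600000);
  const m = Math.floor((total % 3600000) / 60000);
  const s = Math.floor((total % 60000) / 1000);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(total % 1000, 3)}`;
}

// Keep only the first `wordCount` words, preserving sentence grouping.
function sliceWords(groups: Word[][], wordCount: number): Word[][] {
  const out: Word[][] = [];
  let remaining = wordCount;
  for (const group of groups) {
    if (remaining <= 0) break;
    out.push(group.slice(0, remaining));
    remaining -= group.length;
  }
  return out;
}

// Emit one SRT cue per non-empty group, separated by blank lines.
function toSRT(groups: Word[][]): string {
  return groups
    .filter((g) => g.length > 0)
    .map((g, i) => {
      const text = g.map((w) => w.text).join(" ");
      const range = `${srtTime(g[0].start)} --> ${srtTime(g[g.length - 1].stop)}`;
      return `${i + 1}\n${range}\n${text}\n`;
    })
    .join("\n");
}

const groups: Word[][] = [
  [
    { text: "Hello", start: 0, stop: 400 },
    { text: "world.", start: 450, stop: 900 },
  ],
  [
    { text: "How", start: 1200, stop: 1400 },
    { text: "are", start: 1400, stop: 1550 },
  ],
];

// Preview of the first three words as SRT text.
console.log(toSRT(sliceWords(groups, 3)));
```

This sketch returns a string; in a browser, such a string could be wrapped with `new Blob([srt], { type: "text/plain" })` to offer it as a download, which is closer to the blob the documented toSRT() produces.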
Iterating Over Words
The Transcript class offers a flexible iteration method, iter:
The iter method lets you iterate over words in groups, with options that introduce a degree of randomness to improve captioning quality. If an option is given two values, a random number between them is chosen.
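To make the two-value rule concrete, here is a minimal sketch with hypothetical helpers `pick` and `chunkRandomly`. It assumes a pair means an inclusive integer range drawn uniformly; the library's actual distribution and rounding may differ. The random source is a parameter so results can be reproduced in tests.

```typescript
// A limit is either fixed or a [min, max] pair (assumed representation).
type Range = number | [number, number];

// Fixed values pass through; for a pair, draw a uniformly distributed
// integer in [min, max], inclusive.
function pick(option: Range, random: () => number = Math.random): number {
  if (typeof option === "number") return option;
  const [min, max] = option;
  return Math.floor(min + random() * (max - min + 1));
}

// Chunk items, redrawing the chunk size for every chunk so that caption
// lengths vary instead of repeating one fixed size.
function chunkRandomly<T>(items: T[], size: Range, random: () => number = Math.random): T[][] {
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; ) {
    const n = Math.max(1, pick(size, random));
    chunks.push(items.slice(i, i + n));
    i += n;
  }
  return chunks;
}

const words = "one two three four five six seven".split(" ");

// A fixed size always yields the same grouping.
console.log(chunkRandomly(words, 2).map((c) => c.join(" ")).join(" | "));
// one two | three four | five six | seven

// A range yields a different grouping on each run.
console.log(chunkRandomly(words, [1, 3]).map((c) => c.join(" ")).join(" | "));
```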
The following options are available for iteration:
count: iterate by word count
duration: iterate by group duration
length: iterate by the number of characters
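The three options can be sketched as limits on a caption group. In this hypothetical `iterGroups` function, a group closes as soon as adding the next word would exceed any enabled limit, and every new group redraws its limits from the given ranges. How the real iter combines several options is not stated here, so treat this combination rule, the millisecond units, and the option types as assumptions.

```typescript
// Hypothetical word shape; times are assumed to be in milliseconds.
interface Word {
  text: string;
  start: number;
  stop: number;
}

// A limit is either a fixed value or a [min, max] pair (assumed form).
type Range = number | [number, number];

// Hypothetical options mirroring the documented count/duration/length.
interface IterOptions {
  count?: Range; // max words per group
  duration?: Range; // max group span in ms
  length?: Range; // max characters per group, spaces included
}

interface Limits {
  count?: number;
  duration?: number;
  length?: number;
}

// Fixed values pass through; pairs are drawn uniformly at random
// (integers for count/length, real numbers for duration).
function resolve(option: Range, random: () => number, integer: boolean): number {
  if (typeof option === "number") return option;
  const [min, max] = option;
  return integer
    ? Math.floor(min + random() * (max - min + 1))
    : min + random() * (max - min);
}

function drawLimits(o: IterOptions, random: () => number): Limits {
  return {
    count: o.count === undefined ? undefined : resolve(o.count, random, true),
    duration: o.duration === undefined ? undefined : resolve(o.duration, random, false),
    length: o.length === undefined ? undefined : resolve(o.length, random, true),
  };
}

// True if appending `word` to a non-empty group would break any limit.
function wouldOverflow(current: Word[], word: Word, limits: Limits): boolean {
  if (current.length === 0) return false;
  const next = current.concat(word);
  const chars = next.map((w) => w.text).join(" ").length;
  const span = word.stop - next[0].start;
  return (
    (limits.count !== undefined && next.length > limits.count) ||
    (limits.duration !== undefined && span > limits.duration) ||
    (limits.length !== undefined && chars > limits.length)
  );
}

// Group words for captions, redrawing the limits for every new group.
function iterGroups(words: Word[], options: IterOptions, random: () => number = Math.random): Word[][] {
  const groups: Word[][] = [];
  let current: Word[] = [];
  let limits = drawLimits(options, random);
  for (const word of words) {
    if (wouldOverflow(current, word, limits)) {
      groups.push(current);
      current = [];
      limits = drawLimits(options, random);
    }
    current.push(word);
  }
  if (current.length > 0) groups.push(current);
  return groups;
}

const words: Word[] = [
  { text: "The", start: 0, stop: 200 },
  { text: "quick", start: 200, stop: 500 },
  { text: "brown", start: 500, stop: 800 },
  { text: "fox", start: 800, stop: 1000 },
  { text: "jumps", start: 1100, stop: 1500 },
];

const show = (gs: Word[][]) => gs.map((g) => g.map((w) => w.text).join(" ")).join(" | ");

console.log(show(iterGroups(words, { length: 10 }))); // The quick | brown fox | jumps
console.log(show(iterGroups(words, { count: [2, 3] }))); // group sizes vary per run
```

Checking the limit before adding a word means a group exceeds a limit only when a single word is too long on its own, which keeps caption lines from overflowing.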