mistral_common.tokens.tokenizers.audio
AudioConfig(sampling_rate, frame_rate, encoding_config, chunk_length_s=None)
dataclass
Configuration for audio processing.
Attributes:
Name | Type | Description |
---|---|---|
sampling_rate | int | Sampling rate of the audio. |
frame_rate | float | Number of frames per second accepted by the tokenizer model. |
encoding_config | AudioSpectrogramConfig | Configuration for audio spectrogram. |
chunk_length_s | Optional[float] | If set, audio is padded to a multiple of chunk_length_s seconds (optional). |
audio_length_per_tok
property
Calculate the length of audio per token.
chunk_frames
property
Calculate the number of frames per chunk.
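As a rough illustration of how these two properties could relate to the attributes above, the sketch below uses stand-in dataclasses rather than the library's own classes. The formulas (chunk_frames as chunk_length_s times sampling_rate, and audio_length_per_tok as the number of spectrogram hops covered by one token) are assumptions made for illustration, as are the example values; the implementation in audio.py may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpectrogramConfigSketch:  # hypothetical stand-in for AudioSpectrogramConfig
    num_mel_bins: int
    hop_length: int
    window_size: int


@dataclass
class AudioConfigSketch:  # hypothetical stand-in for AudioConfig
    sampling_rate: int
    frame_rate: float
    encoding_config: SpectrogramConfigSketch
    chunk_length_s: Optional[float] = None

    @property
    def chunk_frames(self) -> int:
        # Assumed formula: chunk length in seconds times samples per second.
        if self.chunk_length_s is None:
            raise ValueError("chunk_length_s is not set")
        return int(self.chunk_length_s * self.sampling_rate)

    @property
    def audio_length_per_tok(self) -> int:
        # Assumed formula: samples per token divided by samples per hop.
        samples_per_token = self.sampling_rate / self.frame_rate
        return int(samples_per_token / self.encoding_config.hop_length)


# Example values are illustrative only.
config = AudioConfigSketch(
    sampling_rate=16000,
    frame_rate=12.5,
    encoding_config=SpectrogramConfigSketch(num_mel_bins=128, hop_length=160, window_size=400),
    chunk_length_s=30.0,
)
print(config.chunk_frames)          # 480000
print(config.audio_length_per_tok)  # 8
```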
AudioEncoder(audio_config, special_ids)
Encodes audio chunks into the tokens and audio data returned as an AudioEncoding.
Attributes:
Name | Type | Description |
---|---|---|
audio_config | | Configuration for audio processing. |
encoding_config | | Configuration for audio spectrogram. |
special_ids | | Special tokens for audio encoding. |
Source code in src/mistral_common/tokens/tokenizers/audio.py
audio_token
property
Get the audio token.
begin_audio_token
property
Get the begin audio token.
__call__(content)
Call the encoder on an audio chunk or URL chunk.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
content | Union[AudioChunk, AudioURLChunk] | Audio or URL chunk to encode. | required |
Returns:
Type | Description |
---|---|
AudioEncoding | Encoded audio data and tokens. |
next_multiple_of_chunk_frames(audio_array_len, sampling_rate)
Calculate the next multiple of chunk frames.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
audio_array_len | int | Length of the audio array. | required |
sampling_rate | int | Sampling rate of the audio. | required |
Returns:
Type | Description |
---|---|
int | The next multiple of chunk frames. |
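Rounding a length up to a multiple of the chunk size is plain ceiling arithmetic. A minimal sketch follows, written as a free function with hypothetical names (`next_multiple`, `chunk_frames`) and ignoring whatever the real method does with `sampling_rate`:

```python
def next_multiple(audio_array_len: int, chunk_frames: int) -> int:
    """Round audio_array_len up to the nearest multiple of chunk_frames."""
    if chunk_frames <= 0:
        raise ValueError("chunk_frames must be positive")
    # Integer ceiling division avoids floating-point rounding issues.
    num_chunks = -(-audio_array_len // chunk_frames)
    return num_chunks * chunk_frames


print(next_multiple(500_000, 480_000))  # 960000
print(next_multiple(480_000, 480_000))  # 480000
```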
pad(audio_array, sampling_rate)
Pad the audio array to the desired length.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
audio_array | ndarray | Audio data as a numpy array. | required |
sampling_rate | int | Sampling rate of the audio. | required |
Returns:
Type | Description |
---|---|
ndarray | Padded audio array. |
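One way to picture the padding step is appending zeros until the array reaches a target length. The sketch below assumes a one-dimensional array, trailing zero padding, and a hypothetical `target_len` argument; the method documented above takes `sampling_rate` instead and may choose the target length and padding side differently.

```python
import numpy as np


def pad_to_length(audio_array: np.ndarray, target_len: int) -> np.ndarray:
    """Append zeros so that a 1-D array has at least target_len samples."""
    missing = target_len - audio_array.shape[-1]
    if missing <= 0:
        return audio_array  # already long enough, nothing to pad
    # np.pad defaults to mode="constant" with zeros as the fill value.
    return np.pad(audio_array, (0, missing))


padded = pad_to_length(np.ones(5, dtype=np.float32), 8)
print(padded.shape)  # (8,)
```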
AudioEncoding(tokens, audio)
dataclass
AudioSpectrogramConfig(num_mel_bins, hop_length, window_size)
dataclass
Configuration for generating an audio spectrogram.
Attributes:
Name | Type | Description |
---|---|---|
num_mel_bins | int | Number of mel bins, typically 80 or 128. |
hop_length | int | Hop length, in samples, between successive overlapping STFT windows used to obtain the Mel Frequency coefficients, typically 160. |
window_size | int | Window size of the Fourier transform, in samples, typically 400. |
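To make the typical values concrete, the arithmetic below converts them to time units. The 16 kHz sampling rate is an assumption for illustration; it is not stated in this section.

```python
sampling_rate = 16_000  # assumed sampling rate in Hz
hop_length = 160        # typical value from the table above
window_size = 400       # typical value from the table above

hop_ms = 1000 * hop_length / sampling_rate      # stride between frames
window_ms = 1000 * window_size / sampling_rate  # duration of each window
frames_per_second = sampling_rate / hop_length  # spectrogram frame rate

print(hop_ms, window_ms, frames_per_second)  # 10.0 25.0 100.0
```

Under that assumption, each 25 ms window overlaps its neighbour by 15 ms, since consecutive windows start only 10 ms apart.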