mmlearn.datasets.librispeech
LibriSpeech dataset.
Functions
pad_or_trim(array[, length, axis]) – Pad or trim the audio array to length along the given axis.
Classes
LibriSpeech(root_dir[, split]) – LibriSpeech dataset.
- class LibriSpeech(root_dir, split='train-clean-100')
LibriSpeech dataset.
This is a wrapper around torchaudio.datasets.LIBRISPEECH that assumes the dataset has already been downloaded and that its top-level directory inside root_dir is named librispeech.
- Parameters:
root_dir (str) – Root directory of the dataset.
split ({"train-clean-100", "train-clean-360", "train-other-500", "dev-clean", "dev-other", "test-clean", "test-other"}, default="train-clean-100") – Split of the dataset to use.
- Raises:
ImportError – If torchaudio is not installed.
Notes
Only the audio waveform and its transcript are returned for each example (see the usage sketch below).
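The following is a minimal usage sketch, not taken from the library's documentation: it assumes the class is importable from this module (mmlearn.datasets.librispeech), behaves like a standard PyTorch map-style dataset, and that the chosen split has already been extracted under <root_dir>/librispeech.

# Hypothetical paths; the LibriSpeech archives must already be extracted
# under <root_dir>/librispeech before constructing the dataset.
from mmlearn.datasets.librispeech import LibriSpeech

dataset = LibriSpeech(root_dir="/path/to/data", split="dev-clean")

print(len(dataset))   # number of utterances in the split
example = dataset[0]  # per the Notes above, contains the audio and its transcript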
- pad_or_trim(array, length=480000, *, axis=-1)
Pad or trim the audio array to length along the given axis.
- Parameters:
array (torch.Tensor) – Audio array.
length (int, default=480000) – Length to pad or trim to. Defaults to 30 seconds at 16 kHz.
axis (int, default=-1) – Axis along which to pad or trim.
- Returns:
array – Padded or trimmed audio array.
- Return type:
torch.Tensor
References
[1] https://github.com/openai/whisper/blob/main/whisper/audio.py#L65C1-L88C17
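As a quick illustration of the defaults documented above (a sketch that assumes pad_or_trim is importable from this module), a clip shorter than length is padded and a longer one is trimmed, so both end up with exactly length samples along the chosen axis:

import torch

from mmlearn.datasets.librispeech import pad_or_trim

short_clip = torch.randn(1, 10 * 16000)  # 10 s of mono audio at 16 kHz
long_clip = torch.randn(1, 40 * 16000)   # 40 s of mono audio at 16 kHz

padded = pad_or_trim(short_clip)   # padded up to the default 480000 samples
trimmed = pad_or_trim(long_clip)   # trimmed down to 480000 samples
assert padded.shape == (1, 480000)
assert trimmed.shape == (1, 480000)

# A different target length and axis can be given explicitly, e.g. 5 s:
five_sec = pad_or_trim(short_clip, length=5 * 16000, axis=-1)
assert five_sec.shape == (1, 5 * 16000)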