mmlearn.datasets.librispeech

LibriSpeech dataset.

Functions

pad_or_trim(array, length=480000, *, axis=-1)[source]

Pad or trim the audio array to length along the given axis.


Classes

LibriSpeech

LibriSpeech dataset.

class LibriSpeech(root_dir, split='train-clean-100')[source]

LibriSpeech dataset.

This is a wrapper around torchaudio.datasets.LIBRISPEECH that assumes the dataset has already been downloaded and that its top-level directory inside root_dir is named librispeech.

Parameters:
  • root_dir (str) – Root directory of dataset.

  • split ({"train-clean-100", "train-clean-360", "train-other-500", "dev-clean", "dev-other", "test-clean", "test-other"}, default="train-clean-100") – Split of the dataset to use.

Raises:

ImportError – If torchaudio is not installed.

Notes

Each example contains only the audio and its transcript from the underlying dataset.
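
Examples

A minimal usage sketch, assuming the dataset has already been downloaded so that a librispeech directory exists under a placeholder root directory /data; the path and the choice of split are illustrative only.

>>> from mmlearn.datasets.librispeech import LibriSpeech
>>> # root_dir is the directory that contains the downloaded `librispeech` folder
>>> dataset = LibriSpeech(root_dir="/data", split="dev-clean")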

__getitem__(idx)[source]

Return an example from the dataset.

Return type:

Example

__len__()[source]

Return the length of the dataset.

Return type:

int
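
A hedged sketch of accessing items, continuing from the dataset constructed in the example above; the attribute layout of the returned Example is not documented on this page, so only the object itself is shown.

>>> num_utterances = len(dataset)  # number of utterances in the chosen split
>>> example = dataset[0]           # an Example holding the audio and its transcript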

pad_or_trim(array, length=480000, *, axis=-1)[source]

Pad or trim the audio array to length along the given axis.

Parameters:
  • array (torch.Tensor) – Audio array.

  • length (int, default=480000) – Length to pad or trim to. Defaults to 30 seconds at 16 kHz.

  • axis (int, default=-1) – Axis along which to pad or trim.

Returns:

array – Padded or trimmed audio array.

Return type:

torch.Tensor

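Examples

A minimal sketch of padding and trimming with the default target length of 480000 samples (30 seconds at 16 kHz); the tensor shapes are illustrative, and zero-padding is assumed since the padding value is not documented here.

>>> import torch
>>> from mmlearn.datasets.librispeech import pad_or_trim
>>> shorter = torch.randn(1, 160000)          # 10 seconds at 16 kHz
>>> pad_or_trim(shorter).shape                # padded up to the 30-second target
torch.Size([1, 480000])
>>> longer = torch.randn(1, 640000)           # 40 seconds at 16 kHz
>>> pad_or_trim(longer, length=480000).shape  # trimmed down to the target
torch.Size([1, 480000])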