fl4health.utils.load_data module

class ToNumpy[source]

Bases: object

get_cifar10_data_and_target_tensors(data_dir, train)[source]
Return type:

tuple[Tensor, Tensor]

get_mnist_data_and_target_tensors(data_dir, train)[source]
Return type:

tuple[Tensor, Tensor]

get_train_and_val_cifar10_datasets(data_dir, transform=None, target_transform=None, validation_proportion=0.2, hash_key=None)[source]
Return type:

tuple[TensorDataset, TensorDataset]

get_train_and_val_mnist_datasets(data_dir, transform=None, target_transform=None, validation_proportion=0.2, hash_key=None)[source]
Return type:

tuple[TensorDataset, TensorDataset]

load_cifar10_data(data_dir, batch_size, sampler=None, validation_proportion=0.2, hash_key=None)[source]

Load CIFAR10 Dataset (training and validation set).

Parameters:
  • data_dir (Path) – The path to the CIFAR10 dataset locally. Dataset is downloaded to this location if it does not already exist.

  • batch_size (int) – The batch size to use for the train and validation dataloader.

  • sampler (LabelBasedSampler | None) – Optional sampler to subsample dataset based on labels.

  • validation_proportion (float) – A float between 0 and 1 specifying the proportion of samples to allocate to the validation dataset. Defaults to 0.2.

  • hash_key (int | None) – Optional hash key to create a reproducible split for train and validation datasets.

Returns:

The train data loader, the validation data loader, and a dictionary with the sample counts of the datasets underpinning the respective data loaders.

Return type:

tuple[DataLoader, DataLoader, dict[str, int]]
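
The third element of the returned tuple reports how many examples back each loader. A minimal sketch of how `validation_proportion` determines those counts (the dictionary keys here are illustrative assumptions, not the library's actual keys):

```python
def sample_counts(num_examples, validation_proportion=0.2):
    # Hold out validation_proportion of the examples for validation;
    # the remainder backs the training data loader.
    num_val = int(num_examples * validation_proportion)
    return {"train": num_examples - num_val, "validation": num_val}
```

With the default proportion of 0.2, CIFAR-10's 50,000 training examples would split into 40,000 for training and 10,000 for validation.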

load_cifar10_test_data(data_dir, batch_size, sampler=None)[source]

Load CIFAR10 Test Dataset.

Parameters:
  • data_dir (Path) – The path to the CIFAR10 dataset locally. Dataset is downloaded to this location if it does not already exist.

  • batch_size (int) – The batch size to use for the test dataloader.

  • sampler (LabelBasedSampler | None) – Optional sampler to subsample dataset based on labels.

Returns:

The test data loader and a dictionary containing the sample count of the test dataset.

Return type:

tuple[DataLoader, dict[str, int]]

load_mnist_data(data_dir, batch_size, sampler=None, transform=None, target_transform=None, dataset_converter=None, validation_proportion=0.2, hash_key=None)[source]

Load MNIST Dataset (training and validation set).

Parameters:
  • data_dir (Path) – The path to the MNIST dataset locally. Dataset is downloaded to this location if it does not already exist.

  • batch_size (int) – The batch size to use for the train and validation dataloader.

  • sampler (LabelBasedSampler | None) – Optional sampler to subsample dataset based on labels.

  • transform (Callable | None) – Optional transform to be applied to input samples.

  • target_transform (Callable | None) – Optional transform to be applied to targets.

  • dataset_converter (DatasetConverter | None) – Optional dataset converter used to convert the input and/or target of train and validation dataset.

  • validation_proportion (float) – A float between 0 and 1 specifying the proportion of samples to allocate to the validation dataset. Defaults to 0.2.

  • hash_key (int | None) – Optional hash key to create a reproducible split for train and validation datasets.

Returns:

The train data loader, the validation data loader, and a dictionary with the sample counts of the datasets underpinning the respective data loaders.

Return type:

tuple[DataLoader, DataLoader, dict[str, int]]
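
The `dataset_converter` hook rewrites inputs and/or targets of the whole dataset before batching. The library's `DatasetConverter` interface is not reproduced here; purely as an illustration, a converter that binarizes MNIST digit labels into even/odd could act on each sample like this:

```python
def even_odd_target_converter(data, target):
    # Illustrative converter logic (not the library's DatasetConverter
    # class): leave the input untouched and binarize the digit label
    # to 0 (even) or 1 (odd).
    return data, target % 2
```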

load_mnist_test_data(data_dir, batch_size, sampler=None, transform=None)[source]

Load MNIST Test Dataset.

Parameters:
  • data_dir (Path) – The path to the MNIST dataset locally. Dataset is downloaded to this location if it does not already exist.

  • batch_size (int) – The batch size to use for the test dataloader.

  • sampler (LabelBasedSampler | None) – Optional sampler to subsample dataset based on labels.

  • transform (Callable | None) – Optional transform to be applied to input samples.

Returns:

The test data loader and a dictionary containing the sample count of the test dataset.

Return type:

tuple[DataLoader, dict[str, int]]

load_msd_dataset(data_path, msd_dataset_name)[source]

Downloads and extracts one of the 10 Medical Segmentation Decathlon (MSD) datasets.

Parameters:
  • data_path (str) – Path to the folder in which to extract the dataset. The data itself is placed in a subfolder of data_path, not in data_path directly. The subfolder is named after the dataset, as defined by the values of the MsdDataset enum returned by get_msd_dataset_enum.

  • msd_dataset_name (str) – The name of one of the 10 MSD datasets.

Return type:

None

split_data_and_targets(data, targets, validation_proportion=0.2, hash_key=None)[source]
Return type:

tuple[Tensor, Tensor, Tensor, Tensor]
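
The split performed by split_data_and_targets can be sketched in plain Python; here `hash_key` seeds the shuffle so the same key always reproduces the same train/validation partition. This is an illustration of the idea, not the library's implementation:

```python
import random

def split_indices(n, validation_proportion=0.2, hash_key=None):
    # Shuffle indices 0..n-1 with a seeded RNG so a fixed hash_key
    # always yields the same train/validation partition.
    indices = list(range(n))
    random.Random(hash_key).shuffle(indices)
    num_val = int(n * validation_proportion)
    return indices[num_val:], indices[:num_val]  # (train, validation)
```

With `hash_key=None` the generator is seeded from system entropy, so the split differs on every call.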