fl4health.utils.load_data module

class ToNumpy[source]

Bases: object

get_cifar10_data_and_target_tensors(data_dir, train)[source]
Return type:

tuple[Tensor, Tensor]

get_mnist_data_and_target_tensors(data_dir, train)[source]
Return type:

tuple[Tensor, Tensor]

get_train_and_val_cifar10_datasets(data_dir, transform=None, target_transform=None, validation_proportion=0.2, hash_key=None)[source]
Return type:

tuple[TensorDataset, TensorDataset]

get_train_and_val_mnist_datasets(data_dir, transform=None, target_transform=None, validation_proportion=0.2, hash_key=None)[source]
Return type:

tuple[TensorDataset, TensorDataset]

load_cifar10_data(data_dir, batch_size, sampler=None, validation_proportion=0.2, hash_key=None)[source]

Load CIFAR10 Dataset (training and validation set).

Parameters:
  • data_dir (Path) – The path to the CIFAR10 dataset locally. Dataset is downloaded to this location if it does not already exist.

  • batch_size (int) – The batch size to use for the train and validation dataloader.

  • sampler (LabelBasedSampler | None) – Optional sampler to subsample dataset based on labels.

  • validation_proportion (float) – A float between 0 and 1 specifying the proportion of samples to allocate to the validation dataset. Defaults to 0.2.

  • hash_key (int | None) – Optional hash key to create a reproducible split for train and validation datasets.

Returns:

The train data loader, the validation data loader, and a dictionary with the sample counts of the datasets underpinning the respective data loaders.

Return type:

tuple[DataLoader, DataLoader, dict[str, int]]
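
The third element of the returned tuple reports how many examples back each loader. A minimal sketch of how `validation_proportion` determines those counts (the dictionary keys here are illustrative assumptions, not the library's actual keys):

```python
def sample_counts(num_examples, validation_proportion=0.2):
    # Hold out validation_proportion of the examples for validation;
    # the remainder backs the training data loader.
    num_val = int(num_examples * validation_proportion)
    return {"train": num_examples - num_val, "validation": num_val}
```

With the default proportion of 0.2, CIFAR-10's 50,000 training examples would split into 40,000 for training and 10,000 for validation.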

load_cifar10_test_data(data_dir, batch_size, sampler=None)[source]

Load CIFAR10 Test Dataset.

Parameters:
  • data_dir (Path) – The path to the CIFAR10 dataset locally. Dataset is downloaded to this location if it does not already exist.

  • batch_size (int) – The batch size to use for the test dataloader.

  • sampler (LabelBasedSampler | None) – Optional sampler to subsample dataset based on labels.

Returns:

The test data loader and a dictionary containing the sample count of the test dataset.

Return type:

tuple[DataLoader, dict[str, int]]

load_mnist_data(data_dir, batch_size, sampler=None, transform=None, target_transform=None, dataset_converter=None, validation_proportion=0.2, hash_key=None)[source]

Load MNIST Dataset (training and validation set).

Parameters:
  • data_dir (Path) – The path to the MNIST dataset locally. Dataset is downloaded to this location if it does not already exist.

  • batch_size (int) – The batch size to use for the train and validation dataloader.

  • sampler (LabelBasedSampler | None) – Optional sampler to subsample dataset based on labels.

  • transform (Callable | None) – Optional transform to be applied to input samples.

  • target_transform (Callable | None) – Optional transform to be applied to targets.

  • dataset_converter (DatasetConverter | None) – Optional dataset converter used to convert the input and/or target of train and validation dataset.

  • validation_proportion (float) – A float between 0 and 1 specifying the proportion of samples to allocate to the validation dataset. Defaults to 0.2.

  • hash_key (int | None) – Optional hash key to create a reproducible split for train and validation datasets.

Returns:

The train data loader, the validation data loader, and a dictionary with the sample counts of the datasets underpinning the respective data loaders.

Return type:

tuple[DataLoader, DataLoader, dict[str, int]]
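
The `dataset_converter` hook rewrites inputs and/or targets of the whole dataset before batching. The library's `DatasetConverter` interface is not reproduced here; purely as an illustration, a converter that binarizes MNIST digit labels into even/odd could act on each sample like this:

```python
def even_odd_target_converter(data, target):
    # Illustrative converter logic (not the library's DatasetConverter
    # class): leave the input untouched and binarize the digit label
    # to 0 (even) or 1 (odd).
    return data, target % 2
```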

load_mnist_test_data(data_dir, batch_size, sampler=None, transform=None)[source]

Load MNIST Test Dataset.

Parameters:
  • data_dir (Path) – The path to the MNIST dataset locally. Dataset is downloaded to this location if it does not already exist.

  • batch_size (int) – The batch size to use for the test dataloader.

  • sampler (LabelBasedSampler | None) – Optional sampler to subsample dataset based on labels.

  • transform (Callable | None) – Optional transform to be applied to input samples.

Returns:

The test data loader and a dictionary containing the sample count of the test dataset.

Return type:

tuple[DataLoader, dict[str, int]]

load_msd_dataset(data_path, msd_dataset_name)[source]

Downloads and extracts one of the 10 Medical Segmentation Decathlon (MSD) datasets.

Parameters:
  • data_path (str) – Path to the folder in which to extract the dataset. The data itself is placed in a subfolder of data_path, not in data_path directly. The subfolder is named after the dataset, as defined by the values of the MsdDataset enum returned by get_msd_dataset_enum.

  • msd_dataset_name (str) – The name of one of the 10 MSD datasets.

Return type:

None

split_data_and_targets(data, targets, validation_proportion=0.2, hash_key=None)[source]
Return type:

tuple[Tensor, Tensor, Tensor, Tensor]
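
The split performed by split_data_and_targets can be sketched in plain Python; here `hash_key` seeds the shuffle so the same key always reproduces the same train/validation partition. This is an illustration of the idea, not the library's implementation:

```python
import random

def split_indices(n, validation_proportion=0.2, hash_key=None):
    # Shuffle indices 0..n-1 with a seeded RNG so a fixed hash_key
    # always yields the same train/validation partition.
    indices = list(range(n))
    random.Random(hash_key).shuffle(indices)
    num_val = int(n * validation_proportion)
    return indices[num_val:], indices[:num_val]  # (train, validation)
```

With `hash_key=None` the generator is seeded from system entropy, so the split differs on every call.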