fl4health.datasets.rxrx1.load_data module

construct_rxrx1_tensor_dataset(metadata, data_path, client_num, dataset_type, transform=None)

Construct a TensorDataset for rxrx1 data (https://www.rxrx.ai/rxrx1).

Parameters:

metadata (DataFrame) – A DataFrame containing image metadata.
data_path (Path) – Root directory from which the image data should be loaded.
client_num (int) – Client number to load data for.
dataset_type (str) – “train” or “test” to specify dataset type.
transform (Callable | None) – Transformation function to apply to the images. Defaults to None.

Returns:

A TensorDataset containing the processed images, together with the label map.

Return type:

tuple[TensorDataset, dict[int, int]]

create_splits(dataset, seed=None, train_fraction=0.8)

Splits the dataset into training and validation sets.

Parameters:

dataset (TensorDataset) – The dataset to split.
seed (int | None, optional) – Seed meant to fix the sampling process associated with splitting. Defaults to None.
train_fraction (float, optional) – Fraction of data to use for training. Defaults to 0.8.

Returns:

Indices of the datapoints selected for the train and validation sets, respectively.

Return type:

tuple[list[int], list[int]]
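
The seeded index split described above can be illustrated with a minimal, standard-library sketch. It is not the library's implementation: the helper name split_indices is hypothetical, it takes a sample count instead of a TensorDataset, and it uses a plain shuffle, whereas the actual sampling strategy of create_splits (for example, whether it stratifies by label) is not stated here.

```python
from __future__ import annotations

import random


def split_indices(
    n_samples: int, seed: int | None = None, train_fraction: float = 0.8
) -> tuple[list[int], list[int]]:
    """Hypothetical stand-in for create_splits, operating on a sample count."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    # A private generator keeps the split reproducible for a given seed
    # without touching the global random state.
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_train = int(train_fraction * n_samples)
    # The first n_train shuffled indices go to train, the rest to validation.
    return indices[:n_train], indices[n_train:]


train_idx, val_idx = split_indices(10, seed=42)
print(len(train_idx), len(val_idx))
```

With a fixed seed, repeated calls return the same two index lists, which is the property the seed parameter is meant to provide.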

label_frequency(dataset, original_label_map)

Prints the frequency of each label in the dataset.

Parameters:

dataset (TensorDataset | Subset) – The dataset to analyze.
original_label_map (dict[int, int]) – A mapping of the original labels to their new labels.

Return type:

None
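
As a rough illustration of how a label map of this shape can be used to report frequencies, the sketch below counts remapped labels and reports them under their original values. The helper count_labels and the example label values are made up for illustration; the sketch assumes original_label_map maps original labels to new labels, as described above, and it works on a plain list of labels, not on a TensorDataset or Subset.

```python
from __future__ import annotations

from collections import Counter


def count_labels(
    new_labels: list[int], original_label_map: dict[int, int]
) -> dict[int, int]:
    """Hypothetical helper: count remapped labels, keyed by original label."""
    # Invert the map (original -> new) so counts can be reported per original label.
    new_to_original = {new: original for original, new in original_label_map.items()}
    counts = Counter(new_labels)
    return {new_to_original[new]: count for new, count in sorted(counts.items())}


# Made-up original labels mapped to contiguous new labels.
label_map = {1108: 0, 1124: 1, 1137: 2}
frequencies = count_labels([0, 0, 1, 2, 2, 2], label_map)
for original, count in frequencies.items():
    print(f"Label {original}: {count}")
```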

load_rxrx1_data(data_path, client_num, batch_size, seed=None, train_val_split=0.8, num_workers=0)

Load and split the data into training and validation dataloaders.

Parameters:

data_path (Path) – Path to the full set of data.
client_num (int) – Client number for the data you want to load.
batch_size (int) – Batch size for the data loaders.
seed (int | None, optional) – Seed to fix randomness associated with data splitting. Defaults to None.
train_val_split (float, optional) – Fraction of the data to put in the training loader. The remainder goes to the validation dataloader. Defaults to 0.8.
num_workers (int, optional) – Number of worker processes to be used by the dataloaders. Defaults to 0.

Returns:

Train and validation dataloaders and a dictionary holding the size of each dataset.

Return type:

tuple[DataLoader, DataLoader, dict[str, int]]
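
A possible way to call this function is sketched below, using only the signature documented above. The wrapper name build_train_val_loaders and its default values are hypothetical, and the keys of the returned size dictionary are not documented here, so the sketch returns it untouched. The import is placed inside the wrapper so the definition does not require fl4health until it is actually called.

```python
from __future__ import annotations

from pathlib import Path


def build_train_val_loaders(
    data_path: Path, client_num: int, batch_size: int = 32, seed: int | None = 42
):
    """Hypothetical wrapper around load_rxrx1_data for a single client."""
    # Imported lazily: fl4health (and the data on disk) is only needed at call time.
    from fl4health.datasets.rxrx1.load_data import load_rxrx1_data

    train_loader, val_loader, num_examples = load_rxrx1_data(
        data_path=data_path,
        client_num=client_num,
        batch_size=batch_size,
        seed=seed,  # fixes the train/validation split
        train_val_split=0.8,  # 80% train, 20% validation
        num_workers=0,
    )
    return train_loader, val_loader, num_examples
```

Passing the same seed on every run should give each client the same train/validation split, which helps when comparing experiments.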

load_rxrx1_test_data(data_path, client_num, batch_size, num_workers=0)

Create a dataloader for the reserved rxrx1 test dataset.

Parameters:

data_path (Path) – Path to the test data.
client_num (int) – Client number to be loaded.
batch_size (int) – Batch size for the test dataloader.
num_workers (int, optional) – Number of workers associated with the test dataloader. Defaults to 0.

Returns:

Test dataloader and a dictionary containing the count of data points in the test set.

Return type:

tuple[DataLoader, dict[str, int]]