fl4health.preprocessing.pca_preprocessor module

class PcaPreprocessor(checkpointing_path)[source]

Bases: object

__init__(checkpointing_path)[source]

Class that uses pre-computed principal components of a dataset to perform data preprocessing.

Parameters:

checkpointing_path (Path) – Path to saved principal components.

load_pca_module()[source]

Return type:

PcaModule

reduce_dimension(new_dimension, dataset)[source]

Perform dimensionality reduction on a dataset by projecting the data onto a set of pre-computed principal components.

(Note that transforms attached to a PyTorch dataset are applied lazily, when samples are fetched. In practice, the dimensionality reduction is therefore performed on the fly as the user iterates through a dataloader created from the dataset returned here.)
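The lazy behaviour described in the note can be illustrated with a minimal sketch in plain Python. The class LazyTransformDataset and the function keep_first_two are hypothetical stand-ins written for this illustration; they are not part of fl4health or PyTorch, and the real BaseDataset may be organised differently.

```python
from typing import Callable, List, Sequence


class LazyTransformDataset:
    """Hypothetical dataset that stores a transform and applies it only on access."""

    def __init__(
        self,
        data: Sequence[List[float]],
        transform: Callable[[List[float]], List[float]],
    ) -> None:
        self.data = data
        self.transform = transform
        self.calls = 0  # counts how many times the transform has run

    def __len__(self) -> int:
        return len(self.data)

    def __getitem__(self, index: int) -> List[float]:
        # The transform runs here, at access time, not when the dataset is built.
        self.calls += 1
        return self.transform(self.data[index])


def keep_first_two(sample: List[float]) -> List[float]:
    # Placeholder for a dimensionality-reducing transform (assumption for illustration).
    return sample[:2]


dataset = LazyTransformDataset([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], keep_first_two)
calls_before = dataset.calls  # nothing has been transformed yet
first = dataset[0]            # the transform is applied to this sample only
calls_after = dataset.calls
```

Constructing the dataset does not touch the data; the transform only runs for the samples that are actually requested, which is why the cost of the reduction is paid during iteration.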

Parameters:
  • new_dimension (int) – New data dimension after dimensionality reduction. Equals the number of principal components onto which projection is performed.

  • dataset (BaseDataset) – Dataset containing data whose dimension is to be reduced.

Returns:

Dataset consisting of data with reduced dimension.

Return type:

BaseDataset
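As a rough illustration of what projecting onto the first new_dimension principal components means, the following plain-Python sketch projects a single sample. The function project, the values of mean and components, and the centering step are assumptions made for this example; whether and how PcaModule centers the data is not stated here, and this is not the library's implementation.

```python
from typing import List


def project(
    sample: List[float],
    mean: List[float],
    components: List[List[float]],
    new_dimension: int,
) -> List[float]:
    """Project one sample onto the first `new_dimension` principal components."""
    # Center the sample with the (assumed) pre-computed data mean.
    centered = [x - m for x, m in zip(sample, mean)]
    # Each output coordinate is the dot product with one principal component.
    return [
        sum(c * x for c, x in zip(component, centered))
        for component in components[:new_dimension]
    ]


# Hypothetical pre-computed quantities: orthonormal components, one per row.
mean = [1.0, 1.0, 1.0]
components = [
    [0.6, 0.8, 0.0],
    [-0.8, 0.6, 0.0],
    [0.0, 0.0, 1.0],
]

sample = [4.0, 5.0, 1.0]
reduced = project(sample, mean, components, new_dimension=2)  # 3 features -> 2
```

The output has exactly new_dimension entries, one per retained component, which matches the description of the new_dimension parameter above.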