fl4health.model_bases.pca module

class PcaModule(low_rank=False, full_svd=False, rank_estimation=6)[source]

Bases: Module

__init__(low_rank=False, full_svd=False, rank_estimation=6)[source]

PyTorch module for performing Principal Component Analysis.

Parameters:
  • low_rank (bool, optional) – Indicates whether the data matrix can be well-approximated by a low-rank singular value decomposition. If the user has good reasons to believe so, then this parameter can be set to True to allow for more efficient computation. Defaults to False.

  • full_svd (bool, optional) – Indicates whether full SVD or reduced SVD is performed. If low_rank is set to True, then an alternative implementation of SVD will be used and this argument is ignored. Defaults to False.

  • rank_estimation (int, optional) – A slight overestimation of the rank of the data matrix. Only used if self.low_rank is True. Defaults to 6.

Notes

    1. If low_rank is set to True, then a value q for rank_estimation is required (either specified by the user or via its default value). If q is too far from the actual rank k of the data matrix, then the resulting rank-q SVD approximation is not guaranteed to be a good approximation of the data matrix.

    2. If low_rank is set to True, then a value q for rank_estimation can be chosen according to the following criteria: in general, k <= q <= min(2*k, m, n), where m and n are the dimensions of the data matrix. For large low-rank matrices, take q = k + l, where 5 <= l <= 10. If k is relatively small compared to min(m, n), choosing l = 0, 1, 2 may be sufficient.

    3. If low_rank is set to True and rank_estimation is set to q, then the module will utilize a randomized algorithm to compute a rank-q approximation of the data matrix via SVD.

    For more details on this, see:

    https://pytorch.org/docs/stable/generated/torch.svd_lowrank.html

    and

    https://pytorch.org/docs/stable/generated/torch.pca_lowrank.html

    As per the official documentation of PyTorch, in general, the user should set low_rank to False. Setting it to True would be useful for huge sparse matrices.
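
The selection criteria in note 2 can be sketched as a small helper. The function name suggest_rank_estimation and its oversample parameter are hypothetical and not part of fl4health; the sketch only encodes the bounds quoted above, assuming m and n are the dimensions of the data matrix and k is the rank the user believes it has.

```python
def suggest_rank_estimation(k: int, m: int, n: int, oversample: int = 5) -> int:
    """Pick q = k + oversample, clipped so that k <= q <= min(2 * k, m, n).

    Hypothetical helper: k is the believed rank of an m-by-n data matrix.
    """
    if not 1 <= k <= min(m, n):
        raise ValueError("k must satisfy 1 <= k <= min(m, n)")
    if oversample < 0:
        raise ValueError("oversample must be non-negative")
    upper = min(2 * k, m, n)
    return min(k + oversample, upper)


# A believed rank of 20 for a 10000 x 500 matrix gives q = 20 + 5.
q = suggest_rank_estimation(k=20, m=10_000, n=500)
```

The resulting value could then be passed as rank_estimation when low_rank is set to True.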

center_data(X)[source]
Return type:

Tensor

compute_cumulative_explained_variance()[source]
Return type:

float

compute_explained_variance_ratios()[source]
Return type:

Tensor

compute_projection_variance(X, k, center_data=False)[source]

Compute the variance of the data matrix X after projection via PCA.

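The variance is defined as

|| X @ U ||_F ** 2,

where ||.||_F denotes the Frobenius norm and U is the matrix whose columns are the principal components onto which X is projected.
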
Parameters:
  • X (Tensor) – Input data tensor whose rows represent data points.

  • k (int | None) – The number of principal components onto which projection is applied.

Returns:

Variance after projection as defined above.

Return type:

float
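
As a rough illustration of this quantity, the sketch below computes the squared Frobenius norm of the projected data with plain PyTorch rather than through PcaModule. The random data and variable names are assumptions made for the example; it also checks the quantity against the sum of the top k squared singular values, which should agree up to floating-point error.

```python
import torch

torch.manual_seed(0)
X = torch.randn(50, 8)           # 50 data points (rows), 8 features
X = X - X.mean(dim=0)            # center the data

# The right singular vectors of X are the principal components (columns of U).
_, S, Vh = torch.linalg.svd(X, full_matrices=False)
U = Vh.T

k = 3
# Squared Frobenius norm of the data projected onto the top k components.
projection_variance = (torch.linalg.norm(X @ U[:, :k]) ** 2).item()

# For comparison: sum of the top k squared singular values, and the total.
top_k_energy = (S[:k] ** 2).sum().item()
total_variance = (torch.linalg.norm(X) ** 2).item()
```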

compute_reconstruction_error(X, k, center_data=False)[source]

Compute the reconstruction error of X under PCA reconstruction.

More precisely, if X is an N by d data matrix whose rows are the data points, and U is the matrix whose columns are the principal components of X, then the reconstruction loss is defined as

1 / N * || X @ U @ U.T - X ||_F ** 2.

Parameters:
  • X (Tensor) – Input data tensor whose rows represent data points.

  • k (int | None) – The number of principal components onto which projection is applied.

  • center_data (bool) – Indicates whether to subtract the data mean prior to projecting the data into a lower-dimensional subspace, and whether to add the data mean after projecting back.

Returns:

Reconstruction loss as defined above.

Return type:

float

Note

The reconstruction (after centering) is X @ U @ U.T because this method assumes that the rows of X are the data points while the columns of U are the principal components.
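
The reconstruction loss can be sketched directly in PyTorch as follows. This is a standalone illustration on random, already-centered data, not the module's own implementation; the norm is assumed to be the Frobenius norm, and all variable names are chosen for the example. The loss is compared with the energy in the discarded singular values divided by N, which should match up to floating-point error.

```python
import torch

torch.manual_seed(0)
X = torch.randn(40, 6)           # rows are data points
X = X - X.mean(dim=0)            # assume the data is centered
N = X.size(0)

_, S, Vh = torch.linalg.svd(X, full_matrices=False)
k = 2
U_k = Vh.T[:, :k]                # top k principal components as columns

# Project onto the top k components, then map back to the original space.
reconstruction = X @ U_k @ U_k.T
error = (torch.linalg.norm(reconstruction - X) ** 2).item() / N

# Equivalent value from the singular values that were discarded.
expected = (S[k:] ** 2).sum().item() / N
```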

forward(X, center_data)[source]

Perform PCA on the data matrix X by computing its SVD.

Parameters:
  • X (Tensor) – Data matrix.

  • center_data (bool) – If true, then the data mean will be subtracted from all data points prior to performing PCA. If center_data is false, it is expected that the data has already been centered and an exception will be thrown if it is not.

Returns:

The principal components (i.e., right singular vectors) and their corresponding singular values.

Return type:

tuple[Tensor, Tensor]

Note: the algorithm assumes that the rows of X are the data points (after reshaping as needed). Consequently, the principal components, which are the eigenvectors of X.T @ X, are the right singular vectors in the SVD of X.
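
The relationship in this note can be checked numerically. The sketch below is an illustration using torch.linalg.svd on random data, not the module's code path; it verifies that each right singular vector v of the centered X satisfies (X.T @ X) v = s ** 2 * v, where s is the corresponding singular value.

```python
import torch

torch.manual_seed(0)
X = torch.randn(30, 5)                            # rows are data points
X_centered = X - X.mean(dim=0, keepdim=True)

# Reduced SVD: X_centered = W @ diag(S) @ Vh
_, S, Vh = torch.linalg.svd(X_centered, full_matrices=False)
principal_components = Vh.T                       # columns are right singular vectors

# Eigenvector check: (X.T @ X) @ V should equal V with column j scaled by S[j] ** 2.
gram = X_centered.T @ X_centered
residual = gram @ principal_components - principal_components * S ** 2
max_residual = residual.abs().max().item()
```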

maybe_reshape(X)[source]

Reshape input tensor X as needed so SVD can be computed. Reshaping is required when each data point is an N-dimensional tensor because PCA requires X to be a 2D data matrix.

Return type:

Tensor
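
One plausible way to perform such a reshape is shown below. It assumes the first dimension indexes the data points and that every remaining dimension is flattened into features; the exact reshaping done by maybe_reshape is not spelled out here, so treat this as a sketch.

```python
import torch

# A batch of 16 data points, each a 3 x 4 x 4 tensor (e.g. small images).
X = torch.zeros(16, 3, 4, 4)

# Keep the first (data point) dimension and flatten the rest into features.
X_2d = X.reshape(X.size(0), -1)
```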

prepare_data_forward(X, center_data)[source]

Prepare input data X for PCA by reshaping and centering it as needed.

Parameters:
  • X (Tensor) – Data matrix.

  • center_data (bool) – If true, then the data mean will be subtracted from all data points prior to performing PCA. If center_data is false, it is expected that the data has already been centered and an exception will be thrown if it is not.

Returns:

Prepared data matrix.

Return type:

Tensor

project_back(X_lower_dim, add_mean=False)[source]

Project low-dimensional principal representations back into the original space to recover the reconstruction of data points.

Parameters:
  • X_lower_dim (Tensor) – Matrix whose rows are low-dimensional principal representations of the original data.

  • add_mean (bool, optional) – Indicates whether the training data mean should be added to the projection result. This can be set to True if the user centered the data prior to dimensionality reduction and now wishes to add back the data mean. Defaults to False.

Returns:

Reconstruction of data points.

Return type:

Tensor
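
The effect of adding the mean back can be sketched in plain PyTorch. The example assumes that projecting back amounts to multiplying by U.T, with U holding the principal components as columns, and that the mean added back is the one subtracted before projection; variable names are illustrative only.

```python
import torch

torch.manual_seed(0)
X = torch.randn(25, 4) + 3.0      # data with a non-zero mean
data_mean = X.mean(dim=0)

_, _, Vh = torch.linalg.svd(X - data_mean, full_matrices=False)
U = Vh.T                          # all 4 principal components as columns

# Center, project to the principal representation, then project back.
X_lower_dim = (X - data_mean) @ U
reconstruction = X_lower_dim @ U.T + data_mean   # add the mean back
```

Because all principal components are kept here, the reconstruction recovers X up to floating-point error; with fewer components it would only approximate X.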

project_lower_dim(X, k=None, center_data=False)[source]

Project input data X onto the top k principal components.

Parameters:
  • X (Tensor) – Input data matrix whose rows are the data points.

  • k (int | None, optional) – The number of principal components onto which projection is done. If k is None, then all principal components will be used in the projection. Defaults to None.

  • center_data (bool) – If true, then the training data mean (learned in the forward pass) will be subtracted from all data points prior to projection. If center_data is false, it is expected that the data has already been centered in this manner by the user.

Returns:

Projection result.

Return type:

Tensor

Note

The result of projection (after centering) is X @ U because this method assumes that the rows of X are the data points while the columns of U are the principal components.

set_data_mean(X)[source]

The primary purpose of this method is to store the mean of the training data so it can be used to center validation/test data later, if needed.

Return type:

None
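
The idea of reusing the training mean can be illustrated as follows. This is a standalone sketch with made-up tensors, not a call into PcaModule: the mean is computed once on the training data and then subtracted from held-out data.

```python
import torch

torch.manual_seed(0)
X_train = torch.randn(100, 3) + 5.0
X_test = torch.randn(20, 3) + 5.0

# Learn the mean on the training data only.
train_mean = X_train.mean(dim=0)

# Center the test data with the stored training mean, not its own mean.
X_test_centered = X_test - train_mean
```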

set_principal_components(principal_components, singular_values)[source]
Return type:

None