fl4health.model_bases.pca module
- class PcaModule(low_rank=False, full_svd=False, rank_estimation=6)
Bases: Module
- __init__(low_rank=False, full_svd=False, rank_estimation=6)
PyTorch module for performing Principal Component Analysis.
- Parameters:
low_rank (bool, optional) – Indicates whether the data matrix can be well-approximated by a low-rank singular value decomposition. If the user has good reasons to believe so, then this parameter can be set to True to allow for more efficient computation. Defaults to False.
full_svd (bool, optional) – Indicates whether full SVD or reduced SVD is performed. If low_rank is set to True, then an alternative implementation of SVD will be used and this argument is ignored. Defaults to False.
rank_estimation (int, optional) – A slight overestimation of the rank of the data matrix. Only used if self.low_rank is True. Defaults to 6.
Notes
1. If low_rank is set to True, then a value q for rank_estimation is required (either specified by the user or via its default value). If q is too far away from the actual rank k of the data matrix, then the resulting rank-q SVD approximation is not guaranteed to be a good approximation of the data matrix.
2. If low_rank is set to True, then a value q for rank_estimation can be chosen according to the following criteria: in general, k <= q <= min(2*k, m, n), where m and n are the dimensions of the data matrix. For large low-rank matrices, take q = k + l, where 5 <= l <= 10. If k is relatively small compared to min(m, n), choosing l = 0, 1, 2 may be sufficient.
3. If low_rank is set to True and rank_estimation is set to q, then the module will utilize a randomized algorithm to compute a rank-q approximation of the data matrix via SVD.
- For more details on this, see:
https://pytorch.org/docs/stable/generated/torch.svd_lowrank.html
and
https://pytorch.org/docs/stable/generated/torch.pca_lowrank.html
As per the official documentation of PyTorch, in general, the user should set low_rank to False. Setting it to True would be useful for huge sparse matrices.
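The rank-selection guidance in the notes above can be illustrated with a minimal sketch that calls torch.svd_lowrank directly. It is not the module's own implementation; the matrix sizes, the seed, and the variable names (oversampling stands for l in the notes) are assumptions chosen for illustration.

```python
import torch

torch.manual_seed(0)

# Hypothetical m x n data matrix with exact rank k.
m, n, k = 200, 50, 3
X = torch.randn(m, k, dtype=torch.float64) @ torch.randn(k, n, dtype=torch.float64)

# Slightly overestimate the rank: q = k + l, capped at min(2*k, m, n) as in note 2.
oversampling = 2  # "l" in the notes
q = min(k + oversampling, 2 * k, m, n)

# Randomized rank-q SVD, so that X is approximately U @ diag(S) @ V.T.
U, S, V = torch.svd_lowrank(X, q=q)

X_approx = U @ torch.diag(S) @ V.T
relative_error = torch.linalg.norm(X - X_approx) / torch.linalg.norm(X)
print(U.shape, S.shape, V.shape, float(relative_error))
```

Because q is at least the true rank here, the approximation should be essentially exact; choosing q below k would discard part of the signal.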
- compute_projection_variance(X, k, center_data=False)
Compute the variance of the data matrix X after projection via PCA.
The variance is defined as
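|| X @ U ||_F ** 2,
where || . ||_F denotes the Frobenius norm and the columns of U are the principal components onto which X is projected.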
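As a rough check of this definition, the sketch below computes the squared Frobenius norm of the projected data with plain torch and compares it to the sum of the top k squared singular values. The data, the seed, and the names U_k and projection_variance are illustrative assumptions, not the module's code.

```python
import torch

torch.manual_seed(0)

# Hypothetical data: 100 points in 8 dimensions, centered column-wise.
X = torch.randn(100, 8, dtype=torch.float64)
X = X - X.mean(dim=0)

# Reduced SVD; the rows of Vh are the right singular vectors of X.
_, S, Vh = torch.linalg.svd(X, full_matrices=False)

k = 3
U_k = Vh[:k].T  # d x k matrix whose columns are the top-k principal components

# Squared Frobenius norm of the projected data.
projection_variance = torch.linalg.norm(X @ U_k) ** 2

# The same quantity expressed through the singular values.
from_singular_values = (S[:k] ** 2).sum()
print(float(projection_variance), float(from_singular_values))
```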
- compute_reconstruction_error(X, k, center_data=False)
Compute the reconstruction error of X under PCA reconstruction.
More precisely, if X is an N by d data matrix whose rows are the data points, and U is the matrix whose columns are the principal components of X, then the reconstruction loss is defined as
1 / N * || X @ U @ U.T - X ||_F ** 2.
- Parameters:
X (Tensor) – Input data tensor whose rows represent data points.
k (int | None) – The number of principal components onto which projection is applied.
center_data (bool) – Indicates whether to subtract the data mean prior to projecting the data into a lower-dimensional subspace, and whether to add the data mean after projecting back.
- Returns:
Reconstruction loss as defined above.
- Return type:
Note
The reconstruction (after centering) is X @ U @ U.T because this method assumes that the rows of X are the data points while the columns of U are the principal components.
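The reconstruction loss can be reproduced by hand with a short sketch in plain torch. This is an illustration under assumed data and names (U_k, reconstruction_error), not the method's actual implementation; it also shows that the loss equals the sum of the discarded squared singular values divided by N.

```python
import torch

torch.manual_seed(0)

# Hypothetical N x d data matrix whose rows are data points, centered column-wise.
N, d, k = 100, 8, 3
X = torch.randn(N, d, dtype=torch.float64)
X = X - X.mean(dim=0)

_, S, Vh = torch.linalg.svd(X, full_matrices=False)
U_k = Vh[:k].T  # columns are the top-k principal components

# Project onto the top-k components and map back to the original space.
reconstruction = X @ U_k @ U_k.T

# Mean squared reconstruction error over the N data points.
reconstruction_error = torch.linalg.norm(reconstruction - X) ** 2 / N

# Equivalent expression: energy of the discarded components divided by N.
discarded_energy = (S[k:] ** 2).sum() / N
print(float(reconstruction_error), float(discarded_energy))
```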
- forward(X, center_data)
Perform PCA on the data matrix X by computing its SVD.
- Parameters:
X (Tensor) – Data matrix.
center_data (bool) – If true, then the data mean will be subtracted from all data points prior to performing PCA. If center_data is false, it is expected that the data has already been centered and an exception will be thrown if it is not.
- Returns:
The principal components (i.e., right singular vectors) and their corresponding singular values.
- Return type:
tuple[Tensor, Tensor]
Note: the algorithm assumes that the rows of X are the data points (after reshaping as needed). Consequently, the principal components, which are the eigenvectors of X.T @ X, are the right singular vectors in the SVD of X.
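The relationship between the right singular vectors of X and the eigenvectors of X.T @ X can be verified numerically. The following is a minimal sketch with plain torch on assumed random data; the variable names are illustrative.

```python
import torch

torch.manual_seed(0)

# Hypothetical centered data matrix whose rows are data points.
X = torch.randn(50, 4, dtype=torch.float64)
X = X - X.mean(dim=0)

# Reduced SVD: X = W @ diag(S) @ Vh, with right singular vectors as rows of Vh.
_, S, Vh = torch.linalg.svd(X, full_matrices=False)
principal_components = Vh.T  # columns are the principal components

# Each column v_i should satisfy (X.T @ X) @ v_i = S[i]**2 * v_i.
gram = X.T @ X
lhs = gram @ principal_components
rhs = principal_components * S**2  # scales column i by S[i]**2
print(torch.allclose(lhs, rhs))
```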
- maybe_reshape(X)
Reshape input tensor X as needed so SVD can be computed. Reshaping is required when each data point is an N-dimensional tensor because PCA requires X to be a 2D data matrix.
- Return type:
Tensor
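One way to picture this reshaping is to flatten every data point into a row vector. The sketch below assumes that the first dimension of X indexes data points; that assumption and the tensor sizes are illustrative and may not match the method's exact behavior.

```python
import torch

# Hypothetical batch of 2 data points, each a 3 x 4 x 4 tensor.
X = torch.arange(2 * 3 * 4 * 4, dtype=torch.float32).reshape(2, 3, 4, 4)

# Flatten every data point into a single row, giving a 2D data matrix.
X_2d = X.reshape(X.shape[0], -1)
print(X_2d.shape)
```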
- prepare_data_forward(X, center_data)
Prepare input data X for PCA by reshaping and centering it as needed.
- Parameters:
X (Tensor) – Data matrix.
center_data (bool) – If true, then the data mean will be subtracted from all data points prior to performing PCA. If center_data is false, it is expected that the data has already been centered and an exception will be thrown if it is not.
- Returns:
Prepared data matrix.
- Return type:
Tensor
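The reshape-then-center behavior described above might look roughly like the sketch below. The function name prepare_data_sketch, the tolerance atol, and the choice of ValueError are all hypothetical; the documentation only states that an exception is thrown when uncentered data is passed with center_data set to false.

```python
import torch


def prepare_data_sketch(X, center_data, atol=1e-6):
    """Hypothetical helper: flatten each data point, then center or verify centering."""
    X = X.reshape(X.shape[0], -1)  # assumes the first dimension indexes data points
    data_mean = X.mean(dim=0)
    if center_data:
        return X - data_mean
    # Without centering, the column means are expected to be (numerically) zero.
    if not torch.allclose(data_mean, torch.zeros_like(data_mean), atol=atol):
        raise ValueError("Data is expected to be centered when center_data is False.")
    return X


torch.manual_seed(0)
X = torch.randn(10, 2, 3) + 5.0  # clearly not centered
prepared = prepare_data_sketch(X, center_data=True)
print(prepared.shape)
```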
- project_back(X_lower_dim, add_mean=False)
Project low-dimensional principal representations back into the original space to recover the reconstruction of data points.
- Parameters:
X_lower_dim (Tensor) – Matrix whose rows are low-dimensional principal representations of the original data.
add_mean (bool, optional) – Indicates whether the training data mean should be added to the projection result. This can be set to True if the user centered the data prior to dimensionality reduction and now wishes to add back the data mean. Defaults to False.
- Returns:
Reconstruction of data points.
- Return type:
Tensor
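Projecting back amounts to multiplying the low-dimensional representations by the transpose of the principal-component matrix and optionally adding the mean. The sketch below uses plain torch; project_back_sketch and its arguments are hypothetical stand-ins, not the method's signature.

```python
import torch

torch.manual_seed(0)

# Hypothetical uncentered data; the mean is removed before computing components.
X = torch.randn(20, 5, dtype=torch.float64) + 3.0
data_mean = X.mean(dim=0)
X_centered = X - data_mean

_, _, Vh = torch.linalg.svd(X_centered, full_matrices=False)
U = Vh.T  # columns are the principal components


def project_back_sketch(X_lower_dim, U, data_mean=None):
    """Hypothetical helper: map k-dimensional rows back to the original space."""
    k = X_lower_dim.shape[1]
    reconstruction = X_lower_dim @ U[:, :k].T
    if data_mean is not None:
        reconstruction = reconstruction + data_mean  # undo the earlier centering
    return reconstruction


# With k = 2 the result is an approximation of X.
approx = project_back_sketch(X_centered @ U[:, :2], U, data_mean)
# With all components the original data is recovered up to rounding.
exact = project_back_sketch(X_centered @ U, U, data_mean)
print(approx.shape, torch.allclose(exact, X))
```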
- project_lower_dim(X, k=None, center_data=False)
Project input data X onto the top k principal components.
- Parameters:
X (Tensor) – Input data matrix whose rows are the data points.
k (int | None, optional) – The number of principal components onto which projection is done. If k is None, then all principal components will be used in the projection. Defaults to None.
center_data (bool) – If true, then the training data mean (learned in the forward pass) will be subtracted from all data points prior to projection. If center_data is false, it is expected that the data has already been centered in this manner by the user.
- Returns:
Projection result.
- Return type:
Tensor
Note
The result of projection (after centering) is X @ U because this method assumes that the rows of X are the data points while the columns of U are the principal components.
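The role of k, including the k = None case, can be sketched with plain torch as follows. The helper project_lower_dim_sketch and its extra arguments (U, data_mean) are hypothetical and only mirror the documented behavior; they are not the method's real signature.

```python
import torch

torch.manual_seed(0)

# Hypothetical training data; components and mean are computed from it.
X = torch.randn(30, 6, dtype=torch.float64)
data_mean = X.mean(dim=0)
_, _, Vh = torch.linalg.svd(X - data_mean, full_matrices=False)
U = Vh.T  # columns are the principal components


def project_lower_dim_sketch(X, U, data_mean, k=None, center_data=False):
    """Hypothetical helper: project rows of X onto the top-k columns of U."""
    if center_data:
        X = X - data_mean
    # Slicing with k=None keeps every column, i.e. all principal components.
    return X @ U[:, :k]


top2 = project_lower_dim_sketch(X, U, data_mean, k=2, center_data=True)
all_components = project_lower_dim_sketch(X, U, data_mean, center_data=True)
print(top2.shape, all_components.shape)
```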