fl4health.feature_alignment.feature_type_extraction module
Largely taken from https://github.com/VectorInstitute/cyclops.
- class Features(data, features, by=None, targets=None, force_types=None)
Bases: object
- __init__(data, features, by=None, targets=None, force_types=None)
Features.
- Parameters:
data (pd.DataFrame) – Features data.
features (str | list[str]) – List of feature columns. The remaining columns are treated as metadata.
by (str | list[str] | None, optional) – Columns to group by during processing, affecting how the features are treated. Defaults to None.
targets (str | list[str] | None, optional) – Column names to specify as target features. Defaults to None.
force_types (dict[str, FeatureType] | None, optional) – Mapping of column names to type. These columns are forced to be of the specified type. Defaults to None.
- class TabularFeatures(data, features, by, targets=None, force_types=None)
Bases: Features
- __init__(data, features, by, targets=None, force_types=None)
Tabular features.
- Parameters:
data (pd.DataFrame) – Data for the table
features (str | list[str]) – List of feature columns. The remaining columns are treated as metadata.
by (str) – Column to group by during processing, affecting how the features are treated.
targets (str | list[str] | None, optional) – Column names to specify as target features. Defaults to None.
force_types (dict[str, FeatureType] | None, optional) – Mapping of column names to type. These columns are forced to be of the specified type. Defaults to None.
- Raises:
ValueError – If by is not a string naming a single column.
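As a rough illustration of the argument conventions documented above, the following sketch uses only the standard library and a plain list of column names in place of a pd.DataFrame. The helpers to_list, split_columns, and check_by are hypothetical names introduced here for illustration and are not part of fl4health. The assumption that a non-string by is what triggers the ValueError is inferred from the by (str) annotation, not confirmed by the source.

```python
def to_list(value):
    """Normalize a `str | list[str]` argument into a list of strings."""
    return [value] if isinstance(value, str) else list(value)


def split_columns(columns, features):
    """Split column names into (feature columns, metadata columns).

    Columns not named in `features` are treated as metadata, mirroring
    the description of the `features` parameter.
    """
    feature_cols = to_list(features)
    metadata_cols = [c for c in columns if c not in feature_cols]
    return feature_cols, metadata_cols


def check_by(by):
    """Assumed check: `by` must be a single column name given as a string."""
    if not isinstance(by, str):
        raise ValueError("`by` must be a string naming a single column.")
    return by


# Hypothetical column names, chosen only for this example.
columns = ["patient_id", "age", "sex", "outcome"]
feature_cols, metadata_cols = split_columns(columns, ["age", "sex", "outcome"])
by = check_by("patient_id")
```

Here patient_id is not listed in features, so it ends up in metadata_cols, and passing a list such as ["patient_id"] to check_by would raise a ValueError under the stated assumption.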
- has_columns(data, cols, exactly=False, raise_error=False)
Check if data has required columns for processing.
- Parameters:
data (pd.DataFrame) – DataFrame to check.
cols (str | list[str]) – List of column names that must be present in data.
exactly (bool, optional) – Whether columns need to be an exact match. Defaults to False.
raise_error (bool, optional) – Whether to raise a ValueError if there are missing columns. Defaults to False.
- Raises:
ValueError – Missing required columns.
ValueError – If exactly is True and data does not have exactly the columns in cols.
- Returns:
True if all required columns are present, otherwise False.
- Return type: