fl4health.feature_alignment.feature_type_extraction module

Largely taken from https://github.com/VectorInstitute/cyclops.

class FeatureMeta(**kwargs)

Bases: object

__init__(**kwargs)

Feature metadata class.

get_type()

Get the feature type.

Returns:

Feature type.

Return type:

str

update(meta)

Update meta attributes.

Parameters:

meta (list[tuple[str, Any]]) – List of tuples in the format (attribute name, attribute value).

Return type:

None
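The (attribute name, attribute value) format expected by update can be illustrated with a small stand-in class. MetaSketch, and the attribute names kind and source used below, are hypothetical and not part of fl4health; the sketch assumes that update simply sets each named attribute on the object.

```python
from __future__ import annotations

from typing import Any


class MetaSketch:
    """Hypothetical stand-in for FeatureMeta, for illustration only."""

    def __init__(self, **kwargs: Any) -> None:
        # Store every keyword argument as an attribute.
        for name, value in kwargs.items():
            setattr(self, name, value)

    def update(self, meta: list[tuple[str, Any]]) -> None:
        # Each tuple is (attribute name, attribute value).
        for name, value in meta:
            setattr(self, name, value)


meta = MetaSketch(kind="numeric")
# Overwrite one attribute and add a new one in a single call.
meta.update([("kind", "binary"), ("source", "survey")])
```

Passing a list of tuples rather than keyword arguments lets a caller build the updates programmatically before applying them.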

class Features(data, features, by=None, targets=None, force_types=None)

Bases: object

__init__(data, features, by=None, targets=None, force_types=None)

Features.

Parameters:
  • data (pd.DataFrame) – Features data.

  • features (str | list[str]) – List of feature columns. The remaining columns are treated as metadata.

  • by (str | list[str] | None, optional) – Columns to group by during processing, affecting how the features are treated. Defaults to None.

  • targets (str | list[str] | None, optional) – Column names to specify as target features. Defaults to None.

  • force_types (dict[str, FeatureType] | None, optional) – Mapping of column names to type. These columns are forced to be of the specified type. Defaults to None.

property types: dict[str, FeatureType]

Feature type of each feature, accessed as an attribute.

NOTE: These are framework-specific feature names.

Returns:

Feature type mapped for each feature.

Return type:

dict[str, str]

class TabularFeatures(data, features, by, targets=None, force_types=None)

Bases: Features

__init__(data, features, by, targets=None, force_types=None)

Tabular features.

Parameters:
  • data (pd.DataFrame) – Data for the table.

  • features (str | list[str]) – List of feature columns. The remaining columns are treated as metadata.

  • by (str) – Column to group by during processing, affecting how the features are treated.

  • targets (str | list[str] | None, optional) – Column names to specify as target features. Defaults to None.

  • force_types (dict[str, FeatureType] | None, optional) – Mapping of column names to type. These columns are forced to be of the specified type. Defaults to None.

Raises:

ValueError – The tabular features index (by) must be given as a string naming a column.

has_columns(data, cols, exactly=False, raise_error=False)

Check if data has required columns for processing.

Parameters:
  • data (pd.DataFrame) – DataFrame to check.

  • cols (str | list[str]) – Column name or list of column names that must be present in data.

  • exactly (bool, optional) – Whether columns need to be an exact match. Defaults to False.

  • raise_error (bool, optional) – Whether to raise a ValueError if there are missing columns. Defaults to False.

Raises:
  • ValueError – Missing required columns.

  • ValueError – exactly is True and data does not have exactly the columns in cols.

Returns:

True if all required columns are present, otherwise False.

Return type:

bool
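The interplay of exactly and raise_error can be sketched on a plain list of column names (for a DataFrame df, that would be list(df.columns)). The function has_columns_sketch is a hypothetical re-implementation, not the library's code. It assumes that exactly=True additionally forbids extra columns and that a failed exact match returns False when raise_error is False; the actual function may differ on that last point.

```python
from __future__ import annotations


def has_columns_sketch(
    columns: list[str],
    cols: str | list[str],
    exactly: bool = False,
    raise_error: bool = False,
) -> bool:
    """Illustrative column check on a list of column names."""
    # Accept a single name or a list of names.
    required = [cols] if isinstance(cols, str) else list(cols)

    missing = [c for c in required if c not in columns]
    if missing:
        if raise_error:
            raise ValueError(f"Missing required columns: {', '.join(missing)}.")
        return False

    # Assumption: "exactly" means there are no columns beyond the required ones.
    if exactly and set(columns) != set(required):
        if raise_error:
            raise ValueError(f"Must have exactly the columns: {', '.join(required)}.")
        return False

    return True


table_columns = ["id", "age", "sex"]
subset_ok = has_columns_sketch(table_columns, ["id", "age"])
exact_ok = has_columns_sketch(table_columns, ["id", "age"], exactly=True)
missing_ok = has_columns_sketch(table_columns, "height")
```

Under these assumptions, subset_ok is True, while exact_ok and missing_ok are both False; passing raise_error=True turns either failure into a ValueError instead.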

to_list(obj)

Convert some object to a list of object(s) unless already one.

Parameters:

obj (Any) – The object to convert to a list.

Returns:

The processed object.

Return type:

list[Any]
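The documented behavior of to_list can be sketched as follows. The function to_list_sketch is a hypothetical stand-in that only distinguishes lists from everything else; the library function may treat other containers (for example arrays or sets) differently.

```python
from __future__ import annotations

from typing import Any


def to_list_sketch(obj: Any) -> list[Any]:
    """Illustrative version: wrap obj in a list unless it already is one."""
    if isinstance(obj, list):
        # Already a list: return it unchanged.
        return obj
    # Anything else, including a single string, becomes a one-element list.
    return [obj]


single = to_list_sketch("age")
many = to_list_sketch(["age", "sex"])
```

This kind of normalization is what lets parameters typed as str | list[str], such as features or cols, accept either a single column name or a list of names.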