fl4health.feature_alignment.handle_types module

Largely taken from https://github.com/VectorInstitute/cyclops.

convertible_to_type(series, type, unique=None, raise_error=False)[source]

Check whether a feature can be converted to some type.

Parameters:
  • series (pd.Series) – Feature data.

  • type (FeatureType) – Feature type name to check for conversion.

  • unique (np.ndarray | None, optional) – Unique values which can be optionally specified. Defaults to None.

  • raise_error (bool, optional) – Whether to raise an error if the series cannot be converted to the provided type. Defaults to False.

Raises:
  • ValueError – The supported type has no corresponding datatype.

  • ValueError – The series cannot be converted to the provided type and raise_error is True.

Returns:

Whether the feature can be converted.

Return type:

bool

get_unique(values, unique=None)[source]

Get the unique values of a pandas series.

The utility of this function comes from checking whether the unique values have already been calculated. This function assumes that if the unique values are passed, they are correct.

Parameters:
  • values (np.ndarray | pd.Series) – Values for which to get the unique values.

  • unique (np.ndarray | None, optional) – Unique values which can be optionally specified. Defaults to None.

Returns:

The unique values.

Return type:

np.ndarray
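
The "compute only if not already supplied" behaviour described above can be sketched as follows. This is a minimal illustration and not the library's implementation: get_unique_sketch is a hypothetical name, and the use of pd.Series(...).unique() to compute the values is an assumption.

```python
import numpy as np
import pandas as pd


def get_unique_sketch(values, unique=None):
    """Hypothetical stand-in illustrating the documented behaviour of get_unique."""
    if unique is None:
        # Nothing was precomputed, so compute the unique values now.
        # Wrapping in pd.Series lets both np.ndarray and pd.Series inputs work.
        return pd.Series(values).unique()
    # Precomputed values are trusted and returned as-is, without validation.
    return unique


series = pd.Series([3, 1, 3, 2, 1])

# First call computes the unique values (in order of first appearance).
computed = get_unique_sketch(series)

# Later calls can pass the result back in to skip the recomputation.
reused = get_unique_sketch(series, unique=computed)
```

Because supplied values are not checked, passing a stale or incorrect unique array will silently propagate it to the caller.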

infer_types(data, features)[source]

Infer intended feature types and perform the relevant conversions.

Parameters:
  • data (pd.DataFrame) – Feature data.

  • features (list[str]) – Features to consider.

Returns:

A map from each considered feature name to its inferred type.

Return type:

dict[str, str]

to_dtype(series, type)[source]

Set the series datatype according to the feature type.

Parameters:
  • series (pd.Series) – Feature data.

  • type (FeatureType) – Feature type name.

Returns:

The feature with the corresponding datatype.

Return type:

pd.Series

to_types(data, new_types)[source]

Convert features to given types.

Parameters:
  • data (pd.DataFrame) – Features data.

  • new_types (dict[str, str]) – Map from the feature column name to its new type.

Returns:

Tuple (pandas.DataFrame, dict) with the updated features data and metadata respectively.

Return type:

tuple[pd.DataFrame, dict[str, Any]]

valid_feature_type(type, raise_error=True)[source]

Check whether a feature type name is valid.

Parameters:
  • type (FeatureType) – Feature type name.

  • raise_error (bool, optional) – Whether to raise an error if the type is invalid. Defaults to True.

Raises:

ValueError – Raised when the type is invalid and raise_error is True.

Returns:

Whether the type is valid.

Return type:

bool
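
The raise_error flag lets a caller choose between an exception and a boolean result. The sketch below illustrates that pattern only: DemoFeatureType and its members are hypothetical stand-ins (the members of the real FeatureType are not listed in this section), and valid_feature_type_sketch is not the library function.

```python
from enum import Enum


class DemoFeatureType(Enum):
    """Hypothetical stand-in for FeatureType; the member names are assumptions."""

    BINARY = "binary"
    NUMERIC = "numeric"
    STRING = "string"


def valid_feature_type_sketch(feature_type, raise_error=True):
    """Return True for a known type; otherwise raise or return False."""
    if isinstance(feature_type, DemoFeatureType):
        return True
    if raise_error:
        known = [member.name for member in DemoFeatureType]
        raise ValueError(f"Feature type {feature_type!r} is not one of {known}.")
    return False


# Boolean mode: useful when the caller wants to branch on validity.
is_valid = valid_feature_type_sketch("datetime", raise_error=False)

# Strict mode (the default): an invalid type raises ValueError.
try:
    valid_feature_type_sketch("datetime")
    raised = False
except ValueError:
    raised = True
```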