fl4health.feature_alignment.tab_features_preprocessor module

class TabularFeaturesPreprocessor(tab_feature_encoder)

Bases: object

TabularFeaturesPreprocessor constructs the appropriate column transformers from the information encoded in tab_feature_encoder. These transformers are then applied to a pandas DataFrame.

Each tabular feature, which corresponds to a column in the pandas DataFrame, has its own column transformer. A default transformer is initialized for each feature based on its data type, but the user may instead specify a transformer for that feature manually.
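The per-feature defaulting described above can be sketched with plain pandas and scikit-learn. This is a minimal illustration under stated assumptions, not the fl4health implementation: the helper default_pipeline_for and its dtype-to-transformer mapping (standard scaling for numeric columns, one-hot encoding for everything else) are hypothetical choices made for the example. Replacing one dictionary entry plays the role of a manual override such as set_feature_pipeline.

```python
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder, StandardScaler


def default_pipeline_for(series: pd.Series) -> Pipeline:
    """Hypothetical helper: choose a default pipeline from a column's dtype."""
    if pd.api.types.is_numeric_dtype(series):
        return Pipeline([("scaler", StandardScaler())])
    # Assumption for this sketch: every non-numeric column is categorical.
    return Pipeline([("one_hot", OneHotEncoder(handle_unknown="ignore"))])


df = pd.DataFrame({"age": [25, 40, 31], "city": ["Toronto", "Ottawa", "Toronto"]})

# One default pipeline per column, keyed by column name.
pipelines = {name: default_pipeline_for(df[name]) for name in df.columns}

# A user-specified pipeline replaces the default for a single feature.
pipelines["age"] = Pipeline([("scaler", MinMaxScaler())])
```

Keying the pipelines by column name keeps each feature's transformer independent, so overriding one feature never touches the others.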

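Parameters:
  • tab_feature_encoder – encoder holding the tabular feature information from which the column transformers are constructed.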
fill_in_missing_columns(df)

Return a new DataFrame in which every column that is missing entirely is filled with that column’s default fill value.

Return type:

DataFrame
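The behavior of fill_in_missing_columns can be pictured with a small pandas sketch. It is not the fl4health code: the fill_values dictionary (column name to default fill value) is a hypothetical stand-in for wherever the real default fill values are stored.

```python
import pandas as pd


def fill_in_missing_columns(df: pd.DataFrame, fill_values: dict) -> pd.DataFrame:
    """Sketch: return a copy of df with each absent column added and filled."""
    # fill_values is a hypothetical mapping from column name to default fill value.
    filled = df.copy()
    for column, value in fill_values.items():
        if column not in filled.columns:
            # The whole column is missing, so every row receives the default value.
            filled[column] = value
    return filled


df = pd.DataFrame({"age": [25, 40]})
result = fill_in_missing_columns(df, {"age": 0, "city": "unknown"})
print(result["city"].tolist())  # ['unknown', 'unknown']
```

Working on a copy matches the "return a new DataFrame" wording: the input frame keeps its original columns.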

get_default_binary_pipeline()
Return type:

Pipeline

get_default_numeric_pipeline()
Return type:

Pipeline

get_default_one_hot_pipeline(categories)
Return type:

Pipeline

get_default_ordinal_pipeline(categories)
Return type:

Pipeline

get_default_string_pipeline(vocabulary)
Return type:

Pipeline
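The categories and vocabulary arguments presumably fix the set of values a pipeline encodes, independent of what a particular DataFrame happens to contain. The sketch below illustrates why that matters using scikit-learn's OneHotEncoder and CountVectorizer directly; it does not call the fl4health methods, and the category and vocabulary lists are made up. With the categories fixed in advance, a DataFrame that contains only some of them still produces the same number of output columns.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Assumed full category list for one column, known before seeing any data.
categories = ["Ottawa", "Toronto", "Vancouver"]
one_hot = Pipeline([
    ("one_hot", OneHotEncoder(categories=[categories], handle_unknown="ignore")),
])

# This DataFrame contains only one of the three categories.
df = pd.DataFrame({"city": ["Toronto", "Toronto"]})
encoded = one_hot.fit_transform(df[["city"]])
print(encoded.shape)  # (2, 3): one column per known category

# A fixed vocabulary plays the same role for free-text columns.
vocabulary = ["fever", "cough", "fatigue"]
strings = Pipeline([("counts", CountVectorizer(vocabulary=vocabulary))])
counts = strings.fit_transform(pd.Series(["fever and cough", "no symptoms"]))
print(counts.shape)  # (2, 3): one column per vocabulary term
```

Without the fixed lists, the encoders would learn only the values present in this one DataFrame, and the output width would vary from dataset to dataset.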

initialize_default_pipelines(tabular_features, one_hot)

Initialize a default Pipeline for every data column in tabular_features.

Parameters:
  • tabular_features (list[TabularFeature]) – list of tabular features in the data columns.

Return type:

dict[str, Pipeline]

preprocess_features(df)
Return type:

tuple[ndarray[Any, dtype[Any]], ndarray[Any, dtype[Any]]]

return_column_transformer(pipelines)
Return type:

ColumnTransformer
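A dict[str, Pipeline] such as the one returned by initialize_default_pipelines maps naturally onto a scikit-learn ColumnTransformer, with one (name, transformer, columns) entry per feature. The sketch below shows that mapping; the helper to_column_transformer is hypothetical and is not a copy of return_column_transformer, and the column names are invented for the example.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def to_column_transformer(pipelines: dict[str, Pipeline]) -> ColumnTransformer:
    """Hypothetical helper: apply each pipeline to the column of the same name."""
    transformers = [(name, pipeline, [name]) for name, pipeline in pipelines.items()]
    # sparse_threshold=0 forces a dense NumPy array as output.
    return ColumnTransformer(transformers, sparse_threshold=0)


pipelines = {
    "age": Pipeline([("scaler", StandardScaler())]),
    "city": Pipeline([("one_hot", OneHotEncoder(handle_unknown="ignore"))]),
}
df = pd.DataFrame({"age": [20.0, 40.0], "city": ["Ottawa", "Toronto"]})

features = to_column_transformer(pipelines).fit_transform(df)
print(features.shape)  # (2, 3): one scaled column plus two one-hot columns
```

Columns not named in any entry are dropped by ColumnTransformer's default remainder="drop", so the output contains only the features that have a pipeline.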

set_feature_pipeline(feature_name, pipeline)

Set pipeline as the transformer for the feature named feature_name, replacing its default pipeline.
Return type:

None