fl4health.feature_alignment.string_columns_transformer module

class TextColumnTransformer(transformer)[source]

Bases: BaseEstimator, TransformerMixin

__init__(transformer)[source]

This class applies text feature transformers from sklearn to a single-column pandas dataframe, which sklearn does not support directly.

Parameters:

transformer (TextFeatureTransformer) – Transformer to be applied
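As a rough illustration of the limitation described above, the snippet below passes a single-column dataframe straight to an sklearn text vectorizer. It assumes CountVectorizer as the text feature transformer; the column name and sample strings are made up for the example. Iterating over a dataframe yields its column labels, so the vectorizer sees the label as its only document instead of the rows.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical single-column dataframe of free text.
df = pd.DataFrame({"notes": ["chest pain", "mild fever", "chest xray"]})

# Passing the dataframe directly: sklearn iterates over the column labels,
# so the only "document" it sees is the string "notes".
naive = CountVectorizer().fit_transform(df)

# Passing the column itself (a Series of strings) gives one row per record.
proper = CountVectorizer().fit_transform(df["notes"])

print(naive.shape)
print(proper.shape)
```

Extracting the column before calling the vectorizer is the step a wrapper of this kind has to perform on the caller's behalf.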

fit(X, y=None)[source]

Fit the transformer to the provided dataframe, which should have a single string column. The transformer is fit on the text in that column of the X dataframe.

Parameters:
  • X (pd.DataFrame) – Column on which to fit the transformer

  • y (pd.DataFrame | None, optional) – Not used. Defaults to None.

Returns:

The fitted transformer.

Return type:

TextColumnTransformer

transform(X)[source]

Transforms the text in the single column of the X dataframe.

Parameters:

X (pd.DataFrame) – Dataframe of text-based column to be transformed

Returns:

Transformed dataframe.

Return type:

pd.DataFrame
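The fit and transform behavior documented above can be sketched as follows. This is a minimal stand-in written for illustration, not the fl4health source: the class name SingleColumnTextWrapper, the single-column assertion, and the densifying of the sparse output are all assumptions, and CountVectorizer is used as an example text feature transformer.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import CountVectorizer


class SingleColumnTextWrapper(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for a single-column text wrapper."""

    def __init__(self, transformer):
        self.transformer = transformer

    def fit(self, X, y=None):
        # Assumption: reject anything other than exactly one column.
        assert X.shape[1] == 1, "expected a single-column dataframe"
        # Hand the column to sklearn as a Series of strings.
        self.transformer.fit(X.iloc[:, 0])
        return self

    def transform(self, X):
        assert X.shape[1] == 1, "expected a single-column dataframe"
        features = self.transformer.transform(X.iloc[:, 0])
        # Text vectorizers return a sparse matrix; densify it so the
        # result can be returned as a dataframe aligned with the input index.
        return pd.DataFrame(features.toarray(), index=X.index)


# Made-up sample data for the example.
df = pd.DataFrame({"notes": ["chest pain", "mild fever", "chest xray"]})
wrapper = SingleColumnTextWrapper(CountVectorizer())
out = wrapper.fit(df).transform(df)
print(out.shape)
```

Because the wrapper follows the BaseEstimator and TransformerMixin conventions, an object like this could also be placed inside an sklearn Pipeline or ColumnTransformer.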

class TextMulticolumnTransformer(transformer)[source]

Bases: BaseEstimator, TransformerMixin

__init__(transformer)[source]

This class applies text feature transformers from sklearn to multiple string columns of a pandas dataframe, which sklearn does not support directly.

Parameters:

transformer (TextFeatureTransformer) – Transformer to be applied

fit(X, y=None)[source]

Fit the transformer to the provided dataframe, which should have multiple string columns. The transformer is fit on the appended text from all columns in the X dataframe.

Parameters:
  • X (pd.DataFrame) – Columns on which to fit the transformer

  • y (pd.DataFrame | None, optional) – Not used. Defaults to None.

Returns:

The fitted transformer.

Return type:

TextMulticolumnTransformer

transform(X)[source]

Transforms the concatenated text from all columns in the X dataframe.

Parameters:

X (pd.DataFrame) – Dataframe of text-based columns to be transformed

Returns:

Transformed dataframe.

Return type:

pd.DataFrame
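The multi-column behavior described above can be sketched in the same spirit. This is an illustrative stand-in, not the fl4health source: the class name MultiColumnTextWrapper and the helper join_text_columns are hypothetical, and joining the columns row by row with a single space is an assumption about how the text is appended.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import CountVectorizer


def join_text_columns(X):
    # Assumption: append the columns row by row, separated by one space.
    return X.apply(lambda row: " ".join(row), axis=1)


class MultiColumnTextWrapper(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for a multi-column text wrapper."""

    def __init__(self, transformer):
        self.transformer = transformer

    def fit(self, X, y=None):
        # Fit on one combined string per row.
        self.transformer.fit(join_text_columns(X))
        return self

    def transform(self, X):
        features = self.transformer.transform(join_text_columns(X))
        # Densify the sparse output so it can be returned as a dataframe.
        return pd.DataFrame(features.toarray(), index=X.index)


# Made-up sample data for the example.
df = pd.DataFrame(
    {
        "symptom": ["chest pain", "mild fever"],
        "history": ["prior surgery", "no history"],
    }
)
joined = join_text_columns(df)
out = MultiColumnTextWrapper(CountVectorizer()).fit(df).transform(df)
print(joined.tolist())
print(out.shape)
```

Fitting on the joined text gives all columns one shared vocabulary, so a token contributes to the same feature regardless of which column it came from.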