Feature Groups

Creating a Custom Feature Group

In this example, we’ll create a custom feature group that multiplies the results of each feature by 2. We'll implement a new feature group and then use it within mloda.

1. Import the Required Modules and Set File References

Start by importing the necessary modules to define the custom feature group and perform calculations:

import pyarrow.compute as pc
import pyarrow as pa

from mloda_core.abstract_plugins.abstract_feature_group import AbstractFeatureGroup
from mloda_core.abstract_plugins.components.data_access_collection import DataAccessCollection

file_path = "tests/test_plugins/feature_group/src/dataset/creditcard_2023_short.csv"
data_access_collection = DataAccessCollection(files={file_path})

feature_list = ["id","V1","V2","V3"]
example_feature_list = [f"Example_{f}" for f in feature_list]

2. Define the Feature Group

The custom feature group, Example, operates on a set of input features. It depends on the root features (e.g., "id", "V1", etc.) and renames them with the prefix "Example_".

The calculation logic for multiplying each feature by 2 is implemented in the calculate_feature function.

class Example(AbstractFeatureGroup):
    def input_features(self, _, feature_name):
        return {feature_name.name.split("_")[1]}

    @classmethod
    def calculate_feature(cls, data, _):
        multiplied_columns = [pc.multiply(data[column], 2) for column in data.column_names]
        col_names = [f"{cls.get_class_name()}_{col_names}" for col_names in data.column_names]
        multiplied_table = pa.table(multiplied_columns, names=col_names)
        return multiplied_table

3. Execute the Request Using the New Feature Group

To use the newly defined feature group, simply add the "Example_" prefix to each feature name. mloda will automatically resolve the dependency between the ReadCsvFeatureGroup and the ExampleFeatureGroup.

from mloda_core.api.request import mlodaAPI

result = mlodaAPI.run_all(
            example_feature_list, 
            compute_frameworks=["PyarrowTable"], 
            data_access_collection=data_access_collection
        )
result[0]

Expected output:

pyarrow.Table
Example_V28: double
Example_id: int64
...
Example_V28: [[-0.26000604758867731,-0.26311827417649086,-0....]]
Example_id: [[0,2,4,...]]
....

4. Summary

In this example, we implemented a custom feature group, Example, that multiplies each feature value by 2. By defining a straightforward input_features method and a calculate_feature method, we were able to extend mloda's feature engineering capabilities with custom transformations. We then executed the request by simply modifying the feature names with a prefix ("Example_"), allowing mloda to handle dependencies and computations automatically.

5. Advanced Feature Group Topics

For more in-depth information about feature groups, check out these advanced topics:

Feature Chain Parser - How feature groups work with chained feature names
Feature Group Matching - How the system determines which feature group handles a feature
Feature Group Testing - Best practices for testing feature groups
Feature Group Versioning - How versioning works in feature groups
Compute Framework Integration - How feature groups integrate with compute frameworks
Multiple Result Columns - How feature groups can return multiple related columns