Framework Transformers
Overview
Framework transformers are a key component of mloda's compute framework system, enabling seamless conversion between different data representations. They allow feature groups to work with multiple compute frameworks by providing bidirectional transformation capabilities.
Core Components
BaseTransformer
The BaseTransformer is an abstract base class that defines the interface for transforming data between different compute frameworks. It provides the foundation for all framework transformers in the system.
class BaseTransformer:
    """
    Abstract base class for transforming data between different compute frameworks.
    """
Key features:
- Defines a consistent interface for all transformers
- Handles the logic for determining transformation direction
- Provides methods for checking if required frameworks are available
- Manages the actual transformation process
ComputeFrameworkTransformer
The ComputeFrameworkTransformer manages the registry of available transformers and provides methods to find the appropriate transformer for a given pair of frameworks.
class ComputeFrameworkTransformer:
    """
    Manages transformations between different compute frameworks.
    """
Key features:
- Maintains a registry of available transformers
- Automatically discovers and registers all BaseTransformer subclasses
- Provides a lookup mechanism to find the appropriate transformer for any framework pair
PandasPyarrowTransformer
The PandasPyarrowTransformer is a concrete implementation of BaseTransformer that handles conversions between Pandas DataFrames and PyArrow Tables.
class PandasPyarrowTransformer(BaseTransformer):
    """
    Transformer for converting between Pandas DataFrame and PyArrow Table.
    """
Key features:
- Converts Pandas DataFrames to PyArrow Tables and vice versa
- Handles metadata properly during transformations
- Ensures clean conversion by removing framework-specific metadata
DuckDBPyarrowTransformer
The DuckDBPyarrowTransformer is a concrete implementation of BaseTransformer that handles conversions between DuckDB Relations and PyArrow Tables.
class DuckDBPyarrowTransformer(BaseTransformer):
    """
    Transformer for converting between DuckDB relations and PyArrow Table.
    """
Key features:
- Converts DuckDB Relations to PyArrow Tables and vice versa
- Leverages DuckDB's native PyArrow integration for efficient zero-copy operations
- Requires a DuckDB connection object for PyArrow → DuckDB transformations
- Uses DuckDB's to_arrow_table() and from_arrow() methods for optimal performance
Important: The DuckDB transformer requires a connection object when transforming from PyArrow to DuckDB. This is because DuckDB relations must be associated with a specific database connection.
How Framework Transformers Work
Registration Process
- During initialization, the ComputeFrameworkTransformer discovers all subclasses of BaseTransformer
- Each transformer is registered in a mapping from framework pairs to transformer classes
- The mapping is bidirectional, allowing transformations in both directions
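The registration steps above can be sketched in plain Python. This is an illustration of the idea only, not mloda's implementation: `SketchTransformer`, `ListTupleTransformer`, and `SketchRegistry` are hypothetical names, and discovery through `__subclasses__()` is an assumption about how subclass discovery might be done.

```python
from typing import Any, Dict, Tuple, Type


class SketchTransformer:
    """Hypothetical stand-in for BaseTransformer (not mloda's class)."""

    @classmethod
    def framework(cls) -> Any:
        raise NotImplementedError

    @classmethod
    def other_framework(cls) -> Any:
        raise NotImplementedError


class ListTupleTransformer(SketchTransformer):
    """Toy transformer pairing two built-in types."""

    @classmethod
    def framework(cls) -> Any:
        return list

    @classmethod
    def other_framework(cls) -> Any:
        return tuple


class SketchRegistry:
    """Hypothetical registry: discovers subclasses and maps both directions."""

    def __init__(self) -> None:
        self.transformer_map: Dict[Tuple[Any, Any], Type[SketchTransformer]] = {}
        # __subclasses__() returns direct subclasses only.
        for transformer_cls in SketchTransformer.__subclasses__():
            fw = transformer_cls.framework()
            other = transformer_cls.other_framework()
            # Register the same class under both orderings of the pair.
            self.transformer_map[(fw, other)] = transformer_cls
            self.transformer_map[(other, fw)] = transformer_cls


registry = SketchRegistry()
print(registry.transformer_map[(list, tuple)].__name__)
print(registry.transformer_map[(tuple, list)].__name__)
```

Registering one class under both key orderings means a lookup never needs to know which framework the transformer treats as primary; the transformer itself resolves the direction.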
Transformation Process
When data needs to be transformed from one framework to another:
- The system identifies the source and target framework types
- It looks up the appropriate transformer in the registry
- It determines the direction of transformation (primary to secondary or vice versa)
- It calls the appropriate transformation method on the transformer
- The transformer converts the data to the target framework format
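The direction step can be made concrete with a small stand-alone sketch. It assumes a `transform(from_fw, to_fw, data)` entry point that compares the requested pair with the pair the transformer declares. The classes `SketchTransformer` and `RowsColumnsTransformer` are hypothetical and use built-in types in place of real frameworks.

```python
from typing import Any, Dict, List


class SketchTransformer:
    """Hypothetical stand-in for BaseTransformer (not mloda's class)."""

    @classmethod
    def framework(cls) -> Any:
        raise NotImplementedError

    @classmethod
    def other_framework(cls) -> Any:
        raise NotImplementedError

    @classmethod
    def transform_fw_to_other_fw(cls, data: Any) -> Any:
        raise NotImplementedError

    @classmethod
    def transform_other_fw_to_fw(cls, data: Any) -> Any:
        raise NotImplementedError

    @classmethod
    def transform(cls, from_fw: Any, to_fw: Any, data: Any) -> Any:
        # Pick the direction by comparing the requested pair with the declared pair.
        if (from_fw, to_fw) == (cls.framework(), cls.other_framework()):
            return cls.transform_fw_to_other_fw(data)
        if (from_fw, to_fw) == (cls.other_framework(), cls.framework()):
            return cls.transform_other_fw_to_fw(data)
        raise ValueError(f"{cls.__name__} cannot convert {from_fw} to {to_fw}")


class RowsColumnsTransformer(SketchTransformer):
    """Toy pair: list of row dicts (primary) and dict of column lists (secondary)."""

    @classmethod
    def framework(cls) -> Any:
        return list

    @classmethod
    def other_framework(cls) -> Any:
        return dict

    @classmethod
    def transform_fw_to_other_fw(cls, data: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
        columns: Dict[str, List[Any]] = {}
        for row in data:
            for key, value in row.items():
                columns.setdefault(key, []).append(value)
        return columns

    @classmethod
    def transform_other_fw_to_fw(cls, data: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
        keys = list(data)
        return [dict(zip(keys, values)) for values in zip(*data.values())]


rows = [{"id": 1, "v": 0.5}, {"id": 2, "v": 1.5}]
columns = RowsColumnsTransformer.transform(list, dict, rows)
restored = RowsColumnsTransformer.transform(dict, list, columns)
print(columns)
print(restored)
```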
Example Flow
Pandas DataFrame → PandasPyarrowTransformer → PyArrow Table
Or in the reverse direction:
PyArrow Table → PandasPyarrowTransformer → Pandas DataFrame
For DuckDB transformations:
DuckDB Relation → DuckDBPyarrowTransformer → PyArrow Table
PyArrow Table → DuckDBPyarrowTransformer → DuckDB Relation (requires connection)
Creating Custom Transformers
To create a custom transformer for a new pair of frameworks:
- Subclass BaseTransformer
- Implement the required methods:
  - framework() - Return the primary framework type
  - other_framework() - Return the secondary framework type
  - import_fw() - Import the primary framework module
  - import_other_fw() - Import the secondary framework module
  - transform_fw_to_other_fw() - Transform from primary to secondary
  - transform_other_fw_to_fw() - Transform from secondary to primary
Example:
from typing import Any, Optional

# BaseTransformer is provided by mloda. CustomFramework, OtherFramework,
# custom_framework, and other_framework are placeholders for your own types and modules.


class CustomTransformer(BaseTransformer):
    @classmethod
    def framework(cls) -> Any:
        return CustomFramework

    @classmethod
    def other_framework(cls) -> Any:
        return OtherFramework

    @classmethod
    def import_fw(cls) -> None:
        import custom_framework

    @classmethod
    def import_other_fw(cls) -> None:
        import other_framework

    @classmethod
    def transform_fw_to_other_fw(cls, data: Any) -> Any:
        # Convert from CustomFramework to OtherFramework.
        # Import here: the import inside import_other_fw() is local to that method.
        import other_framework

        return other_framework.from_custom(data)

    @classmethod
    def transform_other_fw_to_fw(cls, data: Any, framework_connection_object: Optional[Any] = None) -> Any:
        # Convert from OtherFramework to CustomFramework.
        import custom_framework

        return custom_framework.from_other(data)
Integration with ComputeFrameWork
The ComputeFrameWork class uses the transformer system to convert data between different frameworks when needed. This happens in several scenarios:
- When data needs to be transformed to match the expected framework of a feature group
- When data needs to be uploaded to a flight server (always converted to PyArrow Table)
- When data is retrieved from a flight server and needs to be converted back
The transformation is handled automatically by the apply_compute_framework_transformer method:
def apply_compute_framework_transformer(self, data: Any) -> Any:
    _from_fw = type(data)
    _to_fw = self.expected_data_framework()
    transformer_cls = self.transformer.transformer_map.get((_from_fw, _to_fw), None)
    if transformer_cls is not None:
        return transformer_cls.transform(_from_fw, _to_fw, data)
    return None
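Two consequences of the method as shown are worth seeing in action. It returns None when no transformer is registered for the pair, and the lookup key is the exact `type(data)`, so a subclass of a registered type would not match. The sketch below mirrors the lookup with hypothetical stand-ins (`TupleFromList`, `SketchComputeFramework`, `TaggedList`) and built-in types; it is not mloda's class.

```python
from typing import Any, Dict, Optional, Tuple


class TupleFromList:
    """Hypothetical transformer exposing the transform(from, to, data) call used above."""

    @classmethod
    def transform(cls, from_fw: Any, to_fw: Any, data: Any) -> Any:
        return tuple(data)


class SketchComputeFramework:
    """Hypothetical stand-in for a ComputeFrameWork subclass that expects tuples."""

    transformer_map: Dict[Tuple[Any, Any], Any] = {(list, tuple): TupleFromList}

    def expected_data_framework(self) -> Any:
        return tuple

    def apply_compute_framework_transformer(self, data: Any) -> Optional[Any]:
        from_fw = type(data)
        to_fw = self.expected_data_framework()
        transformer_cls = self.transformer_map.get((from_fw, to_fw))
        if transformer_cls is None:
            return None  # no registered transformer for this pair
        return transformer_cls.transform(from_fw, to_fw, data)


class TaggedList(list):
    """Subclass of list, used to show that the lookup key is the exact type."""


framework = SketchComputeFramework()
converted = framework.apply_compute_framework_transformer([1, 2, 3])
unsupported = framework.apply_compute_framework_transformer({1, 2, 3})
subclass_input = framework.apply_compute_framework_transformer(TaggedList([1, 2]))
print(converted, unsupported, subclass_input)
```

Because None signals "no transformer found" rather than an empty result, callers in this sketch would need to check for it explicitly before using the return value.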
Benefits of Framework Transformers
- Decoupling: Feature groups can be defined independently of specific compute frameworks
- Flexibility: The system can work with multiple data representations
- Extensibility: New frameworks can be added by implementing new transformers
- Transparency: Transformations happen automatically without user intervention
Conclusion
Framework transformers are a critical part of mloda's flexibility, allowing the system to work with multiple compute frameworks and data representations. By providing a clean interface for data conversion, they enable feature groups to be defined once and used with different technologies, supporting use cases like online/offline computation, testing, and migrations between environments.