data_juicer.ops.mapper.python_file_mapper module
- class data_juicer.ops.mapper.python_file_mapper.PythonFileMapper(file_path: str = '', function_name: str = 'process_single', batched: bool = False, **kwargs)[source]
Bases: Mapper
Executes a Python function defined in a file on input data.
This operator loads a specified Python function from a given file and applies it to the input data. The function must take exactly one argument and return a dictionary. Depending on the batched parameter, the operator processes data either sample by sample or in batches. If no file path is provided, the operator acts as an identity function and returns each input sample unchanged. The function is loaded dynamically, and both its name and the file path are configurable.
Important notes:
- The file must be a valid Python file (.py).
- The function must be callable and accept exactly one argument.
- The function's return value must be a dictionary.
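The loading rules above can be approximated with the standard library's importlib and inspect modules. The following is a minimal sketch under that assumption, not the operator's actual source: the helper name load_function and the file name my_func.py are hypothetical, and the argument count is checked here with inspect.signature, which may differ from how the operator itself counts arguments.

```python
import importlib.util
import inspect
import os
import tempfile
from typing import Callable


def load_function(file_path: str = "", function_name: str = "process_single") -> Callable[[dict], dict]:
    """Hypothetical helper that mirrors the documented rules of PythonFileMapper."""
    # No file path: behave as the identity function.
    if not file_path:
        return lambda sample: sample
    # The file must be a Python source file that exists on disk.
    if not file_path.endswith(".py"):
        raise ValueError(f"not a Python file: {file_path}")
    if not os.path.isfile(file_path):
        raise FileNotFoundError(file_path)

    # Import the file as a module without requiring it to be on sys.path.
    module_name = os.path.splitext(os.path.basename(file_path))[0]
    spec = importlib.util.spec_from_file_location(module_name, file_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)

    # The named attribute must exist and be callable.
    func = getattr(module, function_name, None)
    if not callable(func):
        raise ValueError(f"{function_name!r} is not a callable in {file_path}")
    # The function must accept exactly one argument.
    if len(inspect.signature(func).parameters) != 1:
        raise ValueError(f"{function_name!r} must take exactly one argument")

    def checked(sample: dict) -> dict:
        result = func(sample)
        # The return value must be a dictionary.
        if not isinstance(result, dict):
            raise ValueError(f"{function_name!r} must return a dict, got {type(result).__name__}")
        return result

    return checked


# Usage: write a small function file to a temporary directory and load it.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "my_func.py")
    with open(path, "w") as f:
        f.write(
            "def process_single(sample):\n"
            "    sample['text'] = sample['text'].upper()\n"
            "    return sample\n"
        )
    fn = load_function(path)
    result = fn({"text": "hello"})
    print(result)  # prints {'text': 'HELLO'}
```

Validating everything at load time means a misnamed function or a wrong signature fails once, before any data is processed, instead of failing on the first sample.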
- __init__(file_path: str = '', function_name: str = 'process_single', batched: bool = False, **kwargs)[source]
Initialization method.
- Parameters:
file_path -- The path to the Python file containing the function to be executed. If empty (the default), the operator returns each input sample unchanged.
function_name -- The name of the function defined in the file to be executed.
batched -- A boolean indicating whether to process input data in batches.
kwargs -- Additional keyword arguments passed to the parent class.
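When batched is True, the function receives a batch rather than a single sample, but it still takes exactly one argument and returns a dictionary. The sketch below assumes the batch uses the column-oriented layout of Hugging Face datasets batched map (a dict mapping each field name to a list of values); that layout, the function name add_length, and the field names text and text_len are all assumptions for illustration.

```python
from typing import Dict, List


def add_length(samples: Dict[str, List]) -> Dict[str, List]:
    """Hypothetical batched function: takes one argument (a batch) and returns a dict."""
    # Assumed layout: each key maps to a list with one value per sample in the batch.
    samples["text_len"] = [len(t) for t in samples["text"]]
    return samples


batch = {"text": ["a", "bcd", ""]}
out = add_length(batch)
print(out)  # prints {'text': ['a', 'bcd', ''], 'text_len': [1, 3, 0]}
```

Saved in a file such as my_funcs.py (a hypothetical name), this function would be selected with PythonFileMapper(file_path='my_funcs.py', function_name='add_length', batched=True).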