data_juicer.ops.mapper.python_file_mapper module

class data_juicer.ops.mapper.python_file_mapper.PythonFileMapper(file_path: str = '', function_name: str = 'process_single', batched: bool = False, **kwargs)[source]

Bases: Mapper

Mapper for executing a Python function defined in a file.

__init__(file_path: str = '', function_name: str = 'process_single', batched: bool = False, **kwargs)[source]

Initialization method.

Parameters:
  • file_path – The path to the Python file containing the function to be executed.

  • function_name – The name of the function in that file to execute.

  • batched – A boolean indicating whether to process input data in batches.

  • kwargs – Additional keyword arguments passed to the parent class.
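
The parameters above name a file and a function inside it, but not how the function is located. A minimal sketch of one plausible mechanism, using only the standard `importlib` machinery, follows. It is an illustration under assumptions, not Data-Juicer's actual code: `load_function` is a hypothetical helper, and the temporary file `my_ops.py` and the `"text"` key are made up for the example.

```python
import importlib.util
import os
import tempfile


def load_function(file_path, function_name="process_single"):
    """Hypothetical helper: import a file as a module and return one function."""
    if not os.path.isfile(file_path):
        raise FileNotFoundError(f"No such file: {file_path}")

    # Derive a module name from the file name, e.g. "my_ops.py" -> "my_ops".
    module_name = os.path.splitext(os.path.basename(file_path))[0]
    spec = importlib.util.spec_from_file_location(module_name, file_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code

    func = getattr(module, function_name, None)
    if not callable(func):
        raise ValueError(f"{function_name!r} is not a function in {file_path}")
    return func


# A throwaway user file standing in for the one passed as file_path.
user_code = '''
def process_single(sample):
    sample["text"] = sample["text"].strip()
    return sample
'''

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "my_ops.py")
    with open(path, "w") as f:
        f.write(user_code)

    func = load_function(path, "process_single")
    result = func({"text": "  hello  "})
```

Because the file's top-level code runs on import, a file passed as `file_path` should be trusted and free of unwanted side effects.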

process_single(sample)[source]

Invoke the loaded function with the provided sample.

process_batched(samples)[source]

Invoke the loaded function with the provided batch of samples.
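
The difference between the two methods is easiest to see in the shape of the argument the user function receives. The sketch below assumes that a single sample is a dict of field values and that a batch is a dict of lists (one list per field, as in column-oriented dataset libraries); this page does not confirm that layout. The function names and the `"text"` and `"id"` keys are hypothetical.

```python
def lowercase_single(sample):
    # One sample: each field holds a single value.
    return {**sample, "text": sample["text"].lower()}


def lowercase_batched(samples):
    # One batch (assumed layout): each field holds a list of values.
    return {**samples, "text": [t.lower() for t in samples["text"]]}


single_out = lowercase_single({"text": "Hello World", "id": 1})
batched_out = lowercase_batched({"text": ["Hello", "WORLD"], "id": [1, 2]})
```

Under these assumptions, a function written for one shape would not work with the other, so the `batched` flag and the function in the file need to agree.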