data_juicer.ops.mapper.calibrate_qa_mapper module

class data_juicer.ops.mapper.calibrate_qa_mapper.CalibrateQAMapper(api_model: str = 'gpt-4o', *, api_endpoint: str | None = None, response_path: str | None = None, system_prompt: str | None = None, input_template: str | None = None, reference_template: str | None = None, qa_pair_template: str | None = None, output_pattern: str | None = None, try_num: Annotated[int, Gt(gt=0)] = 3, model_params: Dict = {}, sampling_params: Dict = {}, **kwargs)

Bases: Mapper

Mapper to calibrate question-answer pairs based on reference text.

DEFAULT_SYSTEM_PROMPT = '请根据提供的【参考信息】对【问题】和【回答】进行校准,使其更加详细、准确。\n按照以下格式输出:\n【问题】\n校准后的问题\n【回答】\n校准后的回答'
DEFAULT_INPUT_TEMPLATE = '{reference}\n{qa_pair}'
DEFAULT_REFERENCE_TEMPLATE = '【参考信息】\n{}'
DEFAULT_QA_PAIR_TEMPLATE = '【问题】\n{}\n【回答】\n{}'
DEFAULT_OUTPUT_PATTERN = '【问题】\\s*(.*?)\\s*【回答】\\s*(.*)'
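
The default constants above are Chinese-language prompt pieces: 【参考信息】 means "reference information", 【问题】 means "question", and 【回答】 means "answer". The system prompt asks the model to calibrate the question and answer against the reference so that they are more detailed and accurate, and to reply in the 【问题】/【回答】 layout. The following is a minimal sketch of how the templates and the output pattern could fit together. The helper names `render_input` and `extract_qa` are hypothetical, and the use of `re.DOTALL` is an assumption rather than something this page confirms.

```python
import re

# Default values copied from the class attributes listed above.
DEFAULT_INPUT_TEMPLATE = '{reference}\n{qa_pair}'
DEFAULT_REFERENCE_TEMPLATE = '【参考信息】\n{}'
DEFAULT_QA_PAIR_TEMPLATE = '【问题】\n{}\n【回答】\n{}'
DEFAULT_OUTPUT_PATTERN = '【问题】\\s*(.*?)\\s*【回答】\\s*(.*)'


def render_input(reference, question, answer):
    """Hypothetical helper: fill the three templates in order."""
    reference_str = DEFAULT_REFERENCE_TEMPLATE.format(reference)
    qa_pair_str = DEFAULT_QA_PAIR_TEMPLATE.format(question, answer)
    return DEFAULT_INPUT_TEMPLATE.format(reference=reference_str,
                                         qa_pair=qa_pair_str)


def extract_qa(raw_output):
    """Hypothetical helper: pull (question, answer) out of a model reply."""
    # Assumption: DOTALL lets the answer group span multiple lines.
    match = re.search(DEFAULT_OUTPUT_PATTERN, raw_output, re.DOTALL)
    if match is None:
        return None, None
    question, answer = match.groups()
    return question.strip(), answer.strip()


prompt = render_input('Paris is the capital of France.',
                      'Capital of France?', 'Paris')
calibrated = extract_qa(
    '【问题】\nWhat is the capital of France?\n'
    '【回答】\nThe capital of France is Paris.')
```

Passing a custom `output_pattern` only makes sense if the `system_prompt` asks the model for a matching layout, so the two are best changed together.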
__init__(api_model: str = 'gpt-4o', *, api_endpoint: str | None = None, response_path: str | None = None, system_prompt: str | None = None, input_template: str | None = None, reference_template: str | None = None, qa_pair_template: str | None = None, output_pattern: str | None = None, try_num: Annotated[int, Gt(gt=0)] = 3, model_params: Dict = {}, sampling_params: Dict = {}, **kwargs)

Initialization method.

Parameters:
  • api_model – API model name.

  • api_endpoint – URL endpoint for the API.

  • response_path – Path to extract content from the API response. Defaults to 'choices.0.message.content'.

  • system_prompt – System prompt for the calibration task.

  • input_template – Template for building the model input.

  • reference_template – Template for formatting the reference text.

  • qa_pair_template – Template for formatting question-answer pairs.

  • output_pattern – Regular expression for parsing model output.

  • try_num – The number of retry attempts made when an API call or output parsing fails. Must be greater than 0.

  • model_params – Parameters for initializing the API model.

  • sampling_params – Extra parameters passed to the API call, e.g., {'temperature': 0.9, 'top_p': 0.95}.

  • kwargs – Extra keyword arguments.
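
To make `response_path` and `try_num` more concrete, here is a rough, self-contained sketch of what a dotted response path and a bounded retry loop could look like. It is not the library's implementation: `extract_by_path` and `call_with_retries` are hypothetical helpers, the response dictionary is a stand-in shaped like a chat completion payload, and treating `try_num` as the total number of attempts is an assumption.

```python
def extract_by_path(response, path='choices.0.message.content'):
    """Hypothetical helper: walk a dotted path through dicts and lists."""
    node = response
    for key in path.split('.'):
        if isinstance(node, list):
            node = node[int(key)]  # numeric segments index into lists
        else:
            node = node[key]       # other segments are dictionary keys
    return node


def call_with_retries(call_api, parse, try_num=3):
    """Hypothetical helper: retry on API errors or unparsable output."""
    for _ in range(try_num):
        try:
            parsed = parse(call_api())
        except Exception:
            continue  # API call or parsing raised; try again
        if parsed is not None:
            return parsed
    return None  # every attempt failed


# Stand-in response, shaped like a chat completion payload.
fake_response = {'choices': [{'message': {'content': 'hello'}}]}
content = extract_by_path(fake_response)

attempts = []


def flaky_call():
    """Fails twice, then succeeds, to exercise the retry loop."""
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError('transient API error')
    return fake_response


result = call_with_retries(flaky_call, extract_by_path, try_num=3)
```

A backend whose payload nests the text differently would need a different `response_path` rather than a change to the parsing logic.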

build_input(sample)
parse_output(raw_output)
process_single(sample, rank=None)

Processes a single sample (sample -> sample).

Parameters:
  • sample – The sample to process.

Returns:
  The processed sample.
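
As an illustration of the sample -> sample contract, the sketch below runs one sample through a stubbed model. Several details are assumptions that this page does not confirm: the sample keys `text`, `query`, and `response`, the choice to leave the sample unchanged when no attempt parses, and the names `process_single_sketch` and `stub_model`, which are hypothetical.

```python
import re

# Default output pattern copied from the class attributes above.
OUTPUT_PATTERN = '【问题】\\s*(.*?)\\s*【回答】\\s*(.*)'


def process_single_sketch(sample, model, try_num=3):
    """Hypothetical stand-in for process_single using assumed sample keys."""
    # Build the prompt from the reference text and the existing QA pair.
    prompt = '【参考信息】\n{}\n【问题】\n{}\n【回答】\n{}'.format(
        sample['text'], sample['query'], sample['response'])
    for _ in range(try_num):
        try:
            match = re.search(OUTPUT_PATTERN, model(prompt), re.DOTALL)
        except Exception:
            continue  # model call failed; try again
        if match:
            # Overwrite the QA pair with the calibrated version.
            question, answer = match.groups()
            sample['query'] = question.strip()
            sample['response'] = answer.strip()
            break
    return sample


def stub_model(prompt):
    """Stub that returns a fixed, well-formed reply instead of calling an API."""
    return ('【问题】\nWhich city is the capital of France?\n'
            '【回答】\nParis is the capital of France.')


sample = {'text': 'Paris is the capital of France.',
          'query': 'Capital?',
          'response': 'Paris'}
out = process_single_sketch(dict(sample), stub_model)
```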