data_juicer.ops.mapper.pair_preference_mapper module
- class data_juicer.ops.mapper.pair_preference_mapper.PairPreferenceMapper(api_model: str = 'gpt-4o', *, api_endpoint: str | None = None, response_path: str | None = None, system_prompt: str | None = None, input_template: str | None = None, output_pattern: str | None = None, rejected_key: str = 'rejected_response', reason_key: str = 'reason', try_num: Annotated[int, Gt(gt=0)] = 3, model_params: Dict = {}, sampling_params: Dict = {}, **kwargs)
Bases: Mapper
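Mapper to construct paired preference samples. For each question-answer pair, an API model generates a rejected response together with the reason for generating it.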
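- DEFAULT_SYSTEM_PROMPT = '你的任务是根据参考信息修改问答对中的回答,在语言风格、事实性、人物身份、立场等任一方面与原回答相反。必须按照以下标记格式输出,不要输出其他多余内容。\n【回答】\n生成的新回答\n【原因】\n生成该回答的原因'
English translation: "Your task is to revise the answer in the question-answer pair based on the reference information, so that it is the opposite of the original answer in any one aspect such as language style, factuality, persona, or stance. You must output in the following tagged format and output nothing else.\n【回答】 (Answer)\nthe newly generated answer\n【原因】 (Reason)\nthe reason for generating this answer"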
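- DEFAULT_INPUT_TEMPLATE = '【参考信息】\n{reference}\n\n以下是原始问答对:\n【问题】\n{query}\n【回答】\n{response}'
English translation: "【Reference information】\n{reference}\n\nThe following is the original question-answer pair:\n【Question】\n{query}\n【Answer】\n{response}"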
- DEFAULT_OUTPUT_PATTERN = '.*?【回答】\\s*(.*?)\\s*【原因】\\s*(.*)'
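The following is a minimal sketch of how the default input template and output pattern fit together, using only the standard library. The helper names `build_prompt` and `parse_output` are hypothetical and are not part of data-juicer. The sketch also assumes the pattern is applied with `re.DOTALL` so that `.` spans newlines, and that an empty pair is returned when nothing matches; the library's actual behavior may differ.

```python
import re

# Values copied from the class constants documented above.
DEFAULT_INPUT_TEMPLATE = (
    "【参考信息】\n{reference}\n\n以下是原始问答对:\n"
    "【问题】\n{query}\n【回答】\n{response}"
)
DEFAULT_OUTPUT_PATTERN = r".*?【回答】\s*(.*?)\s*【原因】\s*(.*)"


def build_prompt(query, response, reference=""):
    """Hypothetical helper: fill the template placeholders."""
    return DEFAULT_INPUT_TEMPLATE.format(
        reference=reference, query=query, response=response
    )


def parse_output(raw_output):
    """Hypothetical helper: split model output into (rejected, reason).

    Assumption: re.DOTALL is used so multi-line answers are captured.
    Returns two empty strings when the markers are missing.
    """
    match = re.match(DEFAULT_OUTPUT_PATTERN, raw_output, re.DOTALL)
    if match is None:
        return "", ""
    rejected, reason = match.groups()
    return rejected.strip(), reason.strip()


prompt = build_prompt(
    query="What color is the sky?",
    response="The sky is blue.",
    reference="On a clear day the sky appears blue.",
)

# A made-up model reply that follows the tagged format from the system prompt.
sample_output = (
    "【回答】\nThe sky is green.\n"
    "【原因】\nContradicts the original answer factually."
)
rejected, reason = parse_output(sample_output)
```

The first capture group corresponds to the value stored under `rejected_key`, and the second to the value stored under `reason_key`.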
- __init__(api_model: str = 'gpt-4o', *, api_endpoint: str | None = None, response_path: str | None = None, system_prompt: str | None = None, input_template: str | None = None, output_pattern: str | None = None, rejected_key: str = 'rejected_response', reason_key: str = 'reason', try_num: Annotated[int, Gt(gt=0)] = 3, model_params: Dict = {}, sampling_params: Dict = {}, **kwargs)
Initialization method.
- Parameters:
api_model – API model name.
api_endpoint – URL endpoint for the API.
response_path – Path to extract content from the API response. Defaults to 'choices.0.message.content'.
system_prompt – System prompt for guiding the generation task.
input_template – Template for building the model input. It must contain the placeholders '{query}' and '{response}', and can optionally include '{reference}'.
output_pattern – Regular expression for parsing model output. The default pattern captures two groups: the generated response and the reason for it.
rejected_key – The field name in the sample to store the generated rejected response. Defaults to 'rejected_response'.
reason_key – The field name in the sample to store the reason for generating the response. Defaults to 'reason'.
try_num – The number of retries for the API call in case of response parsing failure. Defaults to 3.
model_params – Parameters for initializing the API model.
sampling_params – Extra parameters passed to the API call, e.g. {'temperature': 0.9, 'top_p': 0.95}.
kwargs – Extra keyword arguments.
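As an illustration of how `response_path` and `try_num` might interact, here is a self-contained sketch that uses a fake API callable instead of a real model. The functions `extract_by_path`, `generate_rejected`, and `fake_api` are hypothetical names introduced for this example. How data-juicer actually resolves the dotted path, retries, and handles a final failure is an assumption here, not taken from the library source.

```python
import re
from typing import Any, Callable, Dict, Tuple

OUTPUT_PATTERN = r".*?【回答】\s*(.*?)\s*【原因】\s*(.*)"


def extract_by_path(payload: Any, path: str) -> Any:
    """Follow a dotted path such as 'choices.0.message.content'."""
    node = payload
    for part in path.split("."):
        # Assumption: segments index lists by position and dicts by key.
        node = node[int(part)] if isinstance(node, list) else node[part]
    return node


def generate_rejected(
    call_api: Callable[[str], Dict],
    prompt: str,
    try_num: int = 3,
    response_path: str = "choices.0.message.content",
) -> Tuple[str, str]:
    """Call the API up to try_num times until the output parses."""
    for _ in range(try_num):
        text = extract_by_path(call_api(prompt), response_path)
        match = re.match(OUTPUT_PATTERN, text, re.DOTALL)
        if match:
            return match.group(1).strip(), match.group(2).strip()
    # Assumption: fall back to empty strings when every attempt fails.
    return "", ""


# Fake API: the first reply cannot be parsed, the second one can.
replies = iter(["garbled", "【回答】\nNo.\n【原因】\nOpposite stance."])


def fake_api(prompt: str) -> Dict:
    return {"choices": [{"message": {"content": next(replies)}}]}


result = generate_rejected(fake_api, "prompt text")
```

In this sketch the first attempt fails to parse, so the loop retries and the second reply supplies the pair. With the real operator, the two values would be written to the fields named by `rejected_key` and `reason_key`.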