data_juicer.ops.mapper.optimize_qa_mapper module
- class data_juicer.ops.mapper.optimize_qa_mapper.OptimizeQAMapper(hf_model: str = 'Qwen/Qwen2.5-7B-Instruct', *, system_prompt: str | None = None, input_template: str | None = None, qa_pair_template: str | None = None, output_pattern: str | None = None, enable_vllm: bool = False, model_params: Dict | None = None, sampling_params: Dict | None = None, **kwargs)[source]
Bases:
Mapper
Mapper to optimize question-answer pairs.
- DEFAULT_SYSTEM_PROMPT = '请优化输入的问答对,使【问题】和【回答】都更加详细、准确。必须按照以下标记格式,直接输出优化后的问答对:\n【问题】\n优化后的问题\n【回答】\n优化后的回答'
- DEFAULT_INPUT_TEMPLATE = '以下是原始问答对:\n{}'
- DEFAULT_QA_PAIR_TEMPLATE = '【问题】\n{}\n【回答】\n{}'
- DEFAULT_OUTPUT_PATTERN = '.*?【问题】\\s*(.*?)\\s*【回答】\\s*(.*)'
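The defaults above fit together in two steps: the QA pair is formatted with DEFAULT_QA_PAIR_TEMPLATE and embedded in DEFAULT_INPUT_TEMPLATE, and the model response is later parsed with DEFAULT_OUTPUT_PATTERN. The following is a minimal standalone sketch of that flow using only the standard library. The helper names `build_input` and `parse_output` are hypothetical, and the use of `re.DOTALL` and the final `strip()` are assumptions rather than confirmed details of the mapper's implementation.

```python
import re

# Default values copied from the class attributes listed above.
DEFAULT_INPUT_TEMPLATE = '以下是原始问答对:\n{}'
DEFAULT_QA_PAIR_TEMPLATE = '【问题】\n{}\n【回答】\n{}'
DEFAULT_OUTPUT_PATTERN = '.*?【问题】\\s*(.*?)\\s*【回答】\\s*(.*)'


def build_input(question: str, answer: str) -> str:
    """Hypothetical helper: format the QA pair, then embed it in the input template."""
    qa_pair = DEFAULT_QA_PAIR_TEMPLATE.format(question, answer)
    return DEFAULT_INPUT_TEMPLATE.format(qa_pair)


def parse_output(response: str):
    """Hypothetical helper: return (question, answer), or None if the markers are missing.

    Assumption: re.DOTALL is used so that '.' also matches newlines,
    which lets multi-line questions and answers be captured.
    """
    match = re.match(DEFAULT_OUTPUT_PATTERN, response, re.DOTALL)
    if match is None:
        return None
    question, answer = match.groups()
    return question.strip(), answer.strip()


if __name__ == '__main__':
    print(build_input('What is vLLM?', 'An inference engine.'))
    print(parse_output('Sure.\n【问题】\nWhat is vLLM?\n【回答】\nAn inference engine.\n'))
```

The leading `.*?` in the pattern tolerates any preamble the model emits before the 【问题】 marker, so a response does not need to start with the marker to be parsed.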
- __init__(hf_model: str = 'Qwen/Qwen2.5-7B-Instruct', *, system_prompt: str | None = None, input_template: str | None = None, qa_pair_template: str | None = None, output_pattern: str | None = None, enable_vllm: bool = False, model_params: Dict | None = None, sampling_params: Dict | None = None, **kwargs)[source]
Initialization method.
- Parameters:
hf_model -- Hugging Face model ID.
system_prompt -- System prompt for guiding the optimization task.
input_template -- Template for building the model input. It must contain one '{}' placeholder, which is filled with the question-answer pair formatted by qa_pair_template.
qa_pair_template -- Template for formatting the question-answer pair. It must contain two '{}' placeholders, one for the question and one for the answer.
output_pattern -- Regular expression for extracting the question and the answer from the model response.
enable_vllm -- Whether to use vLLM for inference acceleration.
model_params -- Parameters for initializing the model.
sampling_params -- Sampling parameters for text generation (e.g., {'temperature': 0.9, 'top_p': 0.95}).
kwargs -- Extra keyword arguments.
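Custom templates are easy to get wrong, so it can help to check them before constructing the mapper. The sketch below is a hypothetical helper, `build_mapper_kwargs`, which is not part of data_juicer. It reads the "one" and "two" placeholder requirements as exact counts and assumes, by analogy with DEFAULT_OUTPUT_PATTERN, that a custom output_pattern needs at least two capture groups. Both readings are assumptions.

```python
import re
from typing import Any, Dict, Optional


def build_mapper_kwargs(
    input_template: Optional[str] = None,
    qa_pair_template: Optional[str] = None,
    output_pattern: Optional[str] = None,
    sampling_params: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: validate custom templates and return constructor kwargs."""
    # input_template wraps the already formatted QA pair: one '{}' expected.
    if input_template is not None and input_template.count('{}') != 1:
        raise ValueError("input_template must contain one '{}' placeholder")
    # qa_pair_template receives the question and the answer: two '{}' expected.
    if qa_pair_template is not None and qa_pair_template.count('{}') != 2:
        raise ValueError("qa_pair_template must contain two '{}' placeholders")
    # Assumption: the pattern needs a capture group each for question and answer.
    if output_pattern is not None and re.compile(output_pattern).groups < 2:
        raise ValueError('output_pattern must define at least two capture groups')

    kwargs = {
        'input_template': input_template,
        'qa_pair_template': qa_pair_template,
        'output_pattern': output_pattern,
        'sampling_params': sampling_params,
    }
    # Drop unset entries so the mapper falls back to its DEFAULT_* values.
    return {key: value for key, value in kwargs.items() if value is not None}


if __name__ == '__main__':
    print(build_mapper_kwargs(
        qa_pair_template='Q: {}\nA: {}',
        output_pattern=r'Q:\s*(.*?)\s*A:\s*(.*)',
        sampling_params={'temperature': 0.9, 'top_p': 0.95},
    ))
```

With data_juicer installed, the resulting dictionary could then be passed on as `OptimizeQAMapper(hf_model='Qwen/Qwen2.5-7B-Instruct', **kwargs)`. Note that a custom system_prompt should describe the same output markers that output_pattern expects, otherwise extraction is likely to fail.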