data_juicer.ops.mapper.extract_support_text_mapper module
- class data_juicer.ops.mapper.extract_support_text_mapper.ExtractSupportTextMapper(api_model: str = 'gpt-4o', *, summary_key: str = 'event_description', support_text_key: str = 'support_text', api_endpoint: str | None = None, response_path: str | None = None, system_prompt: str | None = None, input_template: str | None = None, try_num: Annotated[int, Gt(gt=0)] = 3, drop_text: bool = False, model_params: Dict = {}, sampling_params: Dict = {}, **kwargs)
Bases: Mapper
Extracts the passage of the original text that supports a given summary.
- DEFAULT_SYSTEM_PROMPT = '你将扮演一个文本摘录助手的角色。你的主要任务是基于给定的文章(称为“原文”)以及对原文某个部分的简短描述或总结(称为“总结”),准确地识别并提取出与该总结相对应的原文片段。\n要求:\n- 你需要尽可能精确地匹配到最符合总结内容的那部分内容\n- 如果存在多个可能的答案,请选择最贴近总结意思的那个\n- 下面是一个例子帮助理解这一过程:\n### 原文:\n《红楼梦》是中国古典小说四大名著之一,由清代作家曹雪芹创作。它讲述了贾宝玉、林黛玉等人的爱情故事及四大家族的兴衰历程。书中通过复杂的人物关系展现了封建社会的各种矛盾冲突。其中关于贾府内部斗争的部分尤其精彩,特别是王熙凤与尤二姐之间的争斗,生动描绘了权力争夺下的女性形象。此外,《红楼梦》还以其精美的诗词闻名,这些诗词不仅增添了文学色彩,也深刻反映了人物的性格特点和命运走向。\n\n### 总结:\n描述了书中的两个女性角色之间围绕权力展开的竞争。\n\n### 原文摘录:\n其中关于贾府内部斗争的部分尤其精彩,特别是王熙凤与尤二姐之间的争斗,生动描绘了权力争夺下的女性形象。'
English gloss of the default system prompt: the model acts as a text-excerpt assistant. Given an article (“原文”, the original text) and a short description or summary of one part of it (“总结”, the summary), it must identify and extract the passage of the original text that corresponds to the summary, matching as precisely as possible and choosing the closest candidate when several passages fit. The prompt ends with one worked example based on Dream of the Red Chamber (《红楼梦》). The headings “### 原文”, “### 总结”, and “### 原文摘录”, which also appear in DEFAULT_INPUT_TEMPLATE, mean “Original text”, “Summary”, and “Excerpt from the original text”.
- DEFAULT_INPUT_TEMPLATE = '### 原文:\n{text}\n\n### 总结:\n{summary}\n\n### 原文摘录:\n'
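The template above has two placeholders, {text} and {summary}. As a minimal sketch of how the model input could be built from it, the snippet below assumes the placeholders are filled with Python's str.format; the helper name build_prompt and the sample strings are hypothetical and not part of data_juicer.

```python
# Template string copied from DEFAULT_INPUT_TEMPLATE above.
INPUT_TEMPLATE = '### 原文:\n{text}\n\n### 总结:\n{summary}\n\n### 原文摘录:\n'


def build_prompt(text: str, summary: str, template: str = INPUT_TEMPLATE) -> str:
    """Fill the {text} and {summary} placeholders (assumed to use str.format)."""
    return template.format(text=text, summary=summary)


# Hypothetical sample: the full text plus a summary of one part of it.
prompt = build_prompt(
    text="The launch slipped twice. The final delay was caused by a faulty valve.",
    summary="Why the launch was delayed the last time.",
)
print(prompt)
```

The prompt ends with the “### 原文摘录:” heading, so the model's completion is expected to be the extracted passage itself. A custom input_template would presumably need to keep both placeholder names.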
- __init__(api_model: str = 'gpt-4o', *, summary_key: str = 'event_description', support_text_key: str = 'support_text', api_endpoint: str | None = None, response_path: str | None = None, system_prompt: str | None = None, input_template: str | None = None, try_num: Annotated[int, Gt(gt=0)] = 3, drop_text: bool = False, model_params: Dict = {}, sampling_params: Dict = {}, **kwargs)
Initialization method.
- Parameters:
api_model – API model name.
summary_key – The key under which the input summary is stored in the meta field. Defaults to 'event_description'.
support_text_key – The key under which the output support text for the summary is stored in the meta field. Defaults to 'support_text'.
api_endpoint – URL endpoint for the API.
response_path – Path to extract content from the API response. Defaults to 'choices.0.message.content'.
system_prompt – System prompt for the task.
input_template – Template for building the model input.
try_num – The number of retry attempts when there is an API call error or output parsing error.
drop_text – Whether to drop the text in the output.
model_params – Parameters for initializing the API model.
sampling_params – Extra parameters passed to the API call, e.g. {'temperature': 0.9, 'top_p': 0.95}.
kwargs – Extra keyword arguments.
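To illustrate how response_path and try_num might interact, here is a minimal, self-contained sketch. It is not the library's implementation: the helpers extract_by_path and call_with_retries, the fake_api stand-in, and the choice to return an empty string after all attempts fail are assumptions made for this example.

```python
from typing import Any, Callable


def extract_by_path(response: Any, path: str = "choices.0.message.content") -> Any:
    """Walk a dotted path; segments index lists by position and dicts by key."""
    node = response
    for part in path.split("."):
        if isinstance(node, list):
            node = node[int(part)]
        else:
            node = node[part]
    return node


def call_with_retries(call: Callable[[], Any], path: str, try_num: int = 3) -> str:
    """Retry on API-call or parsing errors, up to try_num attempts.

    Returning '' when every attempt fails is an assumption of this sketch.
    """
    for _ in range(try_num):
        try:
            return extract_by_path(call(), path)
        except Exception as exc:
            print(f"attempt failed: {exc!r}")
    return ""


# Hypothetical API stand-in: fails once, then returns a chat-style response.
attempts = {"count": 0}


def fake_api() -> dict:
    attempts["count"] += 1
    if attempts["count"] == 1:
        raise ConnectionError("simulated transient failure")
    return {
        "choices": [
            {"message": {"content": "The final delay was caused by a faulty valve."}}
        ]
    }


support_text = call_with_retries(fake_api, "choices.0.message.content", try_num=3)
print(support_text)
```

In this sketch the first call raises, the second succeeds, and the dotted path resolves to the message content, which would then be stored under support_text_key.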