data_juicer.ops.mapper.extract_support_text_mapper module

class data_juicer.ops.mapper.extract_support_text_mapper.ExtractSupportTextMapper(api_model: str = 'gpt-4o', *, summary_key: str = 'event_description', support_text_key: str = 'support_text', api_endpoint: str | None = None, response_path: str | None = None, system_prompt: str | None = None, input_template: str | None = None, try_num: Annotated[int, Gt(gt=0)] = 3, drop_text: bool = False, model_params: Dict = {}, sampling_params: Dict = {}, **kwargs)[源代码]

基类:Mapper

Extract support sub text for a summary.

DEFAULT_SYSTEM_PROMPT = '你将扮演一个文本摘录助手的角色。你的主要任务是基于给定的文章(称为“原文”)以及对原文某个部分的简短描述或总结(称为“总结”),准确地识别并提取出与该总结相对应的原文片段。\n要求:\n- 你需要尽可能精确地匹配到最符合总结内容的那部分内容\n- 如果存在多个可能的答案,请选择最贴近总结意思的那个\n- 下面是一个例子帮助理解这一过程:\n### 原文:\n《红楼梦》是中国古典小说四大名著之一,由清代作家曹雪芹创作。它讲述了贾宝玉、林黛玉等人的爱情故事及四大家族的兴衰历程。书中通过复杂的人物关系展现了封建社会的各种矛盾冲突。其中关于贾府内部斗争的部分尤其精彩,特别是王熙凤与尤二姐之间的争斗,生动描绘了权力争夺下的女性形象。此外,《红楼梦》还以其精美的诗词闻名,这些诗词不仅增添了文学色彩,也深刻反映了人物的性格特点和命运走向。\n\n### 总结:\n描述了书中的两个女性角色之间围绕权力展开的竞争。\n\n### 原文摘录:\n其中关于贾府内部斗争的部分尤其精彩,特别是王熙凤与尤二姐之间的争斗,生动描绘了权力争夺下的女性形象。'
DEFAULT_INPUT_TEMPLATE = '### 原文:\n{text}\n\n### 总结:\n{summary}\n\n### 原文摘录:\n'
__init__(api_model: str = 'gpt-4o', *, summary_key: str = 'event_description', support_text_key: str = 'support_text', api_endpoint: str | None = None, response_path: str | None = None, system_prompt: str | None = None, input_template: str | None = None, try_num: Annotated[int, Gt(gt=0)] = 3, drop_text: bool = False, model_params: Dict = {}, sampling_params: Dict = {}, **kwargs)[源代码]

Initialization method. :param api_model: API model name. :param summary_key: The key name to store the input summary in the

meta field. It's "event_description" in default.

参数:
  • support_text_key -- The key name to store the output support text for the summary in the meta field. It's "support_text" in default.

  • api_endpoint -- URL endpoint for the API.

  • response_path -- Path to extract content from the API response. Defaults to 'choices.0.message.content'.

  • system_prompt -- System prompt for the task.

  • input_template -- Template for building the model input.

  • try_num -- The number of retry attempts when there is an API call error or output parsing error.

  • drop_text -- If drop the text in the output.

  • model_params -- Parameters for initializing the API model.

  • sampling_params -- Extra parameters passed to the API call. e.g {'temperature': 0.9, 'top_p': 0.95}

  • kwargs -- Extra keyword arguments.

process_single(sample, rank=None)[源代码]

For sample level, sample --> sample

参数:

sample -- sample to process

返回:

processed sample