data_juicer.ops.aggregator.entity_attribute_aggregator module
- class data_juicer.ops.aggregator.entity_attribute_aggregator.EntityAttributeAggregator(api_model: str = 'gpt-4o', entity: str = None, attribute: str = None, input_key: str = 'event_description', output_key: str = 'entity_attribute', word_limit: Annotated[int, Gt(gt=0)] = 100, max_token_num: Annotated[int, Gt(gt=0)] | None = None, *, api_endpoint: str | None = None, response_path: str | None = None, system_prompt_template: str | None = None, example_prompt: str | None = None, input_template: str | None = None, output_pattern_template: str | None = None, try_num: Annotated[int, Gt(gt=0)] = 3, model_params: Dict = {}, sampling_params: Dict = {}, **kwargs)[source]
Bases:
Aggregator
Summarizes the given attribute of an entity from a set of related documents.
- DEFAULT_SYSTEM_TEMPLATE = '给定与`{entity}`相关的一些文档,总结`{entity}`的`{attribute}`。\n要求:\n- 尽量使用原文专有名词\n- 联系上下文,自动忽略上下文不一致的细节错误\n- 只对文档中与`{entity}`的`{attribute}`有关的内容进行总结\n- 字数限制在**{word_limit}字以内**\n- 要求输出格式如下:\n# {entity}\n## {attribute}\n...\n{example}'
- DEFAULT_EXAMPLE_PROMPT = '- 例如,根据相关文档总结`孙悟空`的`出身背景`,**100字**以内的样例如下:\n`孙悟空`的`出身背景`总结:\n# 孙悟空\n## 出身背景\n号称齐天大圣,花果山水帘洞的美猴王、西行取经队伍中的大师兄。师父是唐僧玄奘,曾拜菩提祖师学艺。亲生父母未知,自石头中孕育而生。自认斗战胜佛,最怕观世音菩萨和紧箍咒。\n'
- DEFAULT_INPUT_TEMPLATE = '`{entity}`的相关文档:\n{sub_docs}\n\n`{entity}`的`{attribute}`总结:\n'
- DEFAULT_OUTPUT_PATTERN_TEMPLATE = '\\#\\s*{entity}\\s*\\#\\#\\s*{attribute}\\s*(.*?)\\Z'
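The templates above are plain format strings, so their roles can be illustrated without calling any model. The following is a minimal sketch, not the operator's actual implementation: the helper names `build_input_prompt` and `parse_summary` are hypothetical, joining sub-documents with newlines is an assumption, and so are the use of `re.DOTALL` and `re.escape`. Only the two template strings are taken from the class attributes listed here.

```python
import re

# Values copied from the class attributes documented above.
INPUT_TEMPLATE = "`{entity}`的相关文档:\n{sub_docs}\n\n`{entity}`的`{attribute}`总结:\n"
OUTPUT_PATTERN_TEMPLATE = r"\#\s*{entity}\s*\#\#\s*{attribute}\s*(.*?)\Z"


def build_input_prompt(entity, attribute, sub_docs):
    """Fill the input template (hypothetical helper).

    Joining the documents with newlines is an assumption of this sketch.
    """
    return INPUT_TEMPLATE.format(
        entity=entity,
        attribute=attribute,
        sub_docs="\n".join(sub_docs),
    )


def parse_summary(response, entity, attribute):
    """Extract the summary body from a model response (hypothetical helper).

    re.escape is an addition of this sketch; it keeps regex metacharacters
    in entity or attribute names from changing the pattern.
    """
    pattern = OUTPUT_PATTERN_TEMPLATE.format(
        entity=re.escape(entity),
        attribute=re.escape(attribute),
    )
    # DOTALL lets the capture group span multiple lines up to the end of the text.
    match = re.search(pattern, response, re.DOTALL)
    return match.group(1).strip() if match else None


prompt = build_input_prompt("孙悟空", "出身背景", ["doc a", "doc b"])
summary = parse_summary("# 孙悟空\n## 出身背景\n自石头中孕育而生。\n", "孙悟空", "出身背景")
```

A response that lacks the `# entity` / `## attribute` headings yields `None` in this sketch, which is the kind of output parsing failure that a retry count such as `try_num` would plausibly cover.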
- __init__(api_model: str = 'gpt-4o', entity: str = None, attribute: str = None, input_key: str = 'event_description', output_key: str = 'entity_attribute', word_limit: Annotated[int, Gt(gt=0)] = 100, max_token_num: Annotated[int, Gt(gt=0)] | None = None, *, api_endpoint: str | None = None, response_path: str | None = None, system_prompt_template: str | None = None, example_prompt: str | None = None, input_template: str | None = None, output_pattern_template: str | None = None, try_num: Annotated[int, Gt(gt=0)] = 3, model_params: Dict = {}, sampling_params: Dict = {}, **kwargs)[source]
Initialization method.
- Parameters:
api_model -- API model name.
entity -- The given entity.
attribute -- The given attribute.
input_key -- The input key in the meta field of the samples. Defaults to "event_description".
output_key -- The output key in the aggregation field of the samples. Defaults to "entity_attribute".
word_limit -- The output length limit stated in the prompt.
max_token_num -- The maximum total number of tokens across the sub-documents. No limit if None.
api_endpoint -- URL endpoint for the API.
response_path -- Path to extract content from the API response. Defaults to 'choices.0.message.content'.
system_prompt_template -- The system prompt template.
example_prompt -- The example part in the system prompt.
input_template -- The input template.
output_pattern_template -- The output pattern template, a regular expression template for parsing the response.
try_num -- The number of retry attempts when there is an API call error or output parsing error.
model_params -- Parameters for initializing the API model.
sampling_params -- Extra parameters passed to the API call, e.g., {'temperature': 0.9, 'top_p': 0.95}.
kwargs -- Extra keyword arguments.
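As an illustration of how these parameters fit together, here is a hedged sketch that collects a few documented arguments into a dict and passes them to the constructor. The helper names `build_aggregator_kwargs` and `build_aggregator` are hypothetical and not part of data_juicer. The import is deferred so the sketch can be read and run without the package installed, and actually creating the operator presumably requires a reachable API endpoint and credentials, which is an assumption not covered by this page.

```python
def build_aggregator_kwargs(entity, attribute, word_limit=100):
    """Collect documented constructor arguments in one dict (hypothetical helper)."""
    # The signature constrains word_limit with Gt(gt=0); mirror that check here.
    if word_limit <= 0:
        raise ValueError("word_limit must be greater than 0")
    return {
        "api_model": "gpt-4o",
        "entity": entity,
        "attribute": attribute,
        "input_key": "event_description",
        "output_key": "entity_attribute",
        "word_limit": word_limit,
        "try_num": 3,
        # Example values taken from the sampling_params description above.
        "sampling_params": {"temperature": 0.9, "top_p": 0.95},
    }


def build_aggregator(entity, attribute):
    """Instantiate the operator (hypothetical helper; needs data_juicer installed)."""
    # Imported lazily so the rest of this sketch runs without data_juicer.
    from data_juicer.ops.aggregator.entity_attribute_aggregator import (
        EntityAttributeAggregator,
    )

    return EntityAttributeAggregator(**build_aggregator_kwargs(entity, attribute))


kwargs = build_aggregator_kwargs("孙悟空", "出身背景")
```

Everything after `max_token_num` in the signature is keyword-only, so passing the arguments by name, as the dict does, works for all of them.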