entity_attribute_aggregator¶
Summarizes a given attribute of an entity from a set of documents.
The operator extracts and summarizes the specified attribute of a given entity from the provided documents. It uses a system prompt, example prompt, and input template to generate the summary. The output is formatted as a markdown-style summary with the entity and attribute clearly labeled. The summary is limited to a specified number of words (default is 100). The operator uses a Hugging Face tokenizer to handle token limits and splits documents if necessary. If the input key or required fields are missing, the operator logs a warning and returns the sample unchanged. The summary is stored in the batch metadata under the specified output key. The system prompt, input template, example prompt, and output pattern can be customized.
从一组文档中提取并总结给定实体的特定属性。
该算子从提供的文档中提取并总结给定实体的指定属性。它使用系统提示、示例提示和输入模板生成摘要。输出格式为 markdown 风格的摘要,其中实体和属性清晰标注。摘要限制在指定的单词数内(默认为 100 个单词)。该算子使用 Hugging Face 分词器来处理 token 限制,并在必要时分割文档。如果缺少输入键或必填字段,该算子会记录警告并返回未更改的样本。摘要存储在批处理元数据中的指定输出键下。系统提示、输入模板、示例提示和输出模式可以自定义。
Type 算子类型: aggregator
Tags 标签: cpu, api
🔧 Parameter Configuration 参数配置¶
name 参数名 |
type 类型 |
default 默认值 |
desc 说明 |
---|---|---|---|
|
<class ‘str’> |
|
API model name. |
|
<class ‘str’> |
|
The given entity. |
|
<class ‘str’> |
|
The given attribute. |
|
<class ‘str’> |
|
The input key in the meta field of the samples. It is “event_description” in default. |
|
<class ‘str’> |
|
The output key in the aggregation field of the samples. It is “entity_attribute” in default. |
|
typing.Annotated[int, Gt(gt=0)] |
|
Prompt the output length. |
|
typing.Optional[typing.Annotated[int, Gt(gt=0)]] |
|
The max token num of the total tokens of the sub documents. Without limitation if it is None. |
|
typing.Optional[str] |
|
URL endpoint for the API. |
|
typing.Optional[str] |
|
Path to extract content from the API response. Defaults to ‘choices.0.message.content’. |
|
typing.Optional[str] |
|
The system prompt template. |
|
typing.Optional[str] |
|
The example part in the system prompt. |
|
typing.Optional[str] |
|
The input template. |
|
typing.Optional[str] |
|
The output template. |
|
typing.Annotated[int, Gt(gt=0)] |
|
The number of retry attempts when there is an API call error or output parsing error. |
|
typing.Dict |
|
Parameters for initializing the API model. |
|
typing.Dict |
|
Extra parameters passed to the API call. e.g {‘temperature’: 0.9, ‘top_p’: 0.95} |
|
|
Extra keyword arguments. |
📊 Effect demonstration 效果演示¶
test_default_aggregator¶
EntityAttributeAggregator(api_model='qwen2.5-72b-instruct', entity='李莲花', attribute='主要经历')
📥 input data 输入数据¶
📤 output data 输出数据¶
✨ explanation 解释¶
This example demonstrates the default behavior of the operator, which summarizes the main experiences of the character ‘李莲花’ from a set of documents. The output is a summary that includes key events in the life of ‘李莲花’, such as his early achievements and later life as a doctor. The summary is stored under the ‘entity_attribute’ key in the batch metadata. 此示例展示了算子的默认行为,从一组文档中总结角色’李莲花’的主要经历。输出是一个摘要,包括’李莲花’生活中的关键事件,如他早期的成就和后来作为医生的生活。该摘要存储在批次元数据的’entity_attribute’键下。
test_word_limit_num¶
EntityAttributeAggregator(api_model='qwen2.5-72b-instruct', entity='李莲花', attribute='身份背景', word_limit=20)
📥 input data 输入数据¶
📤 output data 输出数据¶
✨ explanation 解释¶
This example shows how to limit the number of words in the summary. The operator generates a concise summary of the character ‘李莲花’ with a word limit of 20. The output is a brief summary that captures the essence of ‘李莲花’’s background and experiences. The summary is stored under the ‘entity_attribute’ key in the batch metadata. 此示例展示了如何限制摘要中的字数。算子生成一个关于角色’李莲花’的简短摘要,字数限制为20。输出是一个简洁的摘要,概括了’李莲花’的背景和经历。该摘要存储在批次元数据的’entity_attribute’键下。