most_relevant_entities_aggregator
Extracts and ranks entities closely related to a given entity from provided texts.
The operator uses a language model API to identify and rank entities, filtering out entities of the same type as the given entity. The ranked list is sorted in descending order of importance. Input texts are aggregated and passed to the model, with an optional token limit. The output is parsed using a regular expression to extract the relevant entities. Results are stored in the batch metadata under the key 'most_relevant_entities'. The operator retries the API call up to a specified number of times in case of errors. The system prompt, input template, and output pattern can be customized.
Type: aggregator
Tags: cpu, api
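The flow described above (aggregate the input texts, call the model with retries, then parse the reply with a regular expression) can be sketched roughly as follows. This is a minimal illustration, not the operator's implementation: `OUTPUT_PATTERN`, `parse_entities`, `rank_entities`, and the `call_model` callable are hypothetical names, and the "## Ranking" reply format is an assumption made only so the regex has something to match.

```python
import re
from typing import Callable, List

# Hypothetical output pattern: assumes the model replies with a "## Ranking"
# header followed by a comma-separated list. The operator's real default
# pattern is not shown in this document.
OUTPUT_PATTERN = r"## Ranking\s*(.*?)\s*$"


def parse_entities(raw_output: str, pattern: str = OUTPUT_PATTERN) -> List[str]:
    """Extract the ranked entity names from the raw model reply."""
    match = re.search(pattern, raw_output, flags=re.DOTALL)
    if match is None:
        return []
    return [name.strip() for name in match.group(1).split(",") if name.strip()]


def rank_entities(
    texts: List[str],
    call_model: Callable[[str], str],
    try_num: int = 3,
) -> List[str]:
    """Aggregate texts, query the model, and retry on API or parsing failures."""
    prompt = "\n".join(texts)  # aggregate the input texts into one prompt
    for _ in range(try_num):
        try:
            entities = parse_entities(call_model(prompt))
        except Exception:
            continue  # API call error: try again
        if entities:
            return entities  # parsed successfully
        # empty result counts as a parsing error: try again
    return []
```

In this sketch an empty list signals that every attempt failed; a caller would then store the returned list under the 'most_relevant_entities' key of the batch metadata.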
🔧 Parameter Configuration
| name | type | default | desc |
|---|---|---|---|
| api_model | <class 'str'> | | API model name. |
| entity | <class 'str'> | | The given entity. |
| query_entity_type | <class 'str'> | | The type of queried relevant entities. |
| | <class 'str'> | | The input key in the meta field of the samples. Defaults to "event_description". |
| | <class 'str'> | | The output key in the aggregation field of the samples. Defaults to "most_relevant_entities". |
| max_token_num | typing.Optional[typing.Annotated[int, Gt(gt=0)]] | | Maximum total number of tokens across the sub-documents. No limit if None. |
| | typing.Optional[str] | | URL endpoint for the API. |
| | typing.Optional[str] | | Path to extract content from the API response. Defaults to 'choices.0.message.content'. |
| | typing.Optional[str] | | The system prompt template. |
| | typing.Optional[str] | | The input template. |
| | typing.Optional[str] | | The output pattern. |
| | typing.Annotated[int, Gt(gt=0)] | | The number of retry attempts when there is an API call error or output parsing error. |
| | typing.Dict | | Parameters for initializing the API model. |
| | typing.Dict | | Extra parameters passed to the API call, e.g. {'temperature': 0.9, 'top_p': 0.95}. |
| | | | Extra keyword arguments. |
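The response path parameter above is a dotted string such as 'choices.0.message.content'. One plausible reading, sketched below, is that each segment is either a dictionary key or a list index applied in turn to the decoded API response. The helper name `extract_by_path` is hypothetical, and how the operator actually resolves the path is not shown in this document.

```python
from typing import Any


def extract_by_path(response: Any, path: str = "choices.0.message.content") -> Any:
    """Walk a nested dict/list structure following a dotted path."""
    current = response
    for part in path.split("."):
        if isinstance(current, (list, tuple)):
            current = current[int(part)]  # numeric segment indexes a sequence
        else:
            current = current[part]  # any other segment is a mapping key
    return current


# Example response shaped like a chat-completion payload (illustrative data).
response = {"choices": [{"message": {"content": "ranked entities here"}}]}
content = extract_by_path(response)
```

A missing key or out-of-range index raises `KeyError` or `IndexError` in this sketch, which a caller could treat as a failed attempt and retry.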
📊 Effect demonstration
test_default_aggregator
MostRelevantEntitiesAggregator(api_model='qwen2.5-72b-instruct', entity='李莲花', query_entity_type='人物')
📥 input data
📤 output data
✨ explanation
This example shows typical usage: the operator extracts and ranks the entities most relevant to '李莲花' (Li Lianhua) from a set of event descriptions. The language model API identifies and ranks entities such as '笛飞声', '单孤刀', and '风火堂管事', and the ranked list is stored in the batch metadata under the key 'most_relevant_entities'.
test_max_token_num
MostRelevantEntitiesAggregator(api_model='qwen2.5-72b-instruct', entity='李莲花', query_entity_type='人物', max_token_num=40)
📥 input data
📤 output data
✨ explanation
This example covers an edge case in which the operator is configured with a token limit (max_token_num=40). The limit caps how much of the aggregated input text is passed to the language model API, so the ranked list may differ from the one produced without a limit. Here the output includes '西域天魔', '四顾门成员', and '武林盟主'.
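To make the effect of a token limit concrete, here is a minimal sketch of one way a budget could be applied: keep documents in order until the next one would exceed the limit. The names `select_within_budget` and `whitespace_token_count` are hypothetical, the whitespace tokenizer stands in for a real model tokenizer, and the operator's actual selection strategy is not described in this document.

```python
from typing import Callable, List, Optional


def whitespace_token_count(text: str) -> int:
    """Stand-in tokenizer (assumption): counts whitespace-separated words."""
    return len(text.split())


def select_within_budget(
    docs: List[str],
    max_token_num: Optional[int],
    count_tokens: Callable[[str], int] = whitespace_token_count,
) -> List[str]:
    """Keep leading documents whose cumulative token count fits the budget."""
    if max_token_num is None:
        return list(docs)  # no limit configured
    selected: List[str] = []
    used = 0
    for doc in docs:
        n = count_tokens(doc)
        if used + n > max_token_num:
            break  # adding this document would exceed the budget
        selected.append(doc)
        used += n
    return selected
```

With a small budget, later documents never reach the model, which is why the ranked entities can change when max_token_num is set.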