data_juicer.ops.mapper.relation_identity_mapper module
- class data_juicer.ops.mapper.relation_identity_mapper.RelationIdentityMapper(api_model: str = 'gpt-4o', source_entity: str = None, target_entity: str = None, *, output_key: str = 'role_relation', api_endpoint: str | None = None, response_path: str | None = None, system_prompt_template: str | None = None, input_template: str | None = None, output_pattern_template: str | None = None, try_num: Annotated[int, Gt(gt=0)] = 3, drop_text: bool = False, model_params: Dict = {}, sampling_params: Dict = {}, **kwargs)
Bases: Mapper
Identifies the relation between two entities in the text.
- DEFAULT_SYSTEM_PROMPT_TEMPLATE = '给定关于{entity1}和{entity2}的文本信息。判断{entity1}和{entity2}之间的关系。\n要求:\n- 关系用一个或多个词语表示,必要时可以加一个形容词来描述这段关系\n- 输出关系时不要参杂任何标点符号\n- 需要你进行合理的推理才能得出结论\n- 如果两个人物身份是同一个人,输出关系为:另一个身份\n- 输出格式为:\n分析推理:...\n所以{entity2}是{entity1}的:...\n- 注意输出的是{entity2}是{entity1}的什么关系,而不是{entity1}是{entity2}的什么关系'
- DEFAULT_INPUT_TEMPLATE = '关于{entity1}和{entity2}的文本信息:\n```\n{text}\n```\n'
- DEFAULT_OUTPUT_PATTERN_TEMPLATE = '\n \\s*分析推理:\\s*(.*?)\\s*\n \\s*所以{entity2}是{entity1}的:\\s*(.*?)\\Z\n '
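The output pattern template can be exercised on its own to see what it captures. The following is a minimal sketch, not the library's implementation: it assumes that `{entity1}` is filled with `source_entity` and `{entity2}` with `target_entity`, and that the pattern is compiled with `re.VERBOSE | re.DOTALL` (the whitespace embedded in the template suggests verbose mode, but this page does not confirm it). The helper `parse_relation`, the entity names, and the sample response are all hypothetical.

```python
import re

# Same value as DEFAULT_OUTPUT_PATTERN_TEMPLATE listed above.
OUTPUT_PATTERN_TEMPLATE = '\n \\s*分析推理:\\s*(.*?)\\s*\n \\s*所以{entity2}是{entity1}的:\\s*(.*?)\\Z\n '


def parse_relation(response: str, source_entity: str, target_entity: str) -> str:
    """Return the relation captured by the second group, or '' if nothing matches."""
    # Assumption: entity1 is the source entity, entity2 is the target entity.
    pattern = OUTPUT_PATTERN_TEMPLATE.format(
        entity1=re.escape(source_entity),
        entity2=re.escape(target_entity),
    )
    # Assumption: verbose mode ignores the layout whitespace in the template,
    # and DOTALL lets the reasoning part span several lines.
    match = re.search(pattern, response, re.VERBOSE | re.DOTALL)
    return match.group(2).strip() if match else ""


# Made-up model reply in the format requested by the system prompt:
# "Reasoning: the text says Xiao Wang is Xiao Ming's classmate.
#  So Xiao Wang is Xiao Ming's: classmate"
response = "分析推理:文中提到小王是小明的同班同学。\n所以小王是小明的:同班同学"
relation = parse_relation(response, source_entity="小明", target_entity="小王")
print(relation)
```

A reply that does not follow the requested format yields an empty string here, which is the kind of parsing failure that `try_num` is described as retrying.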
- __init__(api_model: str = 'gpt-4o', source_entity: str = None, target_entity: str = None, *, output_key: str = 'role_relation', api_endpoint: str | None = None, response_path: str | None = None, system_prompt_template: str | None = None, input_template: str | None = None, output_pattern_template: str | None = None, try_num: Annotated[int, Gt(gt=0)] = 3, drop_text: bool = False, model_params: Dict = {}, sampling_params: Dict = {}, **kwargs)
Initialization method.
- Parameters:
api_model -- API model name.
source_entity -- The source entity of the relation to be identified.
target_entity -- The target entity of the relation to be identified.
output_key -- The output key in the meta field of the samples. Defaults to 'role_relation'.
api_endpoint -- URL endpoint for the API.
response_path -- Path to extract content from the API response. Defaults to 'choices.0.message.content'.
system_prompt_template -- System prompt template for the task.
input_template -- Template for building the model input.
output_pattern_template -- Regular expression template for parsing model output.
try_num -- The number of retry attempts when there is an API call error or output parsing error.
drop_text -- Whether to drop the text in the output.
model_params -- Parameters for initializing the API model.
sampling_params -- Extra parameters passed to the API call, e.g. {'temperature': 0.9, 'top_p': 0.95}.
kwargs -- Extra keyword arguments.
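The `response_path` and `try_num` parameters described above can be illustrated with a small self-contained sketch. It only mirrors the behavior stated in the parameter list (a dotted path into the API response, and retries on API or parsing errors); how the library implements this internally is not shown on this page. The helpers `extract_by_path` and `call_with_retries`, the stub `fake_api`, and the toy parser are hypothetical.

```python
from typing import Any, Callable


def extract_by_path(response: Any, path: str) -> Any:
    """Walk a dotted path such as 'choices.0.message.content'."""
    current = response
    for key in path.split("."):
        # Segments index into lists by position and into dicts by key.
        current = current[int(key)] if isinstance(current, list) else current[key]
    return current


def call_with_retries(
    call_api: Callable[[], Any],
    parse: Callable[[str], str],
    try_num: int = 3,
    response_path: str = "choices.0.message.content",
) -> str:
    """Try up to try_num times; return the first non-empty parsed result."""
    for _ in range(try_num):
        try:
            content = extract_by_path(call_api(), response_path)
            result = parse(content)
        except Exception:
            # Simulates an API call error: move on to the next attempt.
            continue
        if result:
            return result
        # An empty result counts as an output parsing error: retry.
    return ""


# Stub API: fails once, then returns an unparsable reply, then a valid one.
attempts = {"count": 0}


def fake_api() -> dict:
    attempts["count"] += 1
    if attempts["count"] == 1:
        raise ConnectionError("simulated API failure")
    content = "no relation" if attempts["count"] == 2 else "relation: friend"
    return {"choices": [{"message": {"content": content}}]}


def parse(text: str) -> str:
    """Toy parser standing in for the regular-expression output pattern."""
    return text.split("relation:", 1)[1].strip() if text.startswith("relation:") else ""


relation = call_with_retries(fake_api, parse, try_num=3)
print(relation, attempts["count"])
```

With `try_num=3` the stub needs all three attempts; a smaller value would return an empty string, which shows why `try_num` must be greater than 0.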