data_juicer.ops.mapper.relation_identity_mapper module

class data_juicer.ops.mapper.relation_identity_mapper.RelationIdentityMapper(api_model: str = 'gpt-4o', source_entity: str = None, target_entity: str = None, *, output_key: str = 'role_relation', api_endpoint: str | None = None, response_path: str | None = None, system_prompt_template: str | None = None, input_template: str | None = None, output_pattern_template: str | None = None, try_num: Annotated[int, Gt(gt=0)] = 3, drop_text: bool = False, model_params: Dict = {}, sampling_params: Dict = {}, **kwargs)[source]

Bases: Mapper

Identifies the relation between two entities in the text.

DEFAULT_SYSTEM_PROMPT_TEMPLATE = '给定关于{entity1}和{entity2}的文本信息。判断{entity1}和{entity2}之间的关系。\n要求:\n- 关系用一个或多个词语表示,必要时可以加一个形容词来描述这段关系\n- 输出关系时不要参杂任何标点符号\n- 需要你进行合理的推理才能得出结论\n- 如果两个人物身份是同一个人,输出关系为:另一个身份\n- 输出格式为:\n分析推理:...\n所以{entity2}是{entity1}的:...\n- 注意输出的是{entity2}是{entity1}的什么关系,而不是{entity1}是{entity2}的什么关系'
DEFAULT_INPUT_TEMPLATE = '关于{entity1}和{entity2}的文本信息:\n```\n{text}\n```\n'
DEFAULT_OUTPUT_PATTERN_TEMPLATE = '\n        \\s*分析推理:\\s*(.*?)\\s*\n        \\s*所以{entity2}是{entity1}的:\\s*(.*?)\\Z\n    '

English gloss of the default templates (the literal values above are what the operator uses):
  • System prompt – Given text about {entity1} and {entity2}, determine the relation between them. Express the relation in one or more words, adding an adjective if needed; include no punctuation in the relation; reason sensibly to reach the conclusion; if the two characters are the same person, output 另一个身份 ("another identity"). The reply takes the form "分析推理:..." ("Analysis and reasoning: ...") followed by "所以{entity2}是{entity1}的:..." ("So {entity2} is {entity1}'s: ..."), stating what {entity2} is to {entity1}, not the reverse.
  • Input template – "Text about {entity1} and {entity2}:" followed by {text} inside a fenced block.
  • Output pattern – Captures the reasoning after "分析推理:" and the relation after "所以{entity2}是{entity1}的:".

__init__(api_model: str = 'gpt-4o', source_entity: str = None, target_entity: str = None, *, output_key: str = 'role_relation', api_endpoint: str | None = None, response_path: str | None = None, system_prompt_template: str | None = None, input_template: str | None = None, output_pattern_template: str | None = None, try_num: Annotated[int, Gt(gt=0)] = 3, drop_text: bool = False, model_params: Dict = {}, sampling_params: Dict = {}, **kwargs)[source]

Initialization method.

Parameters:
  • api_model – API model name.

  • source_entity – The source entity of the relation to be identified.

  • target_entity – The target entity of the relation to be identified.

  • output_key – The output key in the meta field of the samples. Defaults to ‘role_relation’.

  • api_endpoint – URL endpoint for the API.

  • response_path – Path to extract content from the API response. Defaults to ‘choices.0.message.content’.

  • system_prompt_template – System prompt template for the task.

  • input_template – Template for building the model input.

  • output_pattern_template – Regular expression template for parsing model output.

  • try_num – The number of retry attempts when there is an API call error or output parsing error.

  • drop_text – Whether to drop the text from the output.

  • model_params – Parameters for initializing the API model.

  • sampling_params – Extra parameters passed to the API call, e.g. {‘temperature’: 0.9, ‘top_p’: 0.95}.

  • kwargs – Extra keyword arguments.
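
The parameters above can be collected into keyword arguments for the constructor. The following is a minimal sketch, not a verified recipe: the entity names are illustrative placeholders, `build_mapper` is a hypothetical helper, and constructing the operator may additionally require API credentials in the environment, which this page does not describe.

```python
# Keyword arguments mirroring the documented __init__ signature.
# The entity names are placeholders chosen only for illustration.
mapper_kwargs = {
    "api_model": "gpt-4o",
    "source_entity": "小明",
    "target_entity": "小红",
    "output_key": "role_relation",
    "try_num": 3,
    "drop_text": False,
    "sampling_params": {"temperature": 0.9, "top_p": 0.95},
}


def build_mapper(kwargs):
    """Hypothetical helper: construct the mapper from keyword arguments."""
    # Imported lazily so the dictionary above can be inspected
    # even where data_juicer is not installed.
    from data_juicer.ops.mapper.relation_identity_mapper import (
        RelationIdentityMapper,
    )

    # The signature annotates try_num as Gt(gt=0); check it early here.
    if kwargs.get("try_num", 3) <= 0:
        raise ValueError("try_num must be greater than 0")
    return RelationIdentityMapper(**kwargs)
```

Parameters left out of the dictionary (for example `api_endpoint` or `system_prompt_template`) keep the defaults shown in the signature.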

parse_output(raw_output)[source]
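
This page gives no description for parse_output. As a rough illustration of how the documented DEFAULT_OUTPUT_PATTERN_TEMPLATE could be applied to a model reply, here is a minimal sketch. It assumes the entity names are substituted with `str.format` and that the pattern is compiled with `re.VERBOSE | re.DOTALL` (the template spans several indented lines); the function name `parse_relation` and the use of `re.escape` are additions made for this example, not the library's implementation.

```python
import re

# Same text as the documented DEFAULT_OUTPUT_PATTERN_TEMPLATE.
OUTPUT_PATTERN_TEMPLATE = r"""
        \s*分析推理:\s*(.*?)\s*
        \s*所以{entity2}是{entity1}的:\s*(.*?)\Z
    """


def parse_relation(raw_output, source_entity, target_entity):
    """Return the relation captured from a model reply, or '' if none."""
    # Assumption: entity1 is the source entity and entity2 the target.
    # re.escape guards against regex metacharacters in entity names.
    pattern_text = OUTPUT_PATTERN_TEMPLATE.format(
        entity1=re.escape(source_entity),
        entity2=re.escape(target_entity),
    )
    # VERBOSE ignores the layout whitespace in the template;
    # DOTALL lets the reasoning span several lines.
    pattern = re.compile(pattern_text, re.VERBOSE | re.DOTALL)
    match = pattern.search(raw_output)
    # Group 1 is the reasoning, group 2 is the relation.
    return match.group(2).strip() if match else ""


# A reply in the format requested by the default system prompt.
reply = "分析推理:小红称小明为哥哥\n所以小红是小明的:妹妹"
relation = parse_relation(reply, "小明", "小红")
```

A reply that does not follow the requested format yields an empty string in this sketch, which a caller could treat as a parsing failure and retry.
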
process_single(sample, rank=None)[source]

Processes a single sample (sample -> sample).

Parameters:

sample – sample to process

Returns:

processed sample
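
How try_num, output_key and drop_text might interact during processing can be sketched as follows. This is a hedged illustration only: `process_single_sketch`, `call_model`, the `"text"` key and the `"meta"` key are hypothetical stand-ins, since this page names neither the sample fields nor the internal API client.

```python
from typing import Callable, Dict


def process_single_sketch(
    sample: Dict,
    call_model: Callable[[str], str],    # hypothetical stand-in for the API call
    parse_output: Callable[[str], str],  # hypothetical stand-in for parse_output
    *,
    output_key: str = "role_relation",
    try_num: int = 3,
    drop_text: bool = False,
    text_key: str = "text",  # assumed field name
    meta_key: str = "meta",  # assumed field name
) -> Dict:
    relation = ""
    # Retry on API call errors and on replies that fail to parse.
    for _ in range(try_num):
        try:
            relation = parse_output(call_model(sample[text_key]))
        except Exception:
            continue
        if relation:
            break
    # Store the result under the meta field, keyed by output_key.
    sample.setdefault(meta_key, {})[output_key] = relation
    if drop_text:
        sample.pop(text_key, None)
    return sample


# Stub model that fails once, then answers; used to exercise the retry loop.
calls = {"n": 0}


def flaky_model(text: str) -> str:
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("simulated API error")
    return " friend "


result = process_single_sketch(
    {"text": "some passage"}, flaky_model, str.strip, drop_text=True
)
```

In this sketch the relation is left as an empty string when every attempt fails, so downstream steps can still rely on the key being present.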