data_juicer.ops.mapper.extract_nickname_mapper module

class data_juicer.ops.mapper.extract_nickname_mapper.ExtractNicknameMapper(api_model: str = 'gpt-4o', *, nickname_key: str = 'nickname', api_endpoint: str | None = None, response_path: str | None = None, system_prompt: str | None = None, input_template: str | None = None, output_pattern: str | None = None, try_num: Annotated[int, Gt(gt=0)] = 3, drop_text: bool = False, model_params: Dict = {}, sampling_params: Dict = {}, **kwargs)[源代码]

基类:Mapper

Extract nickname relationship in the text.

DEFAULT_SYSTEM_PROMPT = '给定你一段文本,你的任务是将人物之间的称呼方式(昵称)提取出来。\n要求:\n- 需要给出说话人对被称呼人的称呼,不要搞反了。\n- 相同的说话人和被称呼人最多给出一个最常用的称呼。\n- 请不要输出互相没有昵称的称呼方式。\n- 输出格式如下:\n```\n### 称呼方式1\n- **说话人**:...\n- **被称呼人**:...\n- **...对...的昵称**:...\n### 称呼方式2\n- **说话人**:...\n- **被称呼人**:...\n- **...对...的昵称**:...\n### 称呼方式3\n- **说话人**:...\n- **被称呼人**:...\n- **...对...的昵称**:...\n...\n```\n'
DEFAULT_INPUT_TEMPLATE = '# 文本\n```\n{text}\n```\n'
DEFAULT_OUTPUT_PATTERN = '\n        \\#\\#\\#\\s*称呼方式(\\d+)\\s*\n        -\\s*\\*\\*说话人\\*\\*\\s*:\\s*(.*?)\\s*\n        -\\s*\\*\\*被称呼人\\*\\*\\s*:\\s*(.*?)\\s*\n        -\\s*\\*\\*(.*?)对(.*?)的昵称\\*\\*\\s*:\\s*(.*?)(?=\\#\\#\\#|\\Z) # for double check\n    '
__init__(api_model: str = 'gpt-4o', *, nickname_key: str = 'nickname', api_endpoint: str | None = None, response_path: str | None = None, system_prompt: str | None = None, input_template: str | None = None, output_pattern: str | None = None, try_num: Annotated[int, Gt(gt=0)] = 3, drop_text: bool = False, model_params: Dict = {}, sampling_params: Dict = {}, **kwargs)[源代码]

Initialization method. :param api_model: API model name. :param nickname_key: The key name to store the nickname

relationship in the meta field. It's "nickname" in default.

参数:
  • api_endpoint -- URL endpoint for the API.

  • response_path -- Path to extract content from the API response. Defaults to 'choices.0.message.content'.

  • system_prompt -- System prompt for the task.

  • input_template -- Template for building the model input.

  • output_pattern -- Regular expression for parsing model output.

  • try_num -- The number of retry attempts when there is an API call error or output parsing error.

  • drop_text -- If drop the text in the output.

  • model_params -- Parameters for initializing the API model.

  • sampling_params -- Extra parameters passed to the API call. e.g {'temperature': 0.9, 'top_p': 0.95}

  • kwargs -- Extra keyword arguments.

parse_output(raw_output)[源代码]
process_single(sample, rank=None)[源代码]

For sample level, sample --> sample

参数:

sample -- sample to process

返回:

processed sample