data_juicer.ops.mapper.dialog_intent_detection_mapper module

class data_juicer.ops.mapper.dialog_intent_detection_mapper.DialogIntentDetectionMapper(api_model: str = 'gpt-4o', intent_candidates: List[str] | None = None, max_round: Annotated[int, Ge(ge=0)] = 10, *, labels_key: str = 'dialog_intent_labels', analysis_key: str = 'dialog_intent_labels_analysis', api_endpoint: str | None = None, response_path: str | None = None, system_prompt: str | None = None, query_template: str | None = None, response_template: str | None = None, candidate_template: str | None = None, analysis_template: str | None = None, labels_template: str | None = None, analysis_pattern: str | None = None, labels_pattern: str | None = None, try_num: Annotated[int, Gt(gt=0)] = 3, model_params: Dict = {}, sampling_params: Dict = {}, **kwargs)

Bases: Mapper

Generates the user's intent labels for a dialog by analyzing the history, query, and response.

This operator processes a dialog to identify and label the user's intent. It builds input prompts from a system prompt and a set of templates, then sends them to an API model (default 'gpt-4o'), which returns an intent analysis and intent labels. The results are stored in the meta field under the keys given by labels_key and analysis_key (by default 'dialog_intent_labels' and 'dialog_intent_labels_analysis'). The system prompt, the templates, and the patterns used to parse the API response can all be customized. If intent candidates are provided, they are included in the input prompt. If an API call or output parsing fails, the operator retries, up to try_num attempts in total.
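The prompt assembly described above can be pictured with a small standalone sketch. It is not the operator's actual build_input implementation: the helper name build_prompt_sketch, the shape of the rounds argument, and the newline added after the candidate list are all assumptions made for illustration. Only the template strings are taken from the DEFAULT_* constants documented for this class.

```python
# Template values copied from the DEFAULT_* constants documented for this class.
QUERY_TEMPLATE = '用户:{query}\n'
RESPONSE_TEMPLATE = 'LLM:{response}\n'
CANDIDATES_TEMPLATE = '备选意图类别:[{candidate_str}]'
ANALYSIS_TEMPLATE = '意图分析:{analysis}\n'
LABELS_TEMPLATE = '意图类别:{labels}\n'


def build_prompt_sketch(rounds, query, intent_candidates=None, max_round=10):
    """Hypothetical helper: compose one prompt from earlier rounds and a new query.

    `rounds` is a list of dicts with the keys 'query', 'analysis', 'labels'
    and 'response' (an assumed structure, chosen for this example only).
    """
    parts = []
    if intent_candidates:
        candidate_str = ','.join(intent_candidates)
        # The trailing newline is this sketch's choice, not a documented behavior.
        parts.append(CANDIDATES_TEMPLATE.format(candidate_str=candidate_str) + '\n')
    # Keep only the most recent `max_round` rounds; max_round == 0 drops all history.
    recent = rounds[-max_round:] if max_round > 0 else []
    for r in recent:
        parts.append(QUERY_TEMPLATE.format(query=r['query']))
        parts.append(ANALYSIS_TEMPLATE.format(analysis=r['analysis']))
        parts.append(LABELS_TEMPLATE.format(labels=r['labels']))
        parts.append(RESPONSE_TEMPLATE.format(response=r['response']))
    # The new query comes last, so the model continues with its analysis and labels.
    parts.append(QUERY_TEMPLATE.format(query=query))
    return ''.join(parts)


history = [{
    'query': 'What is machine learning?',
    'analysis': 'The user asks for basic information about machine learning.',
    'labels': 'information seeking',
    'response': 'Machine learning lets computers learn from data.',
}]
prompt = build_prompt_sketch(
    history,
    'Any recommended introductory books?',
    intent_candidates=['information seeking', 'request for advice', 'other'],
)
print(prompt)
```

Ending the prompt with the new query mirrors the format of the example dialog in DEFAULT_SYSTEM_PROMPT, where each user turn is followed by an analysis line and a label line.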

DEFAULT_SYSTEM_PROMPT = '请判断用户和LLM多轮对话中用户的意图。\n要求:\n- 需要先进行分析,然后列出用户所具有的意图,下面是一个样例,请模仿样例格式输出。\n用户:你好,我最近对人工智能很感兴趣,能给我讲讲什么是机器学习吗?\n意图分析:用户在请求信息,希望了解有关机器学习的基础知识。\n意图类别:信息查找\nLLM:你好!当然可以。机器学习是一种人工智能方法,允许计算机通过数据自动改进和学习。\n用户:听起来很有趣,有没有推荐的入门书籍或资料?\n意图分析:用户在请求建议,希望获取关于机器学习的入门资源。\n意图类别:请求建议\nLLM:有很多不错的入门书籍和资源。一本常被推荐的书是《Python机器学习实践》(Python Machine Learning),它涵盖了基础知识和一些实际案例。此外,您还可以参考Coursera或edX上的在线课程,这些课程提供了系统的学习路径。\n用户:谢谢你的建议!我还想知道,学习机器学习需要什么样的数学基础?\n意图分析:用户在寻求信息,希望了解学习机器学习所需的前提条件,特别是在数学方面。\n意图类别:信息查找\nLLM:学习机器学习通常需要一定的数学基础,特别是线性代数、概率论和统计学。这些数学领域帮助理解算法的工作原理和数据模式分析。如果您对这些主题不太熟悉,建议先从相关基础书籍或在线资源开始学习。\n用户:明白了,我会先补习这些基础知识。再次感谢你的帮助!\n意图分析:用户表达感谢,并表示计划付诸行动来补充所需的基础知识。\n意图类别:其他'
DEFAULT_QUERY_TEMPLATE = '用户:{query}\n'
DEFAULT_RESPONSE_TEMPLATE = 'LLM:{response}\n'
DEFAULT_CANDIDATES_TEMPLATE = '备选意图类别:[{candidate_str}]'
DEFAULT_ANALYSIS_TEMPLATE = '意图分析:{analysis}\n'
DEFAULT_LABELS_TEMPLATE = '意图类别:{labels}\n'
DEFAULT_ANALYSIS_PATTERN = '意图分析:(.*?)\n'
DEFAULT_LABELS_PATTERN = '意图类别:(.*?)($|\n)'
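To show what the two default patterns capture, here is a minimal sketch that applies them with Python's re.search. The function parse_output_sketch is a hypothetical stand-in, not the operator's parse_output method, and returning empty strings on a failed match is an assumption of this example.

```python
import re

# Pattern values copied from the DEFAULT_*_PATTERN constants above.
ANALYSIS_PATTERN = '意图分析:(.*?)\n'
LABELS_PATTERN = '意图类别:(.*?)($|\n)'


def parse_output_sketch(response):
    """Hypothetical parser: return (analysis, labels), with '' for a missing part."""
    analysis_match = re.search(ANALYSIS_PATTERN, response)
    labels_match = re.search(LABELS_PATTERN, response)
    # Group 1 holds the text after the marker, up to the end of that line.
    analysis = analysis_match.group(1).strip() if analysis_match else ''
    labels = labels_match.group(1).strip() if labels_match else ''
    return analysis, labels


# Build a model-style response from the documented templates so that it
# matches the patterns exactly.
response = (
    '意图分析:{analysis}\n'.format(analysis='The user asks for introductory resources.')
    + '意图类别:{labels}\n'.format(labels='request for advice')
)
analysis, labels = parse_output_sketch(response)
print(analysis)
print(labels)
```

The lazy quantifier `(.*?)` stops at the first newline, so each pattern captures a single line. The `($|\n)` alternative in the labels pattern lets the label line match even when the response ends without a trailing newline.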
__init__(api_model: str = 'gpt-4o', intent_candidates: List[str] | None = None, max_round: Annotated[int, Ge(ge=0)] = 10, *, labels_key: str = 'dialog_intent_labels', analysis_key: str = 'dialog_intent_labels_analysis', api_endpoint: str | None = None, response_path: str | None = None, system_prompt: str | None = None, query_template: str | None = None, response_template: str | None = None, candidate_template: str | None = None, analysis_template: str | None = None, labels_template: str | None = None, analysis_pattern: str | None = None, labels_pattern: str | None = None, try_num: Annotated[int, Gt(gt=0)] = 3, model_params: Dict = {}, sampling_params: Dict = {}, **kwargs)

Initialization method.

Parameters:
  • api_model -- API model name.

  • intent_candidates -- The candidate intent labels for the output. If None, open-domain intent labels are used.

  • max_round -- The maximum number of dialog rounds used to build the prompt.

  • labels_key -- The key name in the meta field under which the output labels are stored. Defaults to 'dialog_intent_labels'.

  • analysis_key -- The key name in the meta field under which the corresponding analysis is stored. Defaults to 'dialog_intent_labels_analysis'.

  • api_endpoint -- URL endpoint for the API.

  • response_path -- Path to extract content from the API response. Defaults to 'choices.0.message.content'.

  • system_prompt -- System prompt for the task.

  • query_template -- Template for the query part of the input prompt.

  • response_template -- Template for the response part of the input prompt.

  • candidate_template -- Template for the intent candidates in the input prompt.

  • analysis_template -- Template for the analysis part of the input prompt.

  • labels_template -- Template for the labels part of the input prompt.

  • analysis_pattern -- Pattern used to parse the returned intent analysis.

  • labels_pattern -- Pattern used to parse the returned intent labels.

  • try_num -- The number of retry attempts when there is an API call error or output parsing error.

  • model_params -- Parameters for initializing the API model.

  • sampling_params -- Extra parameters passed to the API call, e.g., {'temperature': 0.9, 'top_p': 0.95}.

  • kwargs -- Extra keyword arguments.
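As a rough illustration of how these parameters fit together, the sketch below collects a set of keyword arguments and defers construction to a helper. The parameter names and the import path come from the signature above; the values, the mapper_kwargs name, and the make_mapper helper are illustrative assumptions. Actually building the operator requires data_juicer to be installed and API access to be configured, which this page does not describe.

```python
# Illustrative values only; every key is a parameter from the documented signature.
mapper_kwargs = {
    'api_model': 'gpt-4o',
    'intent_candidates': ['information seeking', 'request for advice', 'other'],
    'max_round': 5,    # must be >= 0 according to the signature
    'labels_key': 'dialog_intent_labels',
    'analysis_key': 'dialog_intent_labels_analysis',
    'try_num': 3,      # must be > 0 according to the signature
    'sampling_params': {'temperature': 0.9, 'top_p': 0.95},
}


def make_mapper(kwargs=mapper_kwargs):
    """Hypothetical helper: build the operator from the kwargs above."""
    # Imported lazily so the sketch can be read and checked without data_juicer.
    from data_juicer.ops.mapper.dialog_intent_detection_mapper import (
        DialogIntentDetectionMapper,
    )
    return DialogIntentDetectionMapper(**kwargs)
```

Parameters left out of the dictionary, such as system_prompt and the templates, default to None in the signature.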

build_input(history, query)
parse_output(response)
process_single(sample, rank=None)

Sample-level processing: takes one sample and returns the processed sample (sample --> sample).

Parameters:

sample -- The sample to process.

Returns:

The processed sample.
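The retry behavior controlled by try_num can be sketched as a plain loop. This is an assumption about the general shape of such a loop, not the code inside process_single: call_with_retries, flaky_model, and split_parse are hypothetical names, and the model is simulated so the example runs without any API.

```python
def call_with_retries(call_model, parse, prompt, try_num=3):
    """Hypothetical retry loop: return (analysis, labels), or ('', '') if all attempts fail."""
    for _ in range(try_num):
        try:
            analysis, labels = parse(call_model(prompt))
            if labels:
                return analysis, labels
        except Exception:
            # Treat any API or parsing failure as a failed attempt and try again.
            continue
    return '', ''


attempts = []


def flaky_model(prompt):
    """Simulated model: fails on the first call, succeeds on the second."""
    attempts.append(prompt)
    if len(attempts) < 2:
        raise RuntimeError('simulated API error')
    return 'analysis text|some label'


def split_parse(text):
    """Simulated parser for the fake 'analysis|labels' format used above."""
    analysis, labels = text.split('|')
    return analysis, labels


result = call_with_retries(flaky_model, split_parse, 'prompt', try_num=3)
print(result, len(attempts))
```

Returning empty strings after the last failed attempt keeps the sample flowing through the pipeline; whether the real operator does the same is not stated on this page.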