data_juicer.ops.mapper.dialog_intent_detection_mapper module

class data_juicer.ops.mapper.dialog_intent_detection_mapper.DialogIntentDetectionMapper(api_model: str = 'gpt-4o', intent_candidates: List[str] | None = None, max_round: Annotated[int, Ge(ge=0)] = 10, *, labels_key: str = 'dialog_intent_labels', analysis_key: str = 'dialog_intent_labels_analysis', api_endpoint: str | None = None, response_path: str | None = None, system_prompt: str | None = None, query_template: str | None = None, response_template: str | None = None, candidate_template: str | None = None, analysis_template: str | None = None, labels_template: str | None = None, analysis_pattern: str | None = None, labels_pattern: str | None = None, try_num: Annotated[int, Gt(gt=0)] = 3, model_params: Dict = {}, sampling_params: Dict = {}, **kwargs)

Bases: Mapper

Mapper that generates intent labels for the user's queries in a dialog. It reads its input from history_key, query_key and response_key, and outputs lists of labels and analyses for the queries in the dialog.

DEFAULT_SYSTEM_PROMPT = '请判断用户和LLM多轮对话中用户的意图。\n要求:\n- 需要先进行分析,然后列出用户所具有的意图,下面是一个样例,请模仿样例格式输出。\n用户:你好,我最近对人工智能很感兴趣,能给我讲讲什么是机器学习吗?\n意图分析:用户在请求信息,希望了解有关机器学习的基础知识。\n意图类别:信息查找\nLLM:你好!当然可以。机器学习是一种人工智能方法,允许计算机通过数据自动改进和学习。\n用户:听起来很有趣,有没有推荐的入门书籍或资料?\n意图分析:用户在请求建议,希望获取关于机器学习的入门资源。\n意图类别:请求建议\nLLM:有很多不错的入门书籍和资源。一本常被推荐的书是《Python机器学习实践》(Python Machine Learning),它涵盖了基础知识和一些实际案例。此外,您还可以参考Coursera或edX上的在线课程,这些课程提供了系统的学习路径。\n用户:谢谢你的建议!我还想知道,学习机器学习需要什么样的数学基础?\n意图分析:用户在寻求信息,希望了解学习机器学习所需的前提条件,特别是在数学方面。\n意图类别:信息查找\nLLM:学习机器学习通常需要一定的数学基础,特别是线性代数、概率论和统计学。这些数学领域帮助理解算法的工作原理和数据模式分析。如果您对这些主题不太熟悉,建议先从相关基础书籍或在线资源开始学习。\n用户:明白了,我会先补习这些基础知识。再次感谢你的帮助!\n意图分析:用户表达感谢,并表示计划付诸行动来补充所需的基础知识。\n意图类别:其他'
DEFAULT_QUERY_TEMPLATE = '用户:{query}\n'
DEFAULT_RESPONSE_TEMPLATE = 'LLM:{response}\n'
DEFAULT_CANDIDATES_TEMPLATE = '备选意图类别:[{candidate_str}]'
DEFAULT_ANALYSIS_TEMPLATE = '意图分析:{analysis}\n'
DEFAULT_LABELS_TEMPLATE = '意图类别:{labels}\n'
DEFAULT_ANALYSIS_PATTERN = '意图分析:(.*?)\n'
DEFAULT_LABELS_PATTERN = '意图类别:(.*?)($|\n)'
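
The analysis and labels templates and their matching patterns are designed to be inverses of each other. The following is a minimal sketch of that round trip, using the constant values exactly as listed above. Using re.search and taking the first match is an assumption made for illustration; the mapper's own parsing may differ.

```python
import re

# Values copied from the class constants listed above.
ANALYSIS_TEMPLATE = '意图分析:{analysis}\n'
LABELS_TEMPLATE = '意图类别:{labels}\n'
ANALYSIS_PATTERN = '意图分析:(.*?)\n'
LABELS_PATTERN = '意图类别:(.*?)($|\n)'

# Render one annotated turn in the same layout as the few-shot example
# embedded in DEFAULT_SYSTEM_PROMPT.
rendered = (
    ANALYSIS_TEMPLATE.format(analysis='用户在请求信息。')
    + LABELS_TEMPLATE.format(labels='信息查找')
)

# Assumption: only the first match of each pattern is kept.
analysis_match = re.search(ANALYSIS_PATTERN, rendered)
labels_match = re.search(LABELS_PATTERN, rendered)
analysis = analysis_match.group(1) if analysis_match else ''
labels = labels_match.group(1) if labels_match else ''
```

A custom analysis_pattern or labels_pattern should keep the text of interest in the first capture group if it is meant to stay compatible with this kind of parsing.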
__init__(api_model: str = 'gpt-4o', intent_candidates: List[str] | None = None, max_round: Annotated[int, Ge(ge=0)] = 10, *, labels_key: str = 'dialog_intent_labels', analysis_key: str = 'dialog_intent_labels_analysis', api_endpoint: str | None = None, response_path: str | None = None, system_prompt: str | None = None, query_template: str | None = None, response_template: str | None = None, candidate_template: str | None = None, analysis_template: str | None = None, labels_template: str | None = None, analysis_pattern: str | None = None, labels_pattern: str | None = None, try_num: Annotated[int, Gt(gt=0)] = 3, model_params: Dict = {}, sampling_params: Dict = {}, **kwargs)

Initialization method.

Parameters:
  • api_model -- API model name.

  • intent_candidates -- Candidate intent labels for the output. If None, open-domain intent labels are used.

  • max_round -- The maximum number of dialog rounds used to build the prompt.

  • labels_key -- The key in the meta field under which the output labels are stored. Defaults to 'dialog_intent_labels'.

  • analysis_key -- The key in the meta field under which the corresponding analysis is stored. Defaults to 'dialog_intent_labels_analysis'.

  • api_endpoint -- URL endpoint for the API.

  • response_path -- Path to extract content from the API response. Defaults to 'choices.0.message.content'.

  • system_prompt -- System prompt for the task.

  • query_template -- Template for query part to build the input prompt.

  • response_template -- Template for response part to build the input prompt.

  • candidate_template -- Template for intent candidates to build the input prompt.

  • analysis_template -- Template for analysis part to build the input prompt.

  • labels_template -- Template for labels to build the input prompt.

  • analysis_pattern -- Regular expression used to parse the intent analysis from the model output.

  • labels_pattern -- Regular expression used to parse the intent labels from the model output.

  • try_num -- The number of retry attempts when there is an API call error or output parsing error.

  • model_params -- Parameters for initializing the API model.

  • sampling_params -- Extra parameters passed to the API call, e.g. {'temperature': 0.9, 'top_p': 0.95}.

  • kwargs -- Extra keyword arguments.
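
As an illustration of the parameters above, the following sketch collects a few of them into a keyword-argument dict. All values are illustrative; the candidate labels are taken from the example in DEFAULT_SYSTEM_PROMPT. The helper make_mapper is hypothetical and only defers the import, so the snippet can be read and run without data_juicer installed. Actually constructing the mapper presumably requires the package and valid API credentials, which is an assumption not stated on this page.

```python
# Keyword arguments named after the documented signature; values are examples.
mapper_kwargs = {
    'api_model': 'gpt-4o',
    'intent_candidates': ['信息查找', '请求建议', '其他'],
    'max_round': 5,   # must be >= 0 according to the signature
    'try_num': 3,     # must be > 0 according to the signature
    'sampling_params': {'temperature': 0.9, 'top_p': 0.95},
}


def make_mapper(**kwargs):
    """Hypothetical helper: import lazily, then build the mapper."""
    from data_juicer.ops.mapper.dialog_intent_detection_mapper import (
        DialogIntentDetectionMapper,
    )
    return DialogIntentDetectionMapper(**kwargs)


# Uncomment once data_juicer is installed and the API is reachable:
# mapper = make_mapper(**mapper_kwargs)
```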

build_input(history, query)
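
This page does not describe how build_input assembles the prompt. The sketch below shows one plausible way the documented templates and max_round could combine. It is not the library's implementation: the function build_input_sketch, the shape of history (a list of (query, response) pairs, oldest first), and the newline after the candidate line are all assumptions made for illustration.

```python
# Values copied from the class constants listed on this page.
QUERY_TEMPLATE = '用户:{query}\n'
RESPONSE_TEMPLATE = 'LLM:{response}\n'
CANDIDATES_TEMPLATE = '备选意图类别:[{candidate_str}]'


def build_input_sketch(history, query, intent_candidates=None, max_round=10):
    """Hypothetical stand-in for build_input, not the library's code.

    Assumes `history` is a list of (query, response) pairs, oldest first.
    """
    parts = []
    if intent_candidates:
        candidate_str = ','.join(intent_candidates)
        parts.append(CANDIDATES_TEMPLATE.format(candidate_str=candidate_str) + '\n')
    # Keep only the most recent `max_round` rounds; 0 drops all history.
    recent = history[-max_round:] if max_round > 0 else []
    for past_query, past_response in recent:
        parts.append(QUERY_TEMPLATE.format(query=past_query))
        parts.append(RESPONSE_TEMPLATE.format(response=past_response))
    # The current query always closes the prompt.
    parts.append(QUERY_TEMPLATE.format(query=query))
    return ''.join(parts)


prompt = build_input_sketch(
    history=[('q1', 'r1'), ('q2', 'r2')],
    query='q3',
    intent_candidates=['信息查找', '其他'],
    max_round=1,
)
```

With max_round=1, only the last round ('q2', 'r2') is kept ahead of the current query.
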
parse_output(response)
process_single(sample, rank=None)

Processes a single sample and returns the processed sample (sample --> sample).

Parameters:

sample -- sample to process

Returns:

processed sample