data_juicer.ops.mapper.dialog_sentiment_intensity_mapper module

class data_juicer.ops.mapper.dialog_sentiment_intensity_mapper.DialogSentimentIntensityMapper(api_model: str = 'gpt-4o', max_round: Annotated[int, Ge(ge=0)] = 10, *, intensities_key: str = 'dialog_sentiment_intensity', analysis_key: str = 'dialog_sentiment_intensity_analysis', api_endpoint: str | None = None, response_path: str | None = None, system_prompt: str | None = None, query_template: str | None = None, response_template: str | None = None, analysis_template: str | None = None, intensity_template: str | None = None, analysis_pattern: str | None = None, intensity_pattern: str | None = None, try_num: Annotated[int, Gt(gt=0)] = 3, model_params: Dict = {}, sampling_params: Dict = {}, **kwargs)[source]

Bases: Mapper

Mapper to predict the user's sentiment intensity in a dialog, ranging from -5 to 5.

This operator analyzes the sentiment of the user's queries in a dialog and outputs a list of sentiment intensities, each with a corresponding analysis. The intensity is an integer from -5 (extremely negative) to 5 (extremely positive), with 0 indicating neutral sentiment. The analysis is based on the sample's history, query, and response keys, and is guided by the default system prompt and templates unless they are overridden. Results are stored in the meta field under 'dialog_sentiment_intensity' for intensities and 'dialog_sentiment_intensity_analysis' for analyses. The operator calls an API model to generate the analysis, with configurable retry attempts and sampling parameters.

DEFAULT_SYSTEM_PROMPT = '请判断用户和LLM多轮对话中用户的情绪变化。\n要求:\n- 用户情绪值是-5到5之间到整数,-5表示极度负面,5表示极度正面,-5到5之间数值表示情绪从负面逐渐到正面的变化过程,0代表情呈绪中性。\n- 只输出当轮对话的分析,不要继续构造对话。\n- 需要先进行分析,然后确定用户的情绪值,下面是一个样例,请模仿样例格式输出。\n用户:你好,我对可持续发展的定义有点模糊,帮我解释一下?\n情绪分析:刚开始,还没得到LLM回复,用户情绪呈中性。\n情绪值:0\nLLM:当然可以!可持续发展是指在满足当代人的需求的同时,不损害子孙后代满足其自身需求的能力的发展模式。它包括经济发展、社会发展和环境保护三个主要方面。通过合理利用资源和保护环境,我们可以确保未来的世代也能享有健全的生态系统和经济制度。\n用户:谢谢你的解释!那你能告诉我一些普通人可以采取的可持续生活方式吗?\n情绪分析:对回答感到满意,情绪正面。\n情绪值:1\nLLM:当然可以,普通人可以通过减少一次性产品的使用、选择公共交通或拼车、节约用水、以及支持本地和可持续发展的企业等方式来践行可持续生活。此外,关注垃圾分类和多用电子账单也是不错的选择。\n用户:你提到支持本地企业,这一点我很感兴趣。能详细说说为什么这对可持续发展有促进作用吗?\n情绪分析:觉得回答实用且具体,情绪进一步转好。\n情绪值:2\nLLM:呃,我最近发现了一部新电影,讲述了一个关于外星人和地球土著合作保护环境的故事。虽然它是科幻片,但很有启发性,推荐你去看看。\n用户:什么吗,根本是答非所问。\n情绪分析:LLM没有回应问题而是提到无关内容,导致用户情绪直线下降。\n情绪值:-2\nLLM:抱歉刚才的偏题!支持本地企业有助于减少长途运输产生的碳足迹,使供应链更加环保。此外,本地企业也更有可能采用可持续的生产方式,同时促进社区经济的繁荣。\n用户:还行吧,算你能够掰回来。\n情绪分析:问题得到解答,问题偏题得到纠正,情绪稍有好转。\n情绪值:-1\n'
DEFAULT_QUERY_TEMPLATE = '用户:{query}\n'
DEFAULT_RESPONSE_TEMPLATE = 'LLM:{response}\n'
DEFAULT_ANALYSIS_TEMPLATE = '情绪分析:{analysis}\n'
DEFAULT_INTENSITY_TEMPLATE = '情绪值:{intensity}\n'
DEFAULT_ANALYSIS_PATTERN = '情绪分析:(.*?)\n'
DEFAULT_INTENSITY_PATTERN = '情绪值:(.*?)($|\n)'
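
As a rough illustration of how these defaults fit together, the sketch below formats one dialog round with the templates and then extracts the analysis and intensity with the two patterns. The constant values are copied from the listing above; the helper name `parse_round` and the sample text are made up for this example, and the operator's own `build_input` and `parse_output` may differ in detail.

```python
import re

# Values copied from the class constants listed above.
QUERY_TEMPLATE = '用户:{query}\n'
ANALYSIS_TEMPLATE = '情绪分析:{analysis}\n'
INTENSITY_TEMPLATE = '情绪值:{intensity}\n'
ANALYSIS_PATTERN = '情绪分析:(.*?)\n'
INTENSITY_PATTERN = '情绪值:(.*?)($|\n)'


def parse_round(text):
    """Hypothetical helper: pull (analysis, intensity) out of model output."""
    analysis_match = re.search(ANALYSIS_PATTERN, text)
    intensity_match = re.search(INTENSITY_PATTERN, text)
    if analysis_match is None or intensity_match is None:
        raise ValueError('output does not follow the expected format')
    # The intensity is captured as text, so convert it to an integer.
    return analysis_match.group(1), int(intensity_match.group(1))


# Lay out one round the way the few-shot example in the system prompt does.
round_text = (
    QUERY_TEMPLATE.format(query='什么吗,根本是答非所问。')
    + ANALYSIS_TEMPLATE.format(analysis='用户情绪直线下降。')
    + INTENSITY_TEMPLATE.format(intensity=-2)
)

analysis, intensity = parse_round(round_text)
print(analysis, intensity)
```

Raising on a failed match is a choice made for this sketch; it lets a caller treat malformed output as a parsing error and retry.
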
__init__(api_model: str = 'gpt-4o', max_round: Annotated[int, Ge(ge=0)] = 10, *, intensities_key: str = 'dialog_sentiment_intensity', analysis_key: str = 'dialog_sentiment_intensity_analysis', api_endpoint: str | None = None, response_path: str | None = None, system_prompt: str | None = None, query_template: str | None = None, response_template: str | None = None, analysis_template: str | None = None, intensity_template: str | None = None, analysis_pattern: str | None = None, intensity_pattern: str | None = None, try_num: Annotated[int, Gt(gt=0)] = 3, model_params: Dict = {}, sampling_params: Dict = {}, **kwargs)[source]

Initialization method.

Parameters:
  • api_model -- API model name.

  • max_round -- The maximum number of dialog rounds used to build the prompt.

  • intensities_key -- The key name in the meta field under which the output sentiment intensities are stored. Defaults to 'dialog_sentiment_intensity'.

  • analysis_key -- The key name in the meta field under which the corresponding analyses are stored. Defaults to 'dialog_sentiment_intensity_analysis'.

  • api_endpoint -- URL endpoint for the API.

  • response_path -- Path to extract content from the API response. Defaults to 'choices.0.message.content'.

  • system_prompt -- System prompt for the task.

  • query_template -- Template for the query part of the input prompt.

  • response_template -- Template for the response part of the input prompt.

  • analysis_template -- Template for the analysis part of the input prompt.

  • intensity_template -- Template for the intensity part of the input prompt.

  • analysis_pattern -- Regular expression used to parse the returned sentiment analysis.

  • intensity_pattern -- Regular expression used to parse the returned sentiment intensity.

  • try_num -- The number of retry attempts when there is an API call error or output parsing error.

  • model_params -- Parameters for initializing the API model.

  • sampling_params -- Extra parameters passed to the API call, e.g. {'temperature': 0.9, 'top_p': 0.95}.

  • kwargs -- Extra keyword arguments.
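
To make the `try_num` behaviour concrete, here is a minimal sketch of a retry loop that tolerates both call errors and parsing errors. It shows the general pattern only and is not the operator's actual code: `call_with_retries`, `flaky_model`, `parse`, and the fallback value `('', 0)` are all invented for illustration.

```python
def call_with_retries(call_model, parse, prompt, try_num=3):
    """Hypothetical helper: run the call-and-parse step up to try_num times."""
    for attempt in range(1, try_num + 1):
        try:
            return parse(call_model(prompt))
        except Exception as exc:  # API call error or output parsing error
            print(f'attempt {attempt} failed: {exc}')
    # Assumed fallback when every attempt fails.
    return '', 0


calls = {'count': 0}


def flaky_model(prompt):
    """Stand-in for an API client: fails once, then answers."""
    calls['count'] += 1
    if calls['count'] == 1:
        raise RuntimeError('simulated API error')
    return 'satisfied|1'


def parse(output):
    """Toy parser: raises ValueError if the output is malformed."""
    analysis, intensity = output.split('|')
    return analysis, int(intensity)


result = call_with_retries(flaky_model, parse, 'prompt text', try_num=3)
print(result)
```
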

build_input(history, query)[source]
parse_output(response)[source]
process_single(sample, rank=None)[source]

Process a single sample: sample --> sample.

Parameters:

sample -- sample to process

Returns:

processed sample
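
The per-sample flow described in the class summary can be pictured roughly as follows. This is a sketch under stated assumptions, not the operator's implementation: the sample layout (a `history` list of query/response pairs and a `meta` dict), the function `annotate_dialog`, and the stub `fake_analyzer` are placeholders. Only the two output key names come from the documentation above.

```python
INTENSITIES_KEY = 'dialog_sentiment_intensity'
ANALYSIS_KEY = 'dialog_sentiment_intensity_analysis'


def fake_analyzer(query):
    """Stub standing in for the API model call plus output parsing."""
    if 'thanks' in query.lower():
        return 'User is satisfied.', 1
    return 'No reply yet, neutral.', 0


def annotate_dialog(sample, analyzer):
    """Hypothetical sketch: one analysis and one intensity per user query."""
    analyses, intensities = [], []
    for query, _response in sample['history']:
        analysis, intensity = analyzer(query)
        analyses.append(analysis)
        intensities.append(intensity)
    # Store both lists side by side so index i refers to the same round.
    sample['meta'][INTENSITIES_KEY] = intensities
    sample['meta'][ANALYSIS_KEY] = analyses
    return sample


sample = {
    'history': [
        ('What is sustainable development?', 'It means ...'),
        ('Thanks, that helps!', 'Glad to hear it.'),
    ],
    'meta': {},
}
processed = annotate_dialog(sample, fake_analyzer)
print(processed['meta'])
```
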