data_juicer.ops.mapper.dialog_sentiment_detection_mapper module¶

class data_juicer.ops.mapper.dialog_sentiment_detection_mapper.DialogSentimentDetectionMapper(api_model: str = 'gpt-4o', sentiment_candidates: List[str] | None = None, max_round: Annotated[int, Ge(ge=0)] = 10, *, labels_key: str = 'dialog_sentiment_labels', analysis_key: str = 'dialog_sentiment_labels_analysis', api_endpoint: str | None = None, response_path: str | None = None, system_prompt: str | None = None, query_template: str | None = None, response_template: str | None = None, candidate_template: str | None = None, analysis_template: str | None = None, labels_template: str | None = None, analysis_pattern: str | None = None, labels_pattern: str | None = None, try_num: Annotated[int, Gt(gt=0)] = 3, model_params: Dict = {}, sampling_params: Dict = {}, **kwargs)[source]¶

Bases: Mapper

Generates sentiment labels and analysis for user queries in a dialog.

This operator processes a dialog to detect and label the sentiments expressed by the user. It uses the provided history, query, and response keys to construct prompts for an API call. The API returns sentiment analysis and labels, which are then parsed and stored in the sample’s metadata under the ‘dialog_sentiment_labels’ and ‘dialog_sentiment_labels_analysis’ keys. The operator supports custom templates and patterns for prompt construction and output parsing. If no sentiment candidates are provided, it uses open-domain sentiment labels. The operator retries the API call up to a specified number of times in case of errors.

DEFAULT_SYSTEM_PROMPT = '请判断用户和LLM多轮对话中用户所具有的情绪。\n要求：\n- 需要先进行分析，然后罗列用户所具有的情绪，下面是一个样例，请模仿样例格式输出。\n用户：最近工作压力好大，我觉得整个人都快被压垮了。\n情感分析：用户的言语中透露出明显的压力和疲惫感，可能还夹杂着一些无助和焦虑。\n情感类别：压力、疲惫、无助、焦虑\nLLM：听起来你真的承受了很多，面临这种情况确实不容易。有没有考虑过找一些放松的方式，比如听音乐或者散步来减轻压力呢？\n用户：试过了，但是好像没什么效果，每天的事情都堆积如山。\n情感分析：用户感到无力解决现状，有挫败感，并且对尝试放松的方式失去信心。\n情感类别：无力、挫败\nLLM：我理解你的感受，有时候压力积累到一定程度确实让人难以承受。或许你可以尝试规划一下时间，把任务分成小块来完成，这样可能会减少一些压力感。\n用户：这个主意不错，我会试着让自己更有条理一些，谢谢你的建议。\n情感分析：用户对建议表现出认同和感激，同时展现出试图积极面对问题的态度。\n情感类别：认同、感激、积极\nLLM：不用谢，我很高兴能帮到你。记得给自己一些时间去适应新的计划，有任何需要随时可以跟我说哦！\n'¶

DEFAULT_QUERY_TEMPLATE = '用户：{query}\n'¶

DEFAULT_RESPONSE_TEMPLATE = 'LLM：{response}\n'¶

DEFAULT_CANDIDATES_TEMPLATE = '备选情感类别：[{candidate_str}]'¶

DEFAULT_ANALYSIS_TEMPLATE = '情感分析：{analysis}\n'¶

DEFAULT_LABELS_TEMPLATE = '情感类别：{labels}\n'¶

DEFAULT_ANALYSIS_PATTERN = '情感分析：(.*?)\n'¶

DEFAULT_LABELS_PATTERN = '情感类别：(.*?)($|\n)'¶

__init__(api_model: str = 'gpt-4o', sentiment_candidates: List[str] | None = None, max_round: Annotated[int, Ge(ge=0)] = 10, *, labels_key: str = 'dialog_sentiment_labels', analysis_key: str = 'dialog_sentiment_labels_analysis', api_endpoint: str | None = None, response_path: str | None = None, system_prompt: str | None = None, query_template: str | None = None, response_template: str | None = None, candidate_template: str | None = None, analysis_template: str | None = None, labels_template: str | None = None, analysis_pattern: str | None = None, labels_pattern: str | None = None, try_num: Annotated[int, Gt(gt=0)] = 3, model_params: Dict = {}, sampling_params: Dict = {}, **kwargs)[source]¶

Initialization method.

Parameters:

api_model – API model name.
sentiment_candidates – The output sentiment candidates. Use open-domain sentiment labels if it is None.
max_round – The max num of round in the dialog to build the prompt.
labels_key – The key name in the meta field to store the output labels. It is ‘dialog_sentiment_labels’ in default.
analysis_key – The key name in the meta field to store the corresponding analysis. It is ‘dialog_sentiment_labels_analysis’ in default.
api_endpoint – URL endpoint for the API.
response_path – Path to extract content from the API response. Defaults to ‘choices.0.message.content’.
system_prompt – System prompt for the task.
query_template – Template for query part to build the input prompt.
response_template – Template for response part to build the input prompt.
candidate_template – Template for sentiment candidates to build the input prompt.
analysis_template – Template for analysis part to build the input prompt.
labels_template – Template for labels part to build the input prompt.
analysis_pattern – Pattern to parse the return sentiment analysis.
labels_pattern – Pattern to parse the return sentiment labels.
try_num – The number of retry attempts when there is an API call error or output parsing error.
model_params – Parameters for initializing the API model.
sampling_params – Extra parameters passed to the API call. e.g {‘temperature’: 0.9, ‘top_p’: 0.95}
kwargs – Extra keyword arguments.

build_input(history, query)[source]¶

parse_output(response)[source]¶

process_single(sample, rank=None)[source]¶

For sample level, sample –> sample

Parameters:: sample – sample to process
Returns:: processed sample