trinity.common.models.vllm_patch.api_patch module

Patch for the vLLM OpenAI API server. The patch:

- Mocks the event loop's add_signal_handler method to do nothing, so the server can start outside the main thread (where registering signal handlers would raise).
- Adds token_ids and prompt_token_ids fields to the ChatCompletionResponse.
- class trinity.common.models.vllm_patch.api_patch.PatchedChatCompletionResponseChoice(*, index: int, message: vllm.entrypoints.openai.protocol.ChatMessage, logprobs: vllm.entrypoints.openai.protocol.ChatCompletionLogProbs | None = None, finish_reason: str | None = 'stop', stop_reason: int | str | None = None, token_ids: list[int] = <factory>, **extra_data: typing.Any)

  Bases: ChatCompletionResponseChoice

  - token_ids: list[int]

  - model_config: ClassVar[ConfigDict] = {'extra': 'allow'}

    Configuration for the model; it should be a dictionary conforming to pydantic.config.ConfigDict.
- class trinity.common.models.vllm_patch.api_patch.PatchedChatCompletionResponse(*, id: str = <factory>, object: typing.Literal['chat.completion'] = 'chat.completion', created: int = <factory>, model: str, choices: list[trinity.common.models.vllm_patch.api_patch.PatchedChatCompletionResponseChoice] = <factory>, service_tier: typing.Literal['auto', 'default', 'flex', 'scale', 'priority'] | None = None, system_fingerprint: str | None = None, usage: vllm.entrypoints.openai.protocol.UsageInfo, prompt_logprobs: list[dict[int, vllm.logprobs.Logprob] | None] | None = None, prompt_token_ids: list[int] = <factory>, kv_transfer_params: dict[str, typing.Any] | None = None, **extra_data: typing.Any)

  Bases: ChatCompletionResponse

  - prompt_token_ids: list[int]

  - choices: list[PatchedChatCompletionResponseChoice]

  - model_config: ClassVar[ConfigDict] = {'extra': 'allow'}

    Configuration for the model; it should be a dictionary conforming to pydantic.config.ConfigDict.
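With both patched models in place, a response from the patched server carries token IDs alongside the usual OpenAI fields. A client-side sketch of reading them back; the payload below is hand-written for illustration and every value in it is invented:

```python
# Hypothetical JSON body from the patched /v1/chat/completions endpoint.
# Field names follow the patched models documented above; values are made up.
response = {
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "model": "my-model",
    "prompt_token_ids": [101, 2023, 102],  # added by PatchedChatCompletionResponse
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello!"},
            "finish_reason": "stop",
            "token_ids": [7592, 999],      # added by PatchedChatCompletionResponseChoice
        }
    ],
}

# A trainer can recover the full token sequence without re-tokenizing.
full_sequence = response["prompt_token_ids"] + response["choices"][0]["token_ids"]
```

Having the exact engine-side token IDs avoids retokenization mismatches when the response is later used for training.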
- async trinity.common.models.vllm_patch.api_patch.chat_completion_full_generator(self, request, result_generator, request_id, model_name, conversation, tokenizer, request_metadata) → ErrorResponse | ChatCompletionResponse
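A replacement coroutine like this is typically applied by rebinding it onto the serving class. The sketch below shows only that mechanism with a dummy class; DummyServing and both method bodies are assumptions for illustration, not vLLM's actual API:

```python
import asyncio

class DummyServing:
    """Stand-in for the OpenAI serving class (illustrative only)."""

    async def chat_completion_full_generator(self, request):
        return {"patched": False}

async def patched_full_generator(self, request):
    # Stand-in for the replacement coroutine; the real patch would also
    # attach token_ids / prompt_token_ids to the response it builds.
    return {"patched": True, "request": request}

# Rebind the coroutine function onto the class, as a monkey patch would.
DummyServing.chat_completion_full_generator = patched_full_generator

# Instances now dispatch to the replacement.
result = asyncio.run(DummyServing().chat_completion_full_generator("req-1"))
```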