data_juicer.utils.mm_utils module

class data_juicer.utils.mm_utils.SpecialTokens[source]

Bases: object

image = '<__dj__image>'

audio = '<__dj__audio>'

video = '<__dj__video>'

eoc = '<|__dj__eoc|>'
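As a rough illustration of how such placeholder tokens can be handled in text, the sketch below counts image placeholders and splits a text on the `eoc` token. The token strings are copied from the class above. It assumes `eoc` marks the end of a chunk, and the helper names `count_images` and `split_chunks` are hypothetical; they are not part of this module.

```python
from typing import List

# Token strings copied from SpecialTokens above.
IMAGE = "<__dj__image>"
EOC = "<|__dj__eoc|>"


def count_images(text: str) -> int:
    """Count image placeholders in a text (hypothetical helper)."""
    return text.count(IMAGE)


def split_chunks(text: str) -> List[str]:
    """Split a text on the eoc token, dropping empty pieces (hypothetical helper)."""
    return [chunk.strip() for chunk in text.split(EOC) if chunk.strip()]


sample_text = f"{IMAGE} a cat on a sofa {EOC} {IMAGE} {IMAGE} two dogs {EOC}"
print(count_images(sample_text))
print(split_chunks(sample_text))
```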

data_juicer.utils.mm_utils.AV_STREAM_THREAD_TYPE = 'AUTO'

The AV stream thread type. Supported values are "SLICE", "FRAME", and "AUTO".

"SLICE": decode more than one part of a single frame at once.

"FRAME": decode more than one frame at once.

"AUTO": use both "FRAME" and "SLICE". "AUTO" is faster when there is no video latency.

data_juicer.utils.mm_utils.get_special_tokens()[source]

data_juicer.utils.mm_utils.remove_special_tokens(text)[source]

data_juicer.utils.mm_utils.remove_non_special_tokens(text)[source]

data_juicer.utils.mm_utils.load_mm_bytes_from_sample(sample, mm_idx, mm_bytes_key=None, sample_idx=None)[source]

data_juicer.utils.mm_utils.load_data_with_context(sample, context, loaded_data_keys, load_func, mm_bytes_key=None, sample_idx=None)[source]

The unified loading function with contexts for multimodal data.

Parameters:

sample -- can be a single sample or a batch of samples.
context -- whether the context fields are activated.
loaded_data_keys -- the data keys (paths) to load.
load_func -- the function used to load the data.
mm_bytes_key -- the key to store the data bytes if it exists. It's None by default.
sample_idx -- the index of the current sample. Used for batched samples.

data_juicer.utils.mm_utils.load_images(paths)[source]

data_juicer.utils.mm_utils.load_images_byte(paths)[source]

data_juicer.utils.mm_utils.load_image(path_or_bytes)[source]

data_juicer.utils.mm_utils.load_image_byte(path)[source]

data_juicer.utils.mm_utils.image_path_to_base64(image_path)[source]

data_juicer.utils.mm_utils.image_byte_to_base64(image_byte)[source]

data_juicer.utils.mm_utils.pil_to_opencv(pil_image)[source]

data_juicer.utils.mm_utils.detect_faces(image, detector, **extra_kwargs)[source]

data_juicer.utils.mm_utils.get_file_size(path)[source]

data_juicer.utils.mm_utils.iou(box1, box2)[source]

data_juicer.utils.mm_utils.calculate_resized_dimensions(original_size: Tuple[Annotated[int, Gt(gt=0)], Annotated[int, Gt(gt=0)]], target_size: Annotated[int, Gt(gt=0)] | Tuple[Annotated[int, Gt(gt=0)], Annotated[int, Gt(gt=0)]], max_length: int | None = None, divisible: Annotated[int, Gt(gt=0)] = 1) → Tuple[int, int][source]

Resize dimensions based on specified constraints.

Parameters:

original_size -- The original dimensions as (height, width).
target_size -- Desired target size; can be a single integer (short edge) or a tuple (height, width).
max_length -- Maximum allowed length for the longer edge.
divisible -- The number that the dimensions must be divisible by.

Returns:

Resized dimensions as (height, width).
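One plausible reading of these constraints is sketched below. It is an illustration only and not the library's implementation: the function name `resize_dims_sketch` is hypothetical, and the rounding rules (nearest integer when scaling, rounding down to a multiple of `divisible`) are assumptions.

```python
from typing import Optional, Tuple, Union


def resize_dims_sketch(
    original_size: Tuple[int, int],
    target_size: Union[int, Tuple[int, int]],
    max_length: Optional[int] = None,
    divisible: int = 1,
) -> Tuple[int, int]:
    """Illustrative only; not the library's implementation."""
    height, width = original_size
    if isinstance(target_size, int):
        # Scale so the shorter edge equals target_size, keeping the aspect ratio.
        scale = target_size / min(height, width)
        new_h, new_w = round(height * scale), round(width * scale)
    else:
        new_h, new_w = target_size

    # Cap the longer edge at max_length, shrinking both edges proportionally.
    if max_length is not None and max(new_h, new_w) > max_length:
        shrink = max_length / max(new_h, new_w)
        new_h, new_w = round(new_h * shrink), round(new_w * shrink)

    # Round each edge down to a multiple of `divisible` (assumed rounding rule).
    new_h = (new_h // divisible) * divisible
    new_w = (new_w // divisible) * divisible
    return new_h, new_w


print(resize_dims_sketch((480, 640), 240))
print(resize_dims_sketch((480, 640), 240, max_length=300, divisible=8))
```

In the second call the short-edge resize gives (240, 320), the `max_length` cap shrinks it to (225, 300), and the divisibility step rounds it down to (224, 296).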

data_juicer.utils.mm_utils.load_audios(paths)[source]

data_juicer.utils.mm_utils.load_audio(path, sampling_rate=None)[source]

data_juicer.utils.mm_utils.load_videos(paths)[source]

data_juicer.utils.mm_utils.load_video(path, mode='r')[source]

Load a video using its path.

Parameters:

path -- the path to this video.
mode -- the loading mode. It is "r" by default.

Returns:

a container object from the PyAV library, which contains all streams in this video (video/audio/...) and can be used to decode these streams to frames.

data_juicer.utils.mm_utils.get_video_duration(input_video: str | InputContainer, video_stream_index: int = 0)[source]

Get the video's duration from the container.

Parameters:

input_video -- the container object from the PyAV library, which contains all streams in this video (video/audio/...) and can be used to decode these streams to frames.
video_stream_index -- the index of the video stream to decode. Defaults to 0.

Returns:

the duration of the video in seconds.

data_juicer.utils.mm_utils.get_decoded_frames_from_video(input_video: str | InputContainer, video_stream_index: int = 0)[source]

Get the video's frames from the container.

Parameters:

input_video -- the container object from the PyAV library, which contains all streams in this video (video/audio/...) and can be used to decode these streams to frames.
video_stream_index -- the index of the video stream to decode. Defaults to 0.

Returns:

an iterator over all the frames of the video.

data_juicer.utils.mm_utils.cut_video_by_seconds(input_video: str | InputContainer, output_video: str, start_seconds: float, end_seconds: float | None = None)[source]

Cut a segment out of a video by start and end times in seconds.

Parameters:

input_video -- the path to the input video or the video container.
output_video -- the path to the output video.
start_seconds -- the start time in seconds.
end_seconds -- the end time in seconds. If it's None, this function will cut the video from start_seconds to the end of the video.

Returns:

a boolean flag indicating whether the video was successfully cut or not.

data_juicer.utils.mm_utils.process_each_frame(input_video: str | InputContainer, output_video: str, frame_func)[source]

Process each frame in the video by replacing it with frame_func(frame).

Parameters:

input_video -- the path to the input video or the video container.
output_video -- the path to the output video.
frame_func -- a function that takes a frame as input and returns another frame.

data_juicer.utils.mm_utils.extract_key_frames_by_seconds(input_video: str | InputContainer, duration: float = 1)[source]

Extract key frames by seconds.

Parameters:

input_video -- input video path or av.container.InputContainer.
duration -- duration of each video split in seconds.

data_juicer.utils.mm_utils.extract_key_frames(input_video: str | InputContainer)[source]

Extract key frames from the input video. If there are no key frames in the video, return the first frame.

Parameters:

input_video -- input video path or container.

Returns:

a list of key frames.

data_juicer.utils.mm_utils.get_key_frame_seconds(input_video: str | InputContainer)[source]

Get the timestamps, in seconds, of the key frames in the input video.

data_juicer.utils.mm_utils.extract_video_frames_uniformly_by_seconds(input_video: str | InputContainer, frame_num: Annotated[int, Gt(gt=0)], duration: float = 1)[source]

Extract video frames uniformly by seconds.

Parameters:

input_video -- input video path or av.container.InputContainer.
frame_num -- the number of frames to be extracted uniformly from each video split by duration.
duration -- duration of each video split in seconds.

data_juicer.utils.mm_utils.extract_video_frames_uniformly(input_video: str | InputContainer, frame_num: Annotated[int, Gt(gt=0)])[source]

Extract a number of video frames uniformly within the video duration.

Parameters:

input_video -- input video path or container.
frame_num -- The number of frames to be extracted. If it's 1, only the middle frame will be extracted. If it's 2, only the first and the last frames will be extracted. If it's larger than 2, in addition to the first and the last frames, other frames will be extracted uniformly within the video duration.

Returns:

a list of extracted frames.
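The sampling rule described for frame_num can be sketched as a timestamp calculation. The snippet below only computes the timestamps at which frames would be sampled; seeking and decoding are left out, and the helper name `uniform_timestamps` is hypothetical rather than part of this module.

```python
from typing import List


def uniform_timestamps(duration: float, frame_num: int) -> List[float]:
    """Return sampling timestamps in seconds (illustrative sketch)."""
    if frame_num <= 0:
        raise ValueError("frame_num must be positive")
    if frame_num == 1:
        # A single frame is taken from the middle of the video.
        return [duration / 2]
    # First and last frames are always included; the rest are evenly spaced.
    step = duration / (frame_num - 1)
    return [i * step for i in range(frame_num)]


print(uniform_timestamps(10.0, 1))
print(uniform_timestamps(10.0, 2))
print(uniform_timestamps(10.0, 5))
```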

Extract audio data for the given video.

Parameters:

input_video -- input video. Can be a video path or an av.container.InputContainer.
output_audio -- output audio path. If it's None, the audio data won't be written to file. If stream_indexes is not None, it will output multiple audio files with original filename and the stream indexes. Default: None.
start_seconds -- the start seconds to extract audio data. Default: 0, which means extract from the start of the video.
end_seconds -- the end seconds to stop extracting audio data. If it's None, the extraction won't stop until the end of the video. Default: None.
stream_indexes -- there might be multiple audio streams in the video, so we need to decide which audio streams with stream_indexes will be extracted. It can be a single index or a list of indexes. If it's None, all audio streams will be extracted. Default: None.

data_juicer.utils.mm_utils.size_to_bytes(size)[source]

data_juicer.utils.mm_utils.insert_texts_after_placeholders(original_string, placeholders, new_texts, delimiter_in_insert_pos=' ')[source]

data_juicer.utils.mm_utils.timecode_string_to_seconds(timecode: str)[source]

Convert a timecode string to seconds as a float.

Parameters:

timecode -- the input timecode string. It must be in "HH:MM:SS.fff(fff)" format.
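A minimal sketch of this conversion using only the standard library is shown below. The function name `timecode_to_seconds_sketch` is hypothetical, and parsing with `datetime.strptime` is an assumption about one reasonable approach, not a statement of how the module does it. Note that `%H` limits the hour field to 0-23.

```python
from datetime import datetime


def timecode_to_seconds_sketch(timecode: str) -> float:
    """Convert "HH:MM:SS.fff(fff)" to seconds (illustrative sketch)."""
    # %f accepts one to six fractional digits, covering "fff" and "ffffff".
    parsed = datetime.strptime(timecode, "%H:%M:%S.%f")
    return (
        parsed.hour * 3600
        + parsed.minute * 60
        + parsed.second
        + parsed.microsecond / 1_000_000
    )


print(timecode_to_seconds_sketch("00:01:02.500"))
print(timecode_to_seconds_sketch("01:00:00.000000"))
```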

data_juicer.utils.mm_utils.parse_string_to_roi(roi_string, roi_type='pixel')[source]

Convert an ROI string to four numbers x1, y1, x2, y2 that define the region. When roi_type is 'pixel', (x1, y1) and (x2, y2) are the pixel locations of the top-left and bottom-right corners respectively. When roi_type is 'ratio', the coordinates are normalized by the width and height.

Parameters:

roi_string -- the ROI string.
roi_type -- the type of the ROI string.

Returns:

a tuple of (x1, y1, x2, y2) if roi_string is valid, else None.
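The description above does not spell out the exact string format, so the following sketch makes assumptions: the ROI string is taken to be four comma-separated numbers, optionally wrapped in parentheses or square brackets, and 'ratio' values are required to lie in [0, 1]. The function name `parse_roi_sketch` is hypothetical and the accepted formats of the real function may differ.

```python
from typing import Optional, Tuple


def parse_roi_sketch(
    roi_string: str, roi_type: str = "pixel"
) -> Optional[Tuple[float, float, float, float]]:
    """Parse an ROI string into (x1, y1, x2, y2), or None if invalid (sketch)."""
    # Assumed input format: "x1, y1, x2, y2", optionally wrapped in () or [].
    parts = roi_string.strip().strip("()[]").split(",")
    if len(parts) != 4:
        return None
    try:
        if roi_type == "pixel":
            # Pixel coordinates are parsed as integers.
            x1, y1, x2, y2 = (int(p) for p in parts)
        elif roi_type == "ratio":
            # Ratio coordinates are parsed as floats, assumed to lie in [0, 1].
            x1, y1, x2, y2 = (float(p) for p in parts)
            if not all(0.0 <= v <= 1.0 for v in (x1, y1, x2, y2)):
                return None
        else:
            return None
    except ValueError:
        return None
    return x1, y1, x2, y2


print(parse_roi_sketch("10, 20, 110, 220"))
print(parse_roi_sketch("[0.1, 0.2, 0.5, 0.9]", roi_type="ratio"))
print(parse_roi_sketch("10, 20, 110"))
```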

data_juicer.utils.mm_utils.close_video(container: InputContainer)[source]

Close the video stream and container to avoid memory leaks.

Parameters:

container -- the video container.