data_juicer.utils.fingerprint_utils module

class data_juicer.utils.fingerprint_utils.Hasher
    Bases: object

    Hasher that accepts Python objects as inputs.

    dispatch: Dict = {}

    __init__()

    classmethod hash_bytes(value: bytes | List[bytes]) → str

    classmethod hash_default(value: Any) → str
        Use dill to serialize objects to avoid serialization failures.

    classmethod hash(value: Any) → str

    update(value: Any) → None

    hexdigest() → str

data_juicer.utils.fingerprint_utils.update_fingerprint(fingerprint, transform, transform_args)
    Combine various objects to update the fingerprint.

data_juicer.utils.fingerprint_utils.generate_fingerprint(ds, *args, **kwargs)
    Generate new fingerprints using various kwargs of the dataset.
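
The listing above gives signatures only, so the following is a minimal standalone sketch of how a hasher with this interface and a fingerprint update could fit together. It is not the data_juicer implementation. The names SketchHasher and sketch_update_fingerprint are hypothetical, SHA-256 is an assumed digest, and the standard-library pickle stands in for dill (which the real hash_default is documented to use) so that the example runs without extra dependencies.

```python
import hashlib
import pickle
from typing import Any, Dict, List, Union


class SketchHasher:
    """Illustrative stand-in mirroring the documented Hasher interface."""

    def __init__(self) -> None:
        # Running digest that update() feeds and hexdigest() reads.
        self._m = hashlib.sha256()  # assumed algorithm, not confirmed by the docs

    @classmethod
    def hash_bytes(cls, value: Union[bytes, List[bytes]]) -> str:
        # Accept either one bytes object or a list of chunks.
        chunks = [value] if isinstance(value, bytes) else value
        m = hashlib.sha256()
        for chunk in chunks:
            m.update(chunk)
        return m.hexdigest()

    @classmethod
    def hash_default(cls, value: Any) -> str:
        # Assumption: pickle replaces dill here to stay stdlib-only.
        return cls.hash_bytes(pickle.dumps(value))

    @classmethod
    def hash(cls, value: Any) -> str:
        # The sketch has no per-type dispatch, so every value takes the default path.
        return cls.hash_default(value)

    def update(self, value: Any) -> None:
        # Mix in the type as well, so 1 and "1" contribute differently.
        self._m.update(f"=={type(value)}==".encode("utf8"))
        self._m.update(self.hash(value).encode("utf8"))

    def hexdigest(self) -> str:
        return self._m.hexdigest()


def sketch_update_fingerprint(
    fingerprint: str, transform: Any, transform_args: Dict[str, Any]
) -> str:
    """Combine the old fingerprint, the transform, and its arguments."""
    hasher = SketchHasher()
    hasher.update(fingerprint)
    hasher.update(transform)
    # Sort keys so that argument order does not change the result.
    for key in sorted(transform_args):
        hasher.update(key)
        hasher.update(transform_args[key])
    return hasher.hexdigest()


# Usage: the transform is identified by a plain string in this sketch.
old_fp = SketchHasher.hash("raw dataset")
new_fp = sketch_update_fingerprint(old_fp, "lowercase_text", {"batch_size": 2})
```

In this sketch, identical inputs always yield the same fingerprint and any change to the transform or its arguments yields a different one, which is the property a fingerprint needs in order to serve as a cache key for processed datasets.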