data_juicer.utils.asset_utils module
- data_juicer.utils.asset_utils.load_words_asset(words_dir: str, words_type: str)
Load words from an asset file named words_type. If no valid asset file is found, download it from ASSET_LINKS, which is cached by the data_juicer team.
- Parameters:
words_dir – directory that stores asset file(s)
words_type – name of target words assets
- Returns:
a dict of word assets whose keys are language names and whose values are lists of words
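The local-loading half of this behavior can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's actual implementation: the helper name `load_local_words`, the JSON file format, the rule that a matching file name contains words_type, and the sample value "stopwords" are all hypothetical. The download fallback via ASSET_LINKS is deliberately left out.

```python
import json
import os
import tempfile


def load_local_words(words_dir: str, words_type: str) -> dict:
    """Hypothetical helper: merge word lists found in words_dir.

    Assumes each asset is a JSON file mapping language name -> list of words,
    and that a matching file name contains words_type.
    """
    words = {}
    if not os.path.isdir(words_dir):
        # No directory means no local asset; the real function would
        # fall back to downloading at this point.
        return words
    for name in sorted(os.listdir(words_dir)):
        if words_type in name and name.endswith(".json"):
            with open(os.path.join(words_dir, name), encoding="utf-8") as f:
                loaded = json.load(f)
            # Merge per language so several files can contribute words.
            for lang, items in loaded.items():
                words.setdefault(lang, []).extend(items)
    return words


def demo() -> dict:
    """Write a tiny sample asset to a temp directory and load it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "stopwords.json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump({"en": ["the", "a"], "fr": ["le", "la"]}, f)
        return load_local_words(tmp, "stopwords")
```

The returned value has the shape described under Returns: language names as keys and lists of words as values. An empty dict from the sketch corresponds to the case where the real function would download the asset instead.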