data_juicer.utils.asset_utils module

data_juicer.utils.asset_utils.load_words_asset(words_dir: str, words_type: str)

Load words from an asset file named words_type. If no valid asset file is found, download it from ASSET_LINKS, which are cached by the data_juicer team.

Parameters:
  • words_dir – directory that stores asset file(s)

  • words_type – name of the target words asset

Returns:

a dict of words assets whose keys are language names and whose values are lists of words
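
The load-or-download behavior described above can be illustrated with a minimal, self-contained sketch. This is not the data_juicer implementation: load_words_sketch, the fetch callable, fake_fetch, and the JSON file layout are all hypothetical names and assumptions chosen for illustration, and fetch stands in for the download from ASSET_LINKS so the example runs without network access.

```python
import json
import os
import tempfile
from typing import Callable, Dict, List


def load_words_sketch(
    words_dir: str,
    words_type: str,
    fetch: Callable[[str], Dict[str, List[str]]],
) -> Dict[str, List[str]]:
    """Hypothetical stand-in for the load-or-download pattern."""
    words: Dict[str, List[str]] = {}
    os.makedirs(words_dir, exist_ok=True)

    # Assumption: a local asset is a JSON file whose name contains words_type
    # and whose content maps language names to lists of words.
    for name in sorted(os.listdir(words_dir)):
        if words_type not in name or not name.endswith(".json"):
            continue
        try:
            with open(os.path.join(words_dir, name), encoding="utf-8") as f:
                loaded = json.load(f)
        except (OSError, json.JSONDecodeError):
            continue  # skip unreadable or invalid files
        if not isinstance(loaded, dict):
            continue  # not a language -> words mapping
        for lang, items in loaded.items():
            words.setdefault(lang, []).extend(items)

    if not words:
        # No valid local asset: use the caller-supplied fetcher
        # (standing in for a download) and cache the result locally.
        words = fetch(words_type)
        path = os.path.join(words_dir, f"{words_type}.json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump(words, f)
    return words


def fake_fetch(words_type: str) -> Dict[str, List[str]]:
    # Hypothetical replacement for a real download.
    return {"en": ["the", "a"], "zh": ["的"]}


with tempfile.TemporaryDirectory() as tmp:
    # First call finds nothing locally, so it fetches and caches.
    first = load_words_sketch(tmp, "stopwords", fake_fetch)
    cached = os.path.exists(os.path.join(tmp, "stopwords.json"))
    # Second call reads the cached file; the fetcher is never used.
    second = load_words_sketch(tmp, "stopwords", lambda _: {})
```

In this sketch the second call returns the same dict as the first because the cached file is found before the fallback is reached, which mirrors the "load if present, otherwise download" order in the description.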