data_juicer.analysis.overall_analysis module

class data_juicer.analysis.overall_analysis.OverallAnalysis(dataset, output_path)[source]

Bases: object

Apply analysis on the overall stats, including mean, std, quantiles, etc.
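As a rough illustration of what "overall stats" means here, the sketch below calls pandas `DataFrame.describe` on a toy per-sample stats table. The column names `text_len` and `num_words` are hypothetical and stand in for whatever stats columns a real dataset carries; this shows the pandas summary that the analysis builds on, not Data-Juicer's own code.

```python
import pandas as pd

# Toy per-sample stats table; column names are hypothetical, for illustration only.
stats = pd.DataFrame({
    "text_len": [10, 20, 30, 40],
    "num_words": [2, 4, 6, 8],
})

# describe() reports count, mean, std, min, the requested percentiles, and max
# for every numeric column; percentiles use linear interpolation by default.
summary = stats.describe(percentiles=[0.25, 0.5, 0.75])
print(summary)
```

For `text_len`, the mean is 25.0 and the 25% quantile is 17.5 (interpolated between 10 and 20).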

__init__(dataset, output_path)[source]

Initialization method.

Parameters:
  • dataset -- the dataset to be analyzed

  • output_path -- path to store the analysis results.

refine_single_column(col)[source]

analyze(percentiles=[], num_proc=1, skip_export=False)[source]

Apply overall analysis to the whole dataset, based on the pandas describe method.

Parameters:
  • percentiles -- percentiles to analyze

  • num_proc -- number of processes used to analyze the dataset

  • skip_export -- whether to skip exporting the results to disk

Returns:

the overall analysis result.
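To make the roles of percentiles, num_proc and skip_export concrete, here is a minimal sketch of one plausible way such an analysis could be organized over a pandas DataFrame of stats: summarize each column with describe, optionally fan the columns out over a process pool, join the per-column summaries, and write them to disk unless export is skipped. The helpers `describe_column` and `overall_analysis_sketch`, and the output file name `overall.csv`, are hypothetical; Data-Juicer's actual implementation may differ.

```python
import os
from multiprocessing import Pool

import pandas as pd


def describe_column(args):
    """Summarize one stats column (hypothetical helper, not part of Data-Juicer)."""
    col, percentiles = args
    return col.describe(percentiles=percentiles)


def overall_analysis_sketch(stats, output_path, percentiles=(), num_proc=1,
                            skip_export=False):
    """Hypothetical analogue of OverallAnalysis.analyze over a pandas DataFrame."""
    # One task per stats column, each paired with the requested percentiles.
    tasks = [(stats[name], list(percentiles)) for name in stats.columns]
    if num_proc > 1:
        # Spread the per-column summaries over a pool of worker processes.
        with Pool(num_proc) as pool:
            results = pool.map(describe_column, tasks)
    else:
        results = [describe_column(task) for task in tasks]
    # Each describe() result is a Series named after its column;
    # concatenating along axis=1 puts them side by side.
    overall = pd.concat(results, axis=1)
    if not skip_export:
        # Hypothetical export location and file name.
        os.makedirs(output_path, exist_ok=True)
        overall.to_csv(os.path.join(output_path, "overall.csv"))
    return overall
```

For example, `overall_analysis_sketch(df, "outputs", percentiles=[0.5], skip_export=True)` returns the summary table without touching the disk. Parallelizing per column keeps each task independent; when num_proc > 1 on platforms that spawn worker processes, call it from under an `if __name__ == "__main__":` guard.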