data_juicer.analysis.column_wise_analysis module

data_juicer.analysis.column_wise_analysis.get_row_col(total_num, factor=2)

Given the total number of stats figures, compute the "best" number of rows and columns for laying them out in a grid. This is needed when all stats figures are stored in a single image.

Parameters:
  • total_num -- Total number of stats figures

  • factor -- Number of sub-figure types in each figure. Defaults to 2, meaning each stat figure contains a histogram and a box plot

Returns:

"best" number of rows and columns, and the grid list
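The layout search can be pictured with a minimal pure-Python sketch. It is not the data_juicer implementation: `best_grid` is a hypothetical name, and the sketch assumes the goal is the most nearly square grid with no empty cells, with the column count kept a multiple of `factor` so that a stat's histogram and box plot share a row.

```python
def best_grid(total_num: int, factor: int = 2):
    """Hypothetical sketch, not data_juicer's get_row_col.

    Lays out total_num * factor sub-figures in a grid with no empty cells,
    keeping the column count a multiple of `factor` so each stat's
    sub-figures stay on one row, and preferring the most square layout.
    """
    if total_num < 1 or factor < 1:
        raise ValueError("total_num and factor must be positive")
    n = total_num * factor  # total number of sub-figures
    best = None
    for cols in range(factor, n + 1, factor):
        if n % cols:  # skip layouts that would leave empty cells
            continue
        rows = n // cols
        # keep the first layout with the smallest |rows - cols|
        if best is None or abs(rows - cols) < abs(best[0] - best[1]):
            best = (rows, cols)
    rows, cols = best
    stats_per_row = cols // factor
    # (row, col) of the first sub-figure belonging to each stat
    grid = [(i // stats_per_row, (i % stats_per_row) * factor)
            for i in range(total_num)]
    return rows, cols, grid


print(best_grid(3))      # (3, 2, [(0, 0), (1, 0), (2, 0)])
print(best_grid(8)[:2])  # (4, 4)
```

Because `best` is only replaced on a strict improvement, a tie between two equally square layouts goes to the one with fewer columns.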

class data_juicer.analysis.column_wise_analysis.ColumnWiseAnalysis(dataset, output_path, overall_result=None, save_stats_in_one_file=True)

Bases: object

Apply analysis to each stats column separately.

__init__(dataset, output_path, overall_result=None, save_stats_in_one_file=True)

Initialization method

Parameters:
  • dataset -- the dataset to be analyzed

  • output_path -- path to store the analysis results

  • overall_result -- optional precomputed overall stats result

  • save_stats_in_one_file -- whether to save the analysis figures of all stats into a single image file

analyze(show_percentiles=False, show=False, skip_export=False)

Apply the analysis and draw the analysis figures for the stats.

Parameters:
  • show_percentiles -- whether to show percentile lines in each sub-figure. If True, several red lines mark the quantiles of each stats distribution

  • show -- whether to display the figures in a separate window after drawing

  • skip_export -- whether to skip saving the results to disk
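When `show_percentiles` is enabled, each red line sits at a quantile of one stats column. As a hedged sketch of that per-column computation (the helper names and the default quartiles are assumptions, not data_juicer's API), the code below uses linear interpolation between the two nearest ranks, which is the default of NumPy's `percentile` and pandas' `Series.quantile`.

```python
import math


def percentile(values, p):
    """Linearly interpolated p-quantile (0 <= p <= 1) of non-empty values."""
    s = sorted(values)
    pos = p * (len(s) - 1)  # fractional rank into the sorted values
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def column_percentiles(stats, ps=(0.25, 0.5, 0.75)):
    """Hypothetical helper: quantiles computed for each stats column separately."""
    return {
        col: {p: percentile(vals, p) for p in ps}
        for col, vals in stats.items()
        if vals  # skip empty columns
    }


stats = {"text_len": [10, 50, 20, 40, 30], "num_words": [1, 2, 3, 4]}
print(column_percentiles(stats))
```

Each column is handled independently, mirroring the column-wise idea of the class: one set of percentile lines per stat.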

Returns:

draw_hist(ax, data, save_path, percentiles=None, show=False)

Draw the histogram for the data.

Parameters:
  • ax -- the axes to draw on

  • data -- data to draw

  • save_path -- the path to save the histogram figure

  • percentiles -- the overall analysis result of the data including percentile information

  • show -- whether to display the figure in a separate window after drawing
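At its core, a histogram counts values into equal-width bins. The sketch below shows that step in plain Python; `histogram_counts` is hypothetical, the default of 10 bins is an assumption, and, as in `numpy.histogram`, every bin is half-open except the last, which also includes the maximum.

```python
def histogram_counts(data, bins=10):
    """Hypothetical sketch: equal-width bin counts and bin edges."""
    lo, hi = min(data), max(data)
    if lo == hi:  # all values identical: a single bin holds everything
        return [len(data)], [lo, hi]
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for x in data:
        idx = int((x - lo) / width)
        counts[min(idx, bins - 1)] += 1  # the maximum goes into the last bin
    return counts, edges


counts, edges = histogram_counts([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], bins=5)
print(counts)  # [2, 2, 2, 2, 3]
print(edges)   # [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
```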

Returns:

draw_box(ax, data, save_path, percentiles=None, show=False)

Draw the box plot for the data.

Parameters:
  • ax -- the axes to draw on

  • data -- data to draw

  • save_path -- the path to save the box figure

  • percentiles -- the overall analysis result of the data including percentile information

  • show -- whether to display the figure in a separate window after drawing
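A box plot summarizes a column by its quartiles, whiskers and outliers. As a hedged sketch (the helper name is hypothetical, and the 1.5 * IQR whisker rule is Tukey's convention and matplotlib's default `whis`, not something this page states about data_juicer), the statistics behind one box can be computed with the standard library:

```python
import statistics


def box_stats(data, whis=1.5):
    """Hypothetical sketch: quartiles, whisker ends and outliers of a box plot.

    Assumes at least two values.
    """
    # method="inclusive" matches linear-interpolation percentiles of the sample
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # whiskers stop at the most extreme values still inside the fences
        "whislo": min(inside),
        "whishi": max(inside),
        "fliers": sorted(x for x in data if x < low_fence or x > high_fence),
    }


print(box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```

For this input the value 100 lies beyond the upper fence (14.5), so it is reported as a flier while the upper whisker stops at 9.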

Returns:

draw_wordcloud(ax, data, save_path, show=False)