data_juicer.analysis.column_wise_analysis module

data_juicer.analysis.column_wise_analysis.get_row_col(total_num, factor=2)

Given the total number of stats figures, compute the “best” number of rows and columns for the figure grid. This is needed when all stats figures are stored in one image.

Parameters:
  • total_num – Total number of stats figures

  • factor – Number of sub-figure types in each figure. By default it is 2, meaning each stat figure contains a histogram and a box plot

Returns:

“best” number of rows and columns, and the grid list
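
As an illustration of the kind of layout computation described above, here is a minimal sketch. It is not the library's implementation: the helper name `sketch_row_col`, the "closest to square" rule, and the shape of the returned grid list (one list of `(row, col)` positions per stat) are all assumptions made for this example.

```python
def sketch_row_col(total_num, factor=2):
    """Hypothetical layout helper, NOT the data_juicer implementation.

    Picks a grid for total_num * factor sub-figures where the column count
    is a multiple of `factor`, so the sub-figures of one stat share a row.
    """
    if total_num <= 0 or factor <= 0:
        return 0, 0, []

    n = total_num * factor  # total number of sub-figures
    best_rows, best_cols = total_num, factor  # fallback: one stat per row

    # Widen the grid in steps of `factor` while it is no wider than it is tall.
    for cols in range(factor, n + 1, factor):
        if n % cols != 0:
            continue  # this column count would leave a ragged last row
        rows = n // cols
        if cols > rows:
            break
        best_rows, best_cols = rows, cols

    # Assumed grid format: for each stat, the (row, col) cell of each sub-figure.
    grid = [
        [divmod(i * factor + k, best_cols) for k in range(factor)]
        for i in range(total_num)
    ]
    return best_rows, best_cols, grid


rows, cols, grid = sketch_row_col(8)
print(rows, cols, grid[1])  # 4 4 [(0, 2), (0, 3)]
```

With 8 stats and the default `factor=2`, this sketch yields a 4 x 4 grid in which each stat occupies two adjacent cells of one row.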

class data_juicer.analysis.column_wise_analysis.ColumnWiseAnalysis(dataset, output_path, overall_result=None, save_stats_in_one_file=True)

Bases: object

Apply analysis on each column of stats respectively.

__init__(dataset, output_path, overall_result=None, save_stats_in_one_file=True)

Initialization method

Parameters:
  • dataset – the dataset to be analyzed

  • output_path – path to store the analysis results

  • overall_result – optional precomputed overall stats result

  • save_stats_in_one_file – whether to save the analysis figures of all stats into one image file

analyze(show_percentiles=False, show=False, skip_export=False)

Apply analysis and draw the analysis figure for stats.

Parameters:
  • show_percentiles – whether to show percentile lines in each sub-figure. If true, several red lines mark the quantiles of the stats distribution

  • show – whether to show the figure in a single window after drawing

  • skip_export – whether to skip saving the results to disk
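
The constructor and analyze() can be combined as in the following sketch. It uses only the signatures documented on this page; the wrapper name `run_column_analysis` is hypothetical, and `dataset` is assumed to be a dataset whose stats have already been computed, which this page does not spell out.

```python
def run_column_analysis(dataset, output_dir):
    """Hypothetical wrapper around the documented ColumnWiseAnalysis API."""
    # Imported lazily so this sketch can be loaded without data_juicer installed.
    from data_juicer.analysis.column_wise_analysis import ColumnWiseAnalysis

    analysis = ColumnWiseAnalysis(
        dataset,                      # assumed to already carry computed stats
        output_dir,                   # where the analysis results are stored
        save_stats_in_one_file=True,  # one combined image for all stats
    )
    # Draw percentile lines, do not open a window, and export the results.
    analysis.analyze(show_percentiles=True, show=False, skip_export=False)
```

Leaving `overall_result` at its default of None means no precomputed overall stats are passed in.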

Returns:

draw_hist(ax, data, save_path, percentiles=None, show=False)

Draw the histogram for the data.

Parameters:
  • ax – the axes to draw on

  • data – data to draw

  • save_path – the path to save the histogram figure

  • percentiles – the overall analysis result of the data including percentile information

  • show – whether to show the figure in a single window after drawing
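
To make the percentile information concrete, the sketch below computes the positions where such lines could be drawn on a histogram. It assumes the 25%, 50% and 75% quantiles are the ones of interest and uses the standard library rather than this module; the helper name `quartile_positions` is hypothetical, and the actual format of the `percentiles` argument is not specified on this page.

```python
import statistics


def quartile_positions(values):
    """Hypothetical helper: x-positions for 25%/50%/75% percentile lines."""
    # method="inclusive" treats the data as the full population and
    # interpolates linearly between neighbouring data points.
    q25, q50, q75 = statistics.quantiles(values, n=4, method="inclusive")
    return {"25%": q25, "50%": q50, "75%": q75}


print(quartile_positions([1, 2, 3, 4, 5, 6, 7, 8, 9]))
# {'25%': 3.0, '50%': 5.0, '75%': 7.0}
```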

Returns:

draw_box(ax, data, save_path, percentiles=None, show=False)

Draw the box plot for the data.

Parameters:
  • ax – the axes to draw on

  • data – data to draw

  • save_path – the path to save the box figure

  • percentiles – the overall analysis result of the data including percentile information

  • show – whether to show the figure in a single window after drawing

Returns:

draw_wordcloud(ax, data, save_path, show=False)