data_juicer.analysis.measure module¶
- class data_juicer.analysis.measure.Measure[source]¶
Bases:
object
Base class for measuring distributions.
- name = 'base'¶
- class data_juicer.analysis.measure.KLDivMeasure[source]¶
Bases:
Measure
Measure Kullback-Leibler divergence.
- name = 'kl_divergence'¶
- class data_juicer.analysis.measure.JSDivMeasure[source]¶
Bases:
Measure
Measure Jensen-Shannon divergence.
- name = 'js_divergence'¶
- class data_juicer.analysis.measure.CrossEntropyMeasure[source]¶
Bases:
Measure
Measure Cross-Entropy.
- name = 'cross_entropy'¶
- class data_juicer.analysis.measure.EntropyMeasure[source]¶
Bases:
Measure
Measure Entropy.
- name = 'entropy'¶
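The quantities behind the four divergence/entropy measures above can be sketched with SciPy. This is a hypothetical illustration of the underlying math only, not the data_juicer API (the exact `measure()` signatures of these classes are not documented here); the distributions `p` and `q` are made-up examples.

```python
# Illustration of the statistics the Measure subclasses are named after;
# NOT the data_juicer API itself.
import numpy as np
from scipy.stats import entropy

p = np.array([0.4, 0.4, 0.2])  # example distribution p
q = np.array([0.3, 0.3, 0.4])  # example distribution q

kl = entropy(p, q)     # KL(p || q): what KLDivMeasure is named after
h_p = entropy(p)       # Shannon entropy of p: EntropyMeasure
cross = h_p + kl       # cross-entropy H(p, q) = H(p) + KL(p || q): CrossEntropyMeasure

# Jensen-Shannon divergence via the mixture m = (p + q) / 2: JSDivMeasure
m = 0.5 * (p + q)
js = 0.5 * entropy(p, m) + 0.5 * entropy(q, m)
```

Note that the JS divergence is symmetric and bounded by ln 2 (in nats), while KL divergence is neither symmetric nor bounded.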
- class data_juicer.analysis.measure.RelatedTTestMeasure[source]¶
Bases:
Measure
Measure the t-test for two related distributions over histograms with the same bins.
Ref: https://en.wikipedia.org/wiki/Student%27s_t-test
For continuous features or distributions, the input can be a list of dataset stats. For discrete features or distributions, the input can be a list of tags or categories.
- name = 't-test'¶
- measure(p, q)[source]¶
- Parameters:
p – the first feature or distribution (stats, tags, or categories).
q – the second feature or distribution (stats, tags, or categories).
- Returns:
the T-Test result object ([ref](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats._result_classes.TtestResult.html#scipy.stats._result_classes.TtestResult))
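The statistic this measure wraps can be sketched with `scipy.stats.ttest_rel`, which produces the `TtestResult` object referenced above. This is a hypothetical illustration, not the data_juicer API; the two histograms over shared bins are made-up example inputs.

```python
# Related-samples t-test on two histograms over the same bins;
# NOT the data_juicer API itself, just the underlying SciPy call.
from scipy.stats import ttest_rel

hist_p = [10, 12, 9, 11, 13, 10]   # example per-bin counts for distribution p
hist_q = [11, 11, 10, 12, 12, 11]  # example per-bin counts for distribution q

result = ttest_rel(hist_p, hist_q)  # TtestResult with .statistic and .pvalue
```

A small p-value suggests the two related distributions differ; a large one means the histograms are statistically indistinguishable at the chosen significance level.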