Standard Deviation stat fails on columns with only one value
Describe the bug
When a column contains only one distinct value, its standard deviation is 0. This causes the summary_stats_std sampling error calculation to fail with a division by zero. Ideally we would fail gracefully (e.g. return a sampling error of 0?) rather than raise an error.
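To illustrate the failure mode and the kind of graceful fallback suggested here, below is a minimal sketch. It is not NannyML's actual implementation: the function name `std_sampling_error` and its parameters are hypothetical, and the formula is the generic large-sample (delta method) approximation of the standard error of a sample standard deviation, which divides by the standard deviation itself. Returning 0 for a constant column is one possible choice, assumed here only because the issue proposes it.

```python
import numpy as np


def std_sampling_error(reference_values, chunk_size):
    """Hypothetical helper: approximate standard error of the sample std.

    Uses the large-sample approximation
        SE(s) ~= sqrt((mu4 - sigma^4) / n) / (2 * sigma)
    where sigma is the std of the reference values, mu4 their fourth
    central moment, and n the chunk size.
    """
    values = np.asarray(reference_values, dtype=float)
    sigma = values.std()

    # Guard: a constant column has sigma == 0, so the formula below would
    # divide by zero. Fall back to 0 instead (an assumption, not library behaviour).
    if chunk_size <= 0 or np.isclose(sigma, 0.0):
        return 0.0

    mu4 = np.mean((values - values.mean()) ** 4)
    # max(..., 0.0) protects against tiny negative values from rounding.
    return float(np.sqrt(max(mu4 - sigma**4, 0.0) / chunk_size) / (2 * sigma))
```

With this guard, a column such as `[1000.0] * 50` yields a sampling error of 0 instead of raising or producing `inf`/`NaN`. Returning `NaN` would be an equally defensible choice if downstream code should treat the value as undefined.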
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I tried to reproduce this but could not trigger the division by zero:
def test_stats_std_calculator_should_not_fail_if_chunk_has_single_value():  # noqa: D103
    reference, analysis, _ = load_synthetic_car_loan_dataset()
    reference.loc[:, 'car_value'] = 1000
    try:
        calc = SummaryStatsStdCalculator(
            column_names=['car_value'],
        ).fit(reference)
        _ = calc.calculate(data=analysis)
    except Exception:
        pytest.fail()
Am I doing something wrong?
This might be related to the fix we just made for the median in summary stats; this still needs to be checked.
https://github.com/NannyML/nannyml/pull/377