Standard Deviation stat fails on columns with only one value

Open nikml opened this issue 2 years ago • 4 comments

Describe the bug: When a column has only one value, its standard deviation is 0. This causes the summary_stats_std sampling error calculation to fail with a division by zero. Ideally it would fail gracefully (e.g. return a sampling error of 0?) rather than raise an error.

nikml avatar Oct 19 '23 17:10 nikml
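
To make the failure mode above concrete, here is a minimal sketch of a standard-deviation sampling error with a zero guard. It is not NannyML's implementation: the function name `std_sampling_error`, its parameters, and the delta-method approximation used here are assumptions chosen for illustration. Returning 0.0 for a constant column follows the suggestion made in the report.

```python
import numpy as np


def std_sampling_error(reference_values, chunk_size):
    """Approximate standard error of the sample standard deviation.

    Hypothetical helper for illustration only, not the NannyML API.
    Uses the delta-method approximation SE(s) ~ sqrt(Var(s^2)) / (2 * s),
    which divides by the standard deviation and so needs a zero guard.
    """
    x = np.asarray(reference_values, dtype=float)
    if x.size < 2 or chunk_size < 2:
        # Not enough data to estimate a standard deviation at all.
        return float("nan")

    std = x.std(ddof=1)
    if std == 0.0:
        # Constant column: there is no spread to estimate, so report 0
        # instead of dividing by zero (assumed graceful behaviour).
        return 0.0

    # Fourth central moment of the reference data.
    mu4 = np.mean((x - x.mean()) ** 4)
    # Approximate variance of the sample variance for a chunk of this size.
    var_of_var = (mu4 - (chunk_size - 3) / (chunk_size - 1) * std**4) / chunk_size
    # Clip tiny negative values that can come from floating-point error.
    return float(np.sqrt(max(var_of_var, 0.0)) / (2.0 * std))
```

With this guard, `std_sampling_error([1000] * 50, chunk_size=10)` returns 0.0 rather than dividing by zero, while a column with some spread still yields a finite positive value.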

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Dec 18 '23 18:12 stale[bot]

I tried to reproduce this but could not trigger the division by zero:

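# Import paths assumed from the nannyml package layout.
import pytest

from nannyml.datasets import load_synthetic_car_loan_dataset
from nannyml.stats import SummaryStatsStdCalculator


def test_stats_std_calculator_should_not_fail_if_chunk_has_single_value():  # noqa: D103
    reference, analysis, _ = load_synthetic_car_loan_dataset()
    # Force the column to a single constant value so its standard deviation is 0.
    reference.loc[:, 'car_value'] = 1000
    try:
        calc = SummaryStatsStdCalculator(
            column_names=['car_value'],
        ).fit(reference)
        _ = calc.calculate(data=analysis)
    except Exception:
        pytest.fail()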

Am I doing something wrong?

nnansters avatar Feb 12 '24 17:02 nnansters
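
One possible reason a test like the one above passes, not confirmed anywhere in this thread: with NumPy floats, dividing by zero does not raise an exception. It returns inf or nan and emits a RuntimeWarning, which an `except Exception` block never sees. The sketch below shows that behaviour in isolation. The helper `divide_by_std` is hypothetical and says nothing about what NannyML does internally.

```python
import warnings

import numpy as np


def divide_by_std(numerator, std):
    """Divide using NumPy floats and collect any warnings raised on the way.

    Hypothetical helper for illustration only.
    Returns the result and the list of warning messages.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        # Pin NumPy's error handling to "warn" so the example does not
        # depend on any global np.seterr configuration.
        with np.errstate(divide="warn", invalid="warn"):
            result = np.float64(numerator) / np.float64(std)
    return result, [str(w.message) for w in caught]


# NumPy: no exception, the result is inf and a RuntimeWarning is recorded.
result, messages = divide_by_std(1.0, 0.0)

# Plain Python floats: the same division raises ZeroDivisionError.
try:
    1.0 / 0.0
    python_raised = False
except ZeroDivisionError:
    python_raised = True
```

If this is what happens here, a reproduction would need to inspect the returned sampling error values or turn warnings into errors, for example with `warnings.simplefilter("error")`, rather than rely on an exception being raised.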

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Apr 12 '24 20:04 stale[bot]

This might be related to the fix we just made for the median in summary stats; still to be checked.

https://github.com/NannyML/nannyml/pull/377

nnansters avatar Apr 15 '24 10:04 nnansters

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Jun 14 '24 11:06 stale[bot]