
[BUG] Quartiles does not calculate Q1/Q3 values correctly

Open jlyonsmith opened this issue 3 years ago • 0 comments

Describe the bug The Quartiles struct calculates the Q1 and Q3 values (the medians of the lower and upper halves) using linear interpolation between ranks, which is not how quartiles should be calculated over a discrete set of values.

To Reproduce The following program reproduces the issue clearly:

use plotters::prelude::*;

fn main() {
    let quartile = Quartiles::new(&[
        98, 94, 94, 62, 73, 85, 77, 68, 72, 85, 99, 100, 86, 60, 52, 100, 80, // Odd
    ]);

    println!("{:?}", quartile);
    // Gives Quartiles { lower_fence: 39.0, lower: 72.0, median: 85.0, upper: 94.0, upper_fence: 127.0 }

    // According to https://study.com/skill/learn/how-to-find-interquartile-range-explanation.html
    // should be:
    // median = 85
    // lower = 70 (Q1)
    // upper = 96 (Q3)

    let quartile = Quartiles::new(&[9, 3, 2, 5, 6, 11, 4, 3, 2]); // Odd

    println!("{:?}", quartile);
    // Gives Quartiles { lower_fence: -1.5, lower: 3.0, median: 4.0, upper: 6.0, upper_fence: 10.5 }

    // According to https://www.varsitytutors.com/algebra_1-help/how-to-find-interquartile-range
    // should be:
    // median = 4
    // lower = 2.5 (Q1)
    // upper = 7.5 (Q3)

    let quartile = Quartiles::new(&[11, 2, 4, 3, 8, 1, 2, 7, 4, 9]); // Even

    println!("{:?}", quartile);
    // Gives Quartiles { lower_fence: -6.0, lower: 2.25, median: 4.0, upper: 7.75, upper_fence: 16.0 }

    // According to https://www.varsitytutors.com/algebra_1-help/how-to-find-interquartile-range
    // should be:
    // median = 4
    // lower = 2 (Q1)
    // upper = 8 (Q3)
}
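For context, the values printed above are consistent with interpolating at the fractional rank `p * (n - 1)` of the sorted data. The following is a standalone sketch of that scheme, not the plotters source; `percentile_interpolated` and `sorted_f64` are hypothetical names used only for illustration.

```rust
/// Percentile by linear interpolation between neighbouring ranks.
/// `sorted` must be ascending and non-empty; `pct` is in 0.0..=100.0.
/// Hypothetical helper for illustration, not the plotters implementation.
fn percentile_interpolated(sorted: &[f64], pct: f64) -> f64 {
    assert!(!sorted.is_empty(), "need at least one value");
    // Fractional rank in the range 0..=(n - 1).
    let rank = pct / 100.0 * (sorted.len() - 1) as f64;
    let lo = rank.floor() as usize;
    if lo + 1 >= sorted.len() {
        // Rank falls on the last element; nothing to interpolate towards.
        return sorted[sorted.len() - 1];
    }
    let frac = rank - lo as f64;
    sorted[lo] + frac * (sorted[lo + 1] - sorted[lo])
}

/// Converts integer samples to an ascending-sorted Vec<f64>.
fn sorted_f64(values: &[i32]) -> Vec<f64> {
    let mut v: Vec<f64> = values.iter().map(|&x| f64::from(x)).collect();
    v.sort_by(|a, b| a.partial_cmp(b).expect("no NaN expected"));
    v
}
```

For the even-sized example, the 25th percentile sits at rank 0.25 * 9 = 2.25, between the sorted values 2 and 3, which yields the reported 2.25 rather than the expected 2.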

I was "rusty" on quartile ranges, so I brushed up on the mathematics at several reputable sites. When I started generating a Boxplot with the library, something did not look right, so I checked the math by hand and then looked at the code. The median-of-halves calculation for odd and even sets of values seems simpler to implement than the current interpolation-based approach.
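A minimal sketch of the median-of-halves calculation, assuming the convention used by the tutorials linked above: sort the data, take the overall median, and for an odd count exclude that median from both halves. `median` and `quartiles_by_halves` are hypothetical helpers, not part of plotters, and this is only one possible shape for a fix.

```rust
/// Median of an ascending-sorted, non-empty slice.
fn median(sorted: &[f64]) -> f64 {
    let n = sorted.len();
    if n % 2 == 1 {
        sorted[n / 2]
    } else {
        (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0
    }
}

/// Returns (Q1, median, Q3) using medians of the lower and upper halves.
/// For an odd count the overall median belongs to neither half.
/// Hypothetical helper for illustration; requires at least two values.
fn quartiles_by_halves(values: &[f64]) -> (f64, f64, f64) {
    assert!(values.len() >= 2, "need at least two values");
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).expect("no NaN expected"));
    let n = sorted.len();
    let lower = &sorted[..n / 2]; // first floor(n / 2) values
    let upper = &sorted[(n + 1) / 2..]; // last floor(n / 2) values
    (median(lower), median(&sorted), median(upper))
}
```

Applied to the three data sets from the reproduction program, this sketch gives (70, 85, 96), (2.5, 4, 7.5) and (2, 4, 8), matching the hand-calculated values.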

Version Information This was tested on version 0.3.4.

jlyonsmith, Jan 01 '23 21:01