Changes

`px.ecdf` functionality

I changed three aspects of the ECDF functionality:

The plotted line always spans from 0 to 1 instead of omitting the first step (or last step in case of 'complementary').
Harmonized the plot output such that x=values,y=weights,orientation='v' is the EXACT transpose of y=values,x=weights,orientation='h'. This symmetry was lacking in the former implementation as can be seen in the examples mentioned below.
The redundant 'reversed' option has been removed as it was essentially a copy of 'complementary' with the opposite end omitted.

Example plots of original functionality:

HTML
PDF

original output

Example plots of updated functionality now exhibiting proper symmetry:

HTML
PDF

updated output

Above images were produced with

import numpy as np
import pandas as pd
import plotly.express as px
import panel as pn

data = pd.DataFrame({'first':np.arange(1,7),
                     'second':[20.56,22.17,23.66,25.3,25.9,27.7],
                     'weights':1+np.arange(6),
                     'reversed':6-np.arange(6),
                    })

grid = pn.GridSpec(ncols=4,nrows=2,height=800,width=1600)

for i,mode in enumerate(['standard','complementary']):
    for k,orientation in enumerate(['v','h']):
        for j,x in enumerate([True,False]):
            grid[i,2*j+k] = px.ecdf(data,
                                x=['first','second'] if x else 'weights',
                                y='weights' if x else ['first','second'],
                                orientation=orientation,
                                range_x={'v':[0,30] if x else [0,10],
                                         'h':[0,1]}[orientation],
                                range_y={'v':[0,1],
                                         'h':[0,10] if x else [0,30]}[orientation],
                                ecdfmode=mode,
                                width=400,
                                title=f'{mode} {orientation} {"x" if x else "y"}',
                               )

grid

`px` doc-strings

Corrected some spelling mistakes.
Unified format of first and second lines across parameters.

Code PR

[x] I have read through the contributing notes and understand the structure of the package. In particular, if my PR modifies code of plotly.graph_objects, my modifications concern the codegen files and not generated files.
[ ] I have added tests (if submitting a new feature or correcting a bug) or modified existing tests.
[ ] For a new feature, I have added documentation examples in an existing or new tutorial notebook (please see the doc checklist as well).
[ ] I have added a CHANGELOG entry if fixing/changing/adding anything substantial.
[x] For a new feature or a change in behaviour, I have updated the relevant docstrings in the code to describe the feature or behaviour (please see the doc checklist as well).

Oct 01 '21 21:10 eisenlohr

Thanks for this PR! I'll be able to take a look at it next week, before the next release :)

Have a nice weekend!

Oct 01 '21 21:10 nicolaskruchten

One quick note is that I will not be able to merge this as-is given that it removes functionality. We still need to accept reversed as an argument for backwards-compatibility. If you want to propose to make this equivalent to complementary we'd need to implement and document that in the docstring :)

Oct 01 '21 21:10 nicolaskruchten

Hi,

I see, yes, that seems a good way of dealing with it. Do you have some language to that effect? I would suggest:

One of 'standard', 'complementary', or 'reversed'. … 'reversed' is deprecated as it was equivalent to 'complementary'.

Cheers Philip

PS: I will need to fix the code because “black” currently complains (forgot to check that ahead of time!)

On Oct 1, 2021, at 5:16 PM, Nicolas Kruchten @.***> wrote:

One quick note is that I will not be able to merge this as-is given that it removes functionality. We still need to accept reversed as an argument for backwards-compatibility. If you want to propose to make this equivalent to complementary we'd need to implement and document that in the docstring :)

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/plotly/plotly.py/pull/3407#issuecomment-932580700, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABRZ5BCFYJO75KREXXOXYH3UEYQKTANCNFSM5FFQEYZA. Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

Dr. Philip Eisenlohr

Associate Professor Computational Materials Mechanics Department of Chemical Engineering and Materials Science Adjunct Associate Professor Department of Physics and Astronomy Michigan State University

+1 517 432-4506 @.*** https://www.researchgate.net/profile/Philip_Eisenlohr

Oct 01 '21 21:10 eisenlohr

@nicolaskruchten, who would be able to perform the code review?

Nov 10 '21 16:11 eisenlohr

Hello @eisenlohr! I'm sorry for the long delays in responding here, I've gotten very far behind in my work! I will try to review this before the end of the year.

I do have some hesitations about the idea of merging reverse and complementary, as they don't really mean the same thing... from the docstring:

If 'standard', the ECDF is plotted such that values represent data at or below the point. If 'complementary', the CCDF is plotted such that values represent data above the point. If 'reversed', a variant of the CCDF is plotted such that values represent data at or above the point.

The complementary ECDF is 1-ECDF, so the definitions of standard and complementary are correct here. I added reverse mode because, indeed, there was a missing point.

As to the idea of adding an extra point to the series, I am also ambivalent, as this point is not in the underlying data, and will cause duplicated hoverlabels: the 'extra' point will have the same hover_data as the point it was copied from, even though it's not a "real" data point.

Nov 24 '21 13:11 nicolaskruchten

Hi @nicolaskruchten,

Thanks for giving this some thought. My MR addresses a couple of (somewhat independent) aspects.

I would expect the output of h x (or v x) to be a transpose of v y (or h y) when using the same input data. This was not the case in the current implementation (see diagrams above).
The standard definition of cumulative probability is based on 'up to and including'. I think we are in violent agreement that what is called 'standard' and 'complementary' should be (and is) a mirror image. Where I deviate is in seeing a meaningful reason for modifying the classical definition and using 'up to but excluding', i.e. what was called 'reversed'. In my opinion, there is only one way to use and interpret the cumulative probability: if it is P for up to and including x, then it is 1-P for down to and excluding x. Therefore, no need for 'reversed'. Moreover, for the current (your) implementation, there is essentially no difference (except for the missing "closure" at either 1 or 0) between the choice of 'complementary' and 'reversed' in case of the (in my opinion principally correct) v x plot:
Spanning the full range from zero to one appeared to be the usual way for ECDF plots. If one would be really pedantic, the plots should actually extend from -infty at zero cum probabilty to +infty at 1 cum probabilty, i.e. have horizontal (in the standard v x plot) "legs" at the bottom left and top right. Ignoring this detail, the only option that I was able to come up with was to add a copy of the lowest value at a cum prob of zero. In light of what I wrote above, the more appropriate choice might actually be -infty. However, then the plot would look lopsided, since only one of the mentioned "legs" is present... The cleanest look resulted from the choice I proposed in the MR. I agree that the hover information would be a copy, but am not certain whether this would actually be a serious concern?

In summary, I believe that most of the confusion arose due to "1." not being correct, i.e. the current implementation does not yield properly transposed images. Once this is fixed, then it becomes apparent that there is no difference anymore between plots that would result from 'complementary' compared to 'reversed'.

Nov 28 '21 23:11 eisenlohr

Hi @nicolaskruchten ,

Did you get a chance to contemplate my above remarks?

Many greetings, Philip

Mar 07 '22 15:03 eisenlohr

Hi Philip,

I sincerely apologize for letting this languish for so long... I've been a bit distracted by various life events recently (I've just started graduate school!) so I'm behind in Plotly.py work like reviewing PRs. I will definitely try to merge some or all of this PR in an upcoming release :)

Cheers, Nicolas

Mar 07 '22 20:03 nicolaskruchten

Hi @nicolaskruchten, to lend some further support to the argument I made regarding item 3 above, the ECDF is plotted in the same way, i.e. spanning fully between 0 and 1, in Seaborn (see second-to-last example at https://seaborn.pydata.org/generated/seaborn.displot.html#seaborn.displot).

Jun 26 '22 21:06 eisenlohr

Hi @nicolaskruchten, I am trying another ping at this.

The Seaborn image I was referring to in my previous post is this:

Seaborn ECDF example

Sep 30 '22 15:09 eisenlohr

@nicolaskruchten, there is another argument that I would like to stress: the cumulative probability for a single-valued population should be a step function at that value.

This is indeed reflected in the output of Seaborn, but not (yet) in plotly.express:

import pandas as pd
import seaborn as sns
import plotly.express as px

df = pd.DataFrame(dict(x=[1]))
sns.displot(data=df,
            x='x',
            kind='ecdf',
           ).savefig('ECDF_seaborn.png')
px.ecdf(df,
        x='x',
       ).write_image('ECDF_plotlyexpress.png')

Seaborn output

ECDF_seaborn

Ploty.express output

ECDF_plotlyexpress

Oct 24 '22 20:10 eisenlohr

Thank you for your persistence with this issue! I'm still quite embarassed at how long it has been open and how hard it has been for me to find time to properly consider it and I continue to apologize. Your arguments make sense, so my silence is not intended as passive disagreement!

Oct 27 '22 15:10 nicolaskruchten

@nicolaskruchten, no offense taken. I am aware that there is always more to do than hours in a day!

Dec 02 '22 15:12 eisenlohr

harmonized ECDF between "h" and "v", removed "reversed" mode; polishe…

Changes

px.ecdf functionality

px doc-strings

Code PR

Seaborn output

Ploty.express output

`px.ecdf` functionality

`px` doc-strings