DABEST-python

Enable jitter in paired slopegraph plots to aid discrete data visualisations.

Open mlotinga opened this issue 1 year ago • 10 comments

For discrete data, the data plots generally overlay points or lines over each other, obscuring information.

A facility to add jitter to the data visualisation, without affecting the effect size calculations, would be a standard way to address this, as is implemented (for example) in seaborn.

Applying jitter to the input data is possible but would distort the effect size calculation, which is highly undesirable.
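A minimal sketch of this concern, using made-up paired scores and a plain mean paired difference as a stand-in for the effect size (it does not call DABEST itself; the data, the helper name and the jitter width are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical paired, discrete (Likert-style) scores for ten subjects.
before = np.array([2, 3, 3, 4, 2, 3, 4, 4, 3, 2], dtype=float)
after = np.array([3, 3, 4, 4, 3, 4, 4, 5, 3, 3], dtype=float)


def mean_paired_difference(a, b):
    """Mean of the paired differences b - a (illustrative helper)."""
    return float(np.mean(b - a))


# Jittered copies intended for drawing only; the originals are untouched.
before_plot = before + rng.uniform(-0.1, 0.1, size=before.size)
after_plot = after + rng.uniform(-0.1, 0.1, size=after.size)

# Effect size from the raw data versus from the jittered copies.
effect_raw = mean_paired_difference(before, after)
effect_from_jittered = mean_paired_difference(before_plot, after_plot)

print(effect_raw, effect_from_jittered)
```

The two printed values differ, which is why the jitter should be applied only to the plotted coordinates and never to the data passed to the estimation step.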

If you could give some advice on where this could be incorporated into the relevant objects, I could have a go at doing it.

Example of the motivating problem:

image

mlotinga avatar Jul 11 '24 13:07 mlotinga

Hi @mlotinga!

For our unpaired plots, our design approach is that each data point should be clear and non-overlapping. For this reason, we use the same approach as the seaborn 'swarmplot', which aims to avoid overlapping data points. This is why neither our plot nor swarmplot includes jitter as a parameter. These plots do, however, end up overlapping data points once they run out of space and hit the 'gutter'. We are currently working on tuning this gutter length. Users can also adjust the dot size, which helps a lot for larger sample sizes.

For paired lines it is more challenging (design-wise). Perhaps you could elaborate a little further and/or show a seaborn example of jitter being used in paired line plots?

JAnns98 avatar Jul 16 '24 07:07 JAnns98

Ok, thanks for responding. Yes, the issue here is related to paired slopegraph plots only — the illustration in the original post shows how it can be rather difficult to discern meaning from this if the data are discrete.

With regard to seaborn, the objects API has the seaborn.objects.Jitter class, which I presume could be used to alter the placement of data points in a plot...

https://seaborn.pydata.org/generated/seaborn.objects.Jitter.html

mlotinga avatar Jul 16 '24 10:07 mlotinga

Hi @mlotinga, the paired slopegraph plotting is located in this file.

You will have to install nbdev for an easier development process.

Jacobluke- avatar Jul 18 '24 02:07 Jacobluke-

I think it can easily be achieved with a simple line or two of code (even without seaborn; e.g., using np.random.uniform):

Screenshot 2024-07-18 at 3 23 26 PM
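The screenshot is not reproduced here. As a rough sketch of what such a one-or-two-line approach might look like (not the code from the screenshot; the variable names and the half-width value are assumptions), uniform offsets can be added to the x positions of each line's endpoints:

```python
import numpy as np

rng = np.random.default_rng(42)

n_pairs = 5
x_positions = np.array([0.0, 1.0])  # control and test positions on the x-axis
half_width = 0.05                   # hypothetical jitter half-width

# One independent horizontal offset per endpoint of each paired line.
offsets = rng.uniform(-half_width, half_width, size=(n_pairs, x_positions.size))
x_jittered = x_positions + offsets  # shape (n_pairs, 2) via broadcasting

print(x_jittered)
```

Each row of x_jittered would then be used as the x coordinates of one paired line, while the y values stay exactly as observed.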

JAnns98 avatar Jul 18 '24 07:07 JAnns98

Thanks @Jacobluke- and @JAnns98 for the information. I now have it working.

I added the following plot_kwargs to _effsize_objects.py:

(lines 1009-1011)

slopegraph_xjitter=0,
slopegraph_yjitter=0,
jitter_seed=9876543210, 

I modified plotter.py with

(line 484)

rng = np.random.default_rng(plot_kwargs["jitter_seed"])

and

(lines 493-494)

x_points = [t + plot_kwargs["slopegraph_xjitter"]*rng.standard_t(df=6, size=None) for t in range(x_start, x_start + grp_count)]
y_points = np.array(observation[yvar].tolist()) + plot_kwargs["slopegraph_yjitter"]*rng.standard_t(df=6, size=len(observation[yvar].tolist()))

and I get output like (using slopegraph_xjitter=0, slopegraph_yjitter=0.07, jitter_seed=303):

image
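For readers who want to try the idea outside the package, the same approach can be sketched as a standalone function. This is an illustration under assumptions, not the code in plotter.py: the function name and signature are hypothetical, and it assumes one y value per group.

```python
import numpy as np


def jitter_points(y_values, x_start=0, x_jitter=0.0, y_jitter=0.0, rng=None):
    """Return jittered (x, y) plotting coordinates for one observation.

    Hypothetical helper: offsets are drawn from a Student's t distribution
    with 6 degrees of freedom, scaled by x_jitter and y_jitter, as in the
    snippet above. The input values themselves are not modified.
    """
    rng = np.random.default_rng() if rng is None else rng
    y = np.asarray(y_values, dtype=float)
    # One integer x position per group, starting at x_start.
    x = np.arange(x_start, x_start + y.size, dtype=float)
    x_points = x + x_jitter * rng.standard_t(df=6, size=y.size)
    y_points = y + y_jitter * rng.standard_t(df=6, size=y.size)
    return x_points, y_points


# Same settings as quoted above: y-jitter only, fixed seed for reproducibility.
x_pts, y_pts = jitter_points([3, 4], y_jitter=0.07, rng=np.random.default_rng(303))
print(x_pts, y_pts)
```

Passing a seeded generator keeps the figure reproducible between runs, which is the role jitter_seed plays in the edits described above.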

mlotinga avatar Jul 18 '24 10:07 mlotinga

Would this be a useful feature to add to the package?

My edits can be viewed here: https://github.com/mlotinga/DABEST-python_devMJBL

mlotinga avatar Jul 18 '24 10:07 mlotinga

That's great, glad you could get it done! We will discuss internally whether it could be useful to include in the next release :)

P.S. I would think only x-axis jitter would be appropriate?

JAnns98 avatar Jul 18 '24 11:07 JAnns98

I guess it depends on the application and data - I think it's good to have the flexibility.

mlotinga avatar Jul 18 '24 11:07 mlotinga

@mlotinga Thanks for this, we will aim to add it into the main package (at least the x-jitter) for the next major release!

JAnns98 avatar Jul 23 '24 01:07 JAnns98

That's great. After a bit of experimenting I actually found for my application the best visualisation was achieved using a little jitter on both axes. I guess the hesitation on y-axis jittering expressed above might be a concern about data misrepresentation? For the main feature use case of discrete data, it should be easy for users to select a suitable parameter value for yjitter to ensure discrete groups remain visible without causing data confusion. Allowing the flexibility would provide a better set of options to output the clearest visualisation. Seaborn (e.g., regplot) provides this kind of flexibility, leaving the parameter choice to the user.
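One way to make "a suitable parameter value" concrete, sketched under assumptions: if the y-jitter is bounded and its half-width stays below half the spacing between adjacent discrete levels, every plotted point remains closest to its true level. The example uses uniform jitter and made-up data; t-distributed jitter as in the edits above is unbounded, so this guarantee would only hold approximately there.

```python
import numpy as np

rng = np.random.default_rng(1)

levels = np.array([1, 2, 3, 4, 5])        # hypothetical discrete response levels
y = rng.choice(levels, size=200).astype(float)

step = float(np.min(np.diff(levels)))     # spacing between adjacent levels
half_width = 0.25 * step                  # kept below step / 2

# Bounded vertical jitter applied to the plotting copy only.
y_plot = y + rng.uniform(-half_width, half_width, size=y.size)

# Snapping each jittered value to its nearest level recovers the original data.
nearest = np.abs(y_plot[:, None] - levels[None, :]).argmin(axis=1)
recovered = levels[nearest]

print(np.array_equal(recovered, y))
```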

mlotinga avatar Jul 26 '24 20:07 mlotinga

As mentioned in the p-value adjustment issue, this feature has now been added (with modification) to master as of the most recent update (v2025.03.27).

JAnns98 avatar Mar 27 '25 09:03 JAnns98