Different results with plotly ternary vs python-ternary

Open olgabot opened this issue 5 years ago • 8 comments

Hello, Thank you so much for making this package! I'd like to overlay heatmap + scatter as mentioned in this issue: https://github.com/marcharper/python-ternary/issues/129 and addressed in https://github.com/marcharper/python-ternary/issues/121. However, I'm having trouble using the library.

I'm plotting median values of gene expression of a cell type across three species. When I use plotly, the result makes sense to me, where there are many dots in the middle, indicating many shared genes:

Screen Shot 2020-06-17 at 9 09 57 AM

However, when I use that same data for python-ternary, the result didn't make any sense to me. There's a bunch of points outside of the plot, and it's not clear what's happening to the rest. The code is in "Details" below.

Here is the code:

import ternary

def threewayplot(data, title=None):
    fig, tax = ternary.figure()
    tax.scatter(data.values.tolist())
    tax.set_title(title)
    tax.boundary(linewidth=1.0)
    
    corner_offset = 0.005
    tax.right_corner_label(data.columns[0], offset=corner_offset)
    tax.top_corner_label(data.columns[1], offset=0.12)
    tax.left_corner_label(data.columns[2], offset=corner_offset)
    tax.gridlines(color="blue", multiple=5)

    tax.ticks()

    tax.get_axes().axis('off')
    tax.clear_matplotlib_ticks()
    fig.tight_layout()
    
threewayplot(df_nonzero_for_ternary)

Screen Shot 2020-06-17 at 9 10 07 AM

I thought this was a simple rescaling issue and divided each column by its maximum so there were no values greater than 1, but this didn't replicate the results I saw in Plotly. It also doesn't make sense to me: there are still dots outside of the plot, and the pattern doesn't match what I see in Plotly:

Screen Shot 2020-06-17 at 9 10 16 AM

threewayplot(df_nonzero_for_ternary/df_nonzero_for_ternary.max())

Do you know what may be happening?

Here is the data for reference: medians.csv.txt

olgabot avatar Jun 17 '20 16:06 olgabot

So I can't speak to what Plotly is doing, but I think one difference is that ternary assumes that the coordinates to be plotted sum to a constant, in this case 1 by default. In fact ternary typically ignores the 3rd coordinate, assuming that z = scale - x - y. The data linked above doesn't have this property, so when the coordinates are projected to the planar simplex, they don't necessarily fall within the boundary triangle.

Since some of the data rows sum to more than 1, I presume that Plotly is doing some kind of truncation or normalization to keep plots in the simplex, or the meaning of a ternary scatter plot is different in their implementation.
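To illustrate the point about the ignored third coordinate, here is a minimal sketch in plain Python. The function names `project_to_plane` and `inside_triangle` and the example rows are made up for illustration, and the projection formula is an assumption about how a point can be mapped onto an equilateral triangle, not a copy of the library's code. It shows that a row whose first two values already exceed the scale implies a negative third coordinate and so lands outside the triangle.

```python
import math

def project_to_plane(point):
    """Map ternary coordinates (a, b, c) to 2D, using only a and b.

    Hypothetical helper for illustration; c is never read because it is
    assumed to equal scale - a - b.
    """
    a, b = point[0], point[1]
    x = a + b / 2.0
    y = (math.sqrt(3) / 2.0) * b
    return x, y

def inside_triangle(point, scale=1.0):
    """Return True if the point falls within the boundary triangle."""
    a, b = point[0], point[1]
    implied_c = scale - a - b  # what the plot assumes the third value is
    return a >= 0 and b >= 0 and implied_c >= 0

ok_row = (0.2, 0.3, 0.5)   # sums to 1, so the implied c matches the data
bad_row = (0.9, 0.8, 0.7)  # sums to 2.4, so the implied c is negative

print(inside_triangle(ok_row), project_to_plane(ok_row))
print(inside_triangle(bad_row), project_to_plane(bad_row))
```

Under these assumptions, dividing each column by its maximum does not help: a row such as (1.0, 1.0, 1.0) still sums to 3 and still falls outside.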

marcharper avatar Jun 18 '20 03:06 marcharper

So is the position being normalized per-row rather than per-column? I was confused that points still fell outside the plot even after normalizing the data so that the maximum of each column was 1.

I renormalized the data so the rows sum to 1 and yay, it's working now, thank you so much!!

Screen Shot 2020-06-18 at 9 31 20 AM

I guess I assumed that normalization would happen within the program. Or potentially a check on the data to make sure the rows sum to 1. What do you think? I'd be happy to add it.
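The difference between the two normalizations discussed here can be sketched as follows. This is a plain-Python illustration with made-up numbers, and `normalize_rows` and `normalize_columns_by_max` are hypothetical helper names, not functions from python-ternary. With a pandas DataFrame `df`, the row version is usually written as `df.div(df.sum(axis=1), axis=0)`.

```python
def normalize_rows(rows, scale=1.0):
    """Rescale each row so that its values sum to `scale`."""
    normalized = []
    for row in rows:
        total = sum(row)
        if total == 0:
            raise ValueError("cannot normalize a row that sums to zero")
        normalized.append([scale * value / total for value in row])
    return normalized

def normalize_columns_by_max(rows):
    """Divide each column by its maximum (assumes every maximum is nonzero)."""
    col_max = [max(col) for col in zip(*rows)]
    return [[v / m for v, m in zip(row, col_max)] for row in rows]

# Made-up example values, not taken from medians.csv.txt
rows = [[10.0, 20.0, 70.0], [5.0, 5.0, 10.0]]

by_row = normalize_rows(rows)
by_col = normalize_columns_by_max(rows)

print([sum(r) for r in by_row])  # every row sums to the scale
print([sum(r) for r in by_col])  # row sums are not constrained
```

Only the per-row version gives the constant row sum that the plot expects; the per-column version bounds each value by 1 but leaves the row sums free.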

olgabot avatar Jun 18 '20 16:06 olgabot

It could be useful to check whether the sum is equal to scale (and maybe that the values are all positive), warning the user if not. If you'd like to try to add it, feel free to open a PR! Maybe project_point is a good place to add the check, or a function in TernaryAxesSubplot that the other plotting functions can call when they receive data.
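A check along these lines might look like the sketch below. The name `check_points_sum_to_scale`, its signature, and the tolerance are all hypothetical; nothing here is existing python-ternary code, and where such a check would hook into the library is left open.

```python
import warnings

def check_points_sum_to_scale(points, scale=1.0, tol=1e-9):
    """Warn if any point has a negative value or does not sum to `scale`.

    Hypothetical helper, not part of python-ternary. Returns the indices
    of the offending points so the caller can inspect them.
    """
    bad = []
    for index, point in enumerate(points):
        has_negative = any(value < 0 for value in point)
        wrong_sum = abs(sum(point) - scale) > tol
        if has_negative or wrong_sum:
            bad.append(index)
    if bad:
        warnings.warn(
            f"{len(bad)} of {len(points)} points are negative or do not "
            f"sum to scale={scale}; they may be drawn outside the triangle",
            UserWarning,
        )
    return bad

# Made-up example: the second and third points should be flagged
example = [(0.2, 0.3, 0.5), (0.9, 0.8, 0.7), (-0.1, 0.6, 0.5)]
print(check_points_sum_to_scale(example))
```

Warning rather than raising keeps existing plots working while still pointing the user at the constraint.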

marcharper avatar Jun 19 '20 04:06 marcharper

I think even having this stated in the README.md and in the introduction Jupyter notebook would be helpful. I liked the idea of a ternary notebook, but spent hours scratching my head before I found the "sum to constant" constraint mentioned on Wikipedia.

cmacdonald avatar Jan 09 '21 15:01 cmacdonald

Hi @cmacdonald, do you want to open a PR with a change to the readme where you would have liked a warning / statement? You should be able to do it easily through the github interface.

marcharper avatar Jan 12 '21 03:01 marcharper

Hi @marcharper and @cmacdonald

Thanks for the detailed discussion on the need to have each row sum to scale. I think it would be beneficial to include this information on the main GitHub page and in the documentation, along with a nice example of how to perform the normalization.

In my case, I have a data set with the following characteristics: "A" has magnitudes in the thousands, "B" has magnitudes in the tens, and "C" lies in [1, 2].

So, I proceed in two steps to produce a ternary plot: i) min-max scaling on each column, and ii) row normalization (as done by @olgabot).
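The two steps above can be sketched as follows, in plain Python with made-up values that only mimic the magnitudes described. The helper names are hypothetical and this is one possible reading of the procedure, not the commenter's actual code. One caveat under this reading: a row that holds the minimum of every column scales to all zeros and cannot be row-normalized, so the sketch raises an error in that case.

```python
def min_max_scale_columns(rows):
    """Step i: rescale each column to the range [0, 1]."""
    scaled_columns = []
    for col in zip(*rows):
        lo, hi = min(col), max(col)
        if hi == lo:
            raise ValueError("cannot min-max scale a constant column")
        scaled_columns.append([(value - lo) / (hi - lo) for value in col])
    # Transpose back from columns to rows
    return [list(row) for row in zip(*scaled_columns)]

def normalize_rows(rows, scale=1.0):
    """Step ii: rescale each row so that its values sum to `scale`."""
    normalized = []
    for row in rows:
        total = sum(row)
        if total == 0:
            raise ValueError("cannot normalize a row that sums to zero")
        normalized.append([scale * value / total for value in row])
    return normalized

# Made-up data: A in the thousands, B in the tens, C in [1, 2]
data = [
    [1000.0, 50.0, 1.5],
    [3000.0, 10.0, 2.0],
    [5000.0, 30.0, 1.0],
]

points = normalize_rows(min_max_scale_columns(data))
for point in points:
    print(point, sum(point))
```

Scaling the columns first stops "A" from dominating every row; the row normalization afterwards is what satisfies the sum-to-constant requirement.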

ivan-marroquin avatar Oct 13 '21 16:10 ivan-marroquin

Hi, thanks for the suggestions.

The Wikipedia page on ternary plots explains that the coordinates have to sum to a constant. There's a link to that page at the top of the documentation.

This library just plots. There are many ways to normalize or otherwise transform data, and the library doesn't know which method the user wants. For almost any scenario, there is example code on Stack Overflow and other sites that explains how to, for example, normalize a Pandas dataframe by row or column.

marcharper avatar Oct 16 '21 16:10 marcharper

@marcharper

Indeed, there are many ways to normalize data sets, and it is up to users to decide which is best. However, I do think it would be very helpful to let new users of this nice package know that the coordinates must sum to a constant (either 1 or 100, or even something else). Including the link to Wikipedia would be an extra benefit!

And again, many thanks for such a great package!

ivan-marroquin avatar Oct 16 '21 18:10 ivan-marroquin