Clustergram - Request to cluster based on a selected aggregated "groupby" column
Currently the Dash clustergram is restricted to clustering based on all row or column values. There are cases where I would like to sort my data by a chosen metadata category, and then cluster based on the mean value of each group in that category. Right now I am forced to choose between preserving the sort order without clustering, or clustering on the raw data values and losing the aesthetic grouping that came from pre-sorting the data. Below I have two pictures of Dash-Bio Clustergrams (with my own post-processing touches) that show the situation I am trying to convey.
Clustering by individual samples instead of category

Sorted by a category but no clustering

The functionality I am requesting is similar to the dendrogram option for Scanpy's heatmap function (see https://scanpy.readthedocs.io/en/stable/generated/scanpy.pl.heatmap.html).
I thought a potential solution would be to:
- Group by the chosen category to get mean values for the data
- Run `dashbio.Clustergram` on this to get the dendrogram traces back
- Sort the original data so its order matches the dendrogram traces
- And then plug those traces back into `dashbio.Clustergram` using the sorted non-grouped original data.
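To make the first three steps concrete, here is a minimal sketch that uses pandas and SciPy directly instead of `dashbio.Clustergram`. The function name `order_by_group_clustering`, the toy data, and the choice of average linkage are all assumptions for illustration, not anything dash-bio provides.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import linkage, leaves_list


def order_by_group_clustering(data, groups, method="average"):
    """Reorder the rows of `data` so that groups follow a dendrogram of group means.

    data   : DataFrame, one row per sample (hypothetical layout)
    groups : Series of category labels, aligned to data.index
    """
    # Step 1: aggregate each category to its mean profile.
    group_means = data.groupby(groups).mean()
    # Step 2: hierarchical clustering on the (much smaller) table of group means.
    Z = linkage(group_means.to_numpy(), method=method)
    ordered_groups = group_means.index[leaves_list(Z)]
    # Step 3: stable sort of the samples by their group's dendrogram position,
    # so the original within-group order is preserved.
    rank = pd.Series(range(len(ordered_groups)), index=ordered_groups)
    order = np.argsort(groups.map(rank).to_numpy(), kind="stable")
    return data.iloc[order], list(ordered_groups), Z


# Toy example: six samples in three unevenly sized categories.
data = pd.DataFrame(
    {"gene1": [0.0, 0.1, 10.0, 10.2, 0.3, 5.0],
     "gene2": [1.0, 1.1, 9.0, 9.1, 0.9, 5.0]},
    index=["s1", "s2", "s3", "s4", "s5", "s6"],
)
groups = pd.Series(["A", "A", "B", "B", "A", "C"], index=data.index)

sorted_data, group_order, Z = order_by_group_clustering(data, groups)
print(group_order)
print(sorted_data)
```

The linkage matrix `Z` has one row per merge of groups, not of samples, so it describes a dendrogram with one leaf per category.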
But I would be running the "clustergram" tool twice, and since the category groups have uneven counts of members, the traces from step 2 would not line up 1-to-1 with the sorted data and the x/y coords would need to be adjusted.
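One possible way to handle the coordinate adjustment is sketched below. It assumes the dendrogram coordinates follow SciPy's convention, where leaf `i` sits at `5 + 10*i`; whether the traces returned by `dashbio.Clustergram` use exactly this convention is an assumption I have not verified. The helper `rescale_leaf_coords` is hypothetical: it stretches the group-level coordinates so each leaf lands at the center of its block of samples.

```python
import numpy as np


def rescale_leaf_coords(icoord, group_sizes_in_leaf_order):
    """Map group-level dendrogram x-coordinates onto sample-level positions.

    Assumes leaf i of the group dendrogram is at 5 + 10*i (SciPy convention)
    and that each sample occupies 10 units on the heatmap axis.
    """
    sizes = np.asarray(group_sizes_in_leaf_order, dtype=float)
    # Where the group dendrogram currently places its leaves.
    leaf_pos = 5.0 + 10.0 * np.arange(len(sizes))
    # Where each leaf should go: the center of its block of samples.
    block_ends = np.cumsum(sizes) * 10.0
    block_centers = block_ends - sizes * 10.0 / 2.0
    # Internal nodes lie between leaves, so piecewise-linear interpolation
    # keeps every branch attached to the children it joins.
    return np.interp(np.asarray(icoord, dtype=float), leaf_pos, block_centers)


# Three groups in leaf order with 3, 1 and 2 members.
new_x = rescale_leaf_coords([[5.0, 5.0, 15.0, 15.0], [10.0, 10.0, 25.0, 25.0]],
                            [3, 1, 2])
print(new_x)
```

With all group sizes equal to 1 the mapping is the identity, which is a quick sanity check that it reduces to the ordinary per-sample case. The height (distance) coordinates need no change under this scheme.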
Any thoughts on this enhancement?
I just ran into a dataset that had so many data samples that Scipy ran into a "maximum recursion depth exceeded" error when attempting to cluster the samples, so being able to optionally cluster by an aggregated category would also alleviate this issue.
Hi @adkinsrs.
The reordering of the data does not happen in the Clustergram component itself, but in the Dendrogram class from the plotly.figure_factory module, so we are not able to fix the main problem of this issue within the dash-bio project. We can create an issue about the reordering problem against the original Dendrogram component in figure_factory.
Best wishes, Nick.