Replicability as a model evaluation technique
Hi! Would it be of interest to have replicability (i.e., are the discovered components replicable?) as another metric for evaluating CP models? The replicability check would boil down to the following steps:
- Split the data along a (user-chosen) mode into $$N$$ folds (also user-chosen)
- Create $$N$$ training (sub)sets by removing each fold from the complete dataset
- Fit the model with multiple initializations on each training (sub)set and choose the best run according to the lowest loss ($$N$$ best runs in total)
- Repeat the above process $$M$$ times (user-chosen), for a total of $$M \times N$$ best runs
- Compare the factorizations in terms of the factor match score (FMS), skipping the mode along which the data was split, to evaluate the replicability of the uncovered patterns
If a chosen percentile of the resulting set of FMS values exceeds a given threshold, the model passes the check, since it consistently finds the same patterns. What do you think?
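To make the comparison step concrete, here is a minimal numpy/scipy sketch of the FMS comparison and the percentile check, under the assumptions that factorizations are passed as lists of factor matrices and that component weights are ignored for simplicity (the full FMS definition in [1] also compares component norms). The function names are hypothetical:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def factor_match_score(factors_a, factors_b, skip_mode=None):
    """FMS between two CP factorizations, each given as a list of factor
    matrices of shape (dim_of_mode, rank).

    Finds the component permutation that maximizes the mean, over
    components, of the product across (non-skipped) modes of the
    absolute cosine similarities between matched components.
    """
    rank = factors_a[0].shape[1]
    # congruence[r, s] = prod over modes of |cos(a_r, b_s)|
    congruence = np.ones((rank, rank))
    for mode, (fa, fb) in enumerate(zip(factors_a, factors_b)):
        if mode == skip_mode:
            continue  # this mode has different sizes across splits
        a = fa / np.linalg.norm(fa, axis=0)
        b = fb / np.linalg.norm(fb, axis=0)
        congruence *= np.abs(a.T @ b)
    # Optimal component matching (maximize total congruence)
    rows, cols = linear_sum_assignment(-congruence)
    return congruence[rows, cols].mean()


def passes_replicability_check(factorizations, skip_mode,
                               percentile=5, threshold=0.9):
    """Compute pairwise FMS over all best runs and pass the check if the
    chosen (low) percentile of the scores exceeds the threshold."""
    scores = [
        factor_match_score(fa, fb, skip_mode=skip_mode)
        for i, fa in enumerate(factorizations)
        for fb in factorizations[i + 1:]
    ]
    return np.percentile(scores, percentile) >= threshold, scores
```

Comparing a factorization with itself gives an FMS of 1, and a set of $$M \times N$$ identical best runs would trivially pass the check.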
References:
[1] Adali T, Kantar F, Akhonda MABS, Strother S, Calhoun VD, Acar E. Reproducibility in Matrix and Tensor Decompositions: Focus on Model Match, Interpretability, and Uniqueness. IEEE Signal Process Mag. 2022 Jul;39(4):8-24. doi: 10.1109/msp.2022.3163870. Epub 2022 Jun 28. PMID: 36337436; PMCID: PMC9635492.
[2] Yan S, Li L, Horner D, Ebrahimi P, Chawes B, Dragsted LO, Rasmussen MA, Smilde AK, Acar E. Characterizing human postprandial metabolic response using multiway data analysis. Metabolomics. 2024 May 9;20(3):50. doi: 10.1007/s11306-024-02109-y. PMID: 38722393; PMCID: PMC11082008.
Yes, that would be interesting! Feel free to submit a PR. However, TensorLy-Viz doesn't depend on TensorLy, and I think it would be nice to keep it that way. So the replicability function could maybe accept the decomposition function as an argument?
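For illustration, a decomposition-agnostic interface could look something like the sketch below: the fold splitting lives in the library, while `decompose` is any user-supplied callable mapping an array to a list of factor matrices, so no TensorLy dependency is needed. All names here are hypothetical, including the toy rank-1 "decomposition" used for the demo:

```python
import numpy as np


def fold_factorizations(tensor, decompose, n_folds, split_mode=0,
                        random_state=None):
    """Fit `decompose` on each leave-one-fold-out subset of `tensor`.

    `decompose` is any callable mapping an ndarray to a list of factor
    matrices, so any decomposition backend can be plugged in.
    """
    rng = np.random.default_rng(random_state)
    indices = rng.permutation(tensor.shape[split_mode])
    folds = np.array_split(indices, n_folds)
    factorizations = []
    for fold in folds:
        keep = np.setdiff1d(indices, fold)  # drop this fold's slices
        subset = np.take(tensor, np.sort(keep), axis=split_mode)
        factorizations.append(decompose(subset))
    return factorizations


def rank1_decompose(tensor):
    """Toy stand-in for a real CP solver: a rank-1 factorization from the
    leading left singular vector of each mode unfolding."""
    factors = []
    for mode in range(tensor.ndim):
        unfolding = np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)
        u, _, _ = np.linalg.svd(unfolding, full_matrices=False)
        factors.append(u[:, :1])
    return factors


# Usage: N = 2 leave-one-fold-out fits along mode 1
X = np.arange(24.0).reshape(2, 3, 4)
factorizations = fold_factorizations(X, rank1_decompose, n_folds=2,
                                     split_mode=1, random_state=0)
```

The resulting factorizations could then be compared pairwise with the existing FMS utilities, skipping `split_mode`.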
An alternative solution, of course, is to have a detailed example script for the documentation.