ENH: Medcouple in O(N Log N) time
Problem description
The current statsmodels implementation of medcouple runs in O(N^2) time, which leads to excessive runtimes and memory problems on large inputs.
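To illustrate where the quadratic cost comes from, here is a minimal pure-Python sketch of the naive medcouple. It is not the statsmodels code: the function name `naive_medcouple` and the tie-handling convention are assumptions based on the usual definition of the statistic. The kernel is evaluated for every pair that straddles the median, so both the work and the stored values grow as roughly N^2 / 4.

```python
from statistics import median


def naive_medcouple(data):
    """Quadratic-time medcouple: median of the kernel over all pairs lo <= m <= hi."""
    x = sorted(data)
    m = median(x)
    lower = [v for v in x if v <= m]  # ascending, ties with the median at the end
    upper = [v for v in x if v >= m]  # ascending, ties with the median at the start
    k = sum(1 for v in x if v == m)   # number of observations equal to the median
    first_tie = len(lower) - k        # index of the first tie within `lower`

    kernel = []  # holds len(lower) * len(upper) values: the O(N^2) part
    for j, hi in enumerate(upper):
        for i, lo in enumerate(lower):
            if hi == m and lo == m:
                # Both points equal the median: sign kernel (assumed convention)
                s = (i - first_tie) + j - (k - 1)
                kernel.append((s > 0) - (s < 0))
            else:
                kernel.append(((hi - m) - (m - lo)) / (hi - lo))
    return median(kernel)


print(naive_medcouple([1, 2, 3, 10, 20]))  # right-skewed sample, positive value
```

A symmetric sample such as `[1, 2, 3, 4, 5]` gives 0, and mirroring a sample flips the sign. The O(N log N) approach avoids materializing the `kernel` list at all.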
Proposed remedy
- I would like to see a revised version of Guy Brys's R code included in statsmodels
- The implementation is available on my GitHub (https://github.com/hmustafamail/MedcoupleNLogN)
- Guy Brys has granted permission for this in correspondence
- Details follow
Historical context
- Guy Brys authored an R package for efficient medcouple, c. 2004 (https://search.r-project.org/CRAN/refmans/robustbase/html/mc.html)
- Jordi Gutiérrez Hermoso used that as a reference for a Python 2 implementation, c. 2015 (https://inversethought.com/hg/)
- There was a conversation about whether to include it in the Python statsmodels project (https://groups.google.com/g/pystatsmodels/c/6QWW4tynDW8)
- There were concerns because the original reference implementation is licensed under the GNU GPL
- However, as mentioned in that thread, such code may be relicensed with the author's permission
What I did
- Reached out to Guy on LinkedIn (https://www.linkedin.com/in/guy-brys-412a8a65/) to ask for permission
- He granted permission (see "permission from guy brys.png" in my repo: https://github.com/hmustafamail/MedcoupleNLogN/blob/main/permission%20from%20guy%20brys.png)
- Revised Jordi's code for Python 3
- Validated my revised code against the (quadratic) statsmodels implementation, using data from Jordi's repo
- RMSE was 1.03e-4, much smaller than the statistic's range of [-1, 1] and consistent with implementation-level differences
- Posted the revised code on GitHub (https://github.com/hmustafamail/MedcoupleNLogN)
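For anyone who wants to repeat the validation step, the comparison can be sketched as below. This is only an illustration: the helper name `rmse` and the sample values are made up and are not the actual validation data or results. In practice each list would hold one medcouple value per dataset, computed by the quadratic and the O(N log N) implementation respectively.

```python
import math


def rmse(reference, candidate):
    """Root-mean-square error between two equal-length sequences of statistics."""
    if len(reference) != len(candidate):
        raise ValueError("sequences must have the same length")
    if not reference:
        raise ValueError("sequences must not be empty")
    squared = [(r - c) ** 2 for r, c in zip(reference, candidate)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up medcouple values, one per dataset, NOT the real validation results
quadratic_results = [0.120, -0.305, 0.000, 0.742]
fast_results = [0.120, -0.305, 0.001, 0.742]

error = rmse(quadratic_results, fast_results)
print(f"RMSE = {error:.2e}")
```

Because the medcouple is bounded in [-1, 1], an RMSE several orders of magnitude below 1 suggests the two implementations agree up to small numerical differences.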
Please let me know what else may be needed.
Happy to have this improved version. Do you want to open a PR?
On Sun, 25 May 2025, 16:44 Mustafa I. Hussain, Ph.D. wrote: hmustafamail created an issue (statsmodels/statsmodels#9570) https://github.com/statsmodels/statsmodels/issues/9570
Great, working on a pull request now. Edit: pull request created here: link to pull request.