
ENH: Medcouple in O(N Log N) time

Open hmustafamail opened this issue 10 months ago • 2 comments

Problem description

The current Statsmodels implementation of medcouple runs in O(N^2) time, which leads to excessive runtimes and memory issues on large inputs.
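To illustrate where the quadratic cost comes from, here is a minimal pure-Python sketch of the textbook medcouple definition. It is not the statsmodels code; `naive_medcouple` is a hypothetical name, and the handling of observations tied with the median follows the usual sign rule as an assumption. Every pair of one observation at or above the median and one at or below it contributes a kernel value, so roughly (N/2)^2 values are held in memory before the final median is taken.

```python
from statistics import median


def naive_medcouple(values):
    """Quadratic-time medcouple sketch (hypothetical helper, not statsmodels)."""
    xs = sorted(values)
    m = median(xs)
    # Centered observations on each side of the median; ties appear in both.
    upper = [x - m for x in xs if x >= m]            # ties come first
    lower = [x - m for x in reversed(xs) if x <= m]  # ties come first
    k = sum(1 for x in xs if x == m)                 # observations tied with m

    kernel = []  # grows to len(upper) * len(lower) entries: O(N^2) memory
    for a, zi in enumerate(upper):
        for b, zj in enumerate(lower):
            if zi == 0 and zj == 0:
                # Assumed tie rule: -1, 0 or +1 depending on tie positions.
                d = a + b - (k - 1)
                kernel.append((d > 0) - (d < 0))
            else:
                kernel.append((zi + zj) / (zi - zj))
    return median(kernel)


if __name__ == "__main__":
    print(naive_medcouple([1, 2, 3, 4, 5]))    # symmetric sample
    print(naive_medcouple([1, 2, 4, 8, 16]))   # right-skewed sample
```

For N = 100,000 this loop would build on the order of 2.5 billion kernel values, which is the kind of runtime and memory pressure an O(N log N) algorithm avoids.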

Proposed remedy

  • I would like to see a revised version of Guy Brys's R code included in Statsmodels
  • The implementation is available on my GitHub (link to repo: https://github.com/hmustafamail/MedcoupleNLogN)
  • Guy Brys has granted permission for this in correspondence
  • Details follow

Historical context

  • Guy Brys authored an R package for efficient medcouple, c. 2004 (link)
  • Jordi Gutiérrez Hermoso used that as a reference for a Python 2 implementation, c. 2015 (link)
  • There was a conversation about whether to include it in the Python Statsmodels project (link)
  • There were concerns because the original reference implementation is licensed under the GNU GPL
  • However, as mentioned in that thread, such code may be relicensed with author permission

What I did

  • Reached out to Guy Brys on LinkedIn (link to profile) to ask for permission
  • He granted permission
  • Revised Jordi's code for Python 3
  • Validated my revised code against the (quadratic) statsmodels implementation
    • Used data from Jordi's repo
  • RMSE between the two implementations was 1.03e-4
    • Much smaller than the statistic's range of [-1, 1]
    • Consistent with implementation-level differences
  • Posted the revised code on GitHub (link to repo)
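
The validation step above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the script that produced the 1.03e-4 figure: `rmse` and `compare_implementations` are hypothetical helpers, and `reference` and `candidate` stand in for the quadratic statsmodels function and the revised O(N log N) code respectively.

```python
import math


def rmse(reference_values, candidate_values):
    """Root-mean-square error between two equal-length sequences of statistics."""
    if len(reference_values) != len(candidate_values):
        raise ValueError("both sequences must have the same length")
    if not reference_values:
        raise ValueError("at least one value is required")
    squared = [(r - c) ** 2 for r, c in zip(reference_values, candidate_values)]
    return math.sqrt(sum(squared) / len(squared))


def compare_implementations(reference, candidate, datasets):
    """Apply both callables to every dataset and return the RMSE of the results."""
    reference_values = [reference(data) for data in datasets]
    candidate_values = [candidate(data) for data in datasets]
    return rmse(reference_values, candidate_values)
```

In practice one would presumably pass `statsmodels.stats.stattools.medcouple` as `reference` and the new function as `candidate`, with `datasets` loaded from the test data mentioned above. Because the medcouple lies in [-1, 1], an RMSE several orders of magnitude below 1 suggests the two implementations agree closely.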

Please let me know what else may be needed.

hmustafamail • May 25 '25 15:05

Happy to have this improved version. Do you want to open a PR?

On Sun, 25 May 2025, 16:44 Mustafa I. Hussain, Ph.D. wrote:

hmustafamail created an issue (statsmodels/statsmodels#9570) https://github.com/statsmodels/statsmodels/issues/9570

The implementation is available on my Github (link to repo https://github.com/hmustafamail/MedcoupleNLogN)

bashtage • May 26 '25 07:05

Great, working on a pull request now. Edit: pull request created here: link to pull request.

hmustafamail • May 26 '25 17:05