squidpy

update var by dist normalization

Open LLehner opened this issue 5 months ago • 4 comments

Distances to the anchor point are now actually normalized for each slide separately.

LLehner, Nov 20 '25 10:11

Codecov Report

:white_check_mark: All modified and coverable lines are covered by tests. :white_check_mark: Project coverage is 65.52%. Comparing base (5cd2640) to head (f7b8b2e).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1065      +/-   ##
==========================================
- Coverage   65.55%   65.52%   -0.03%     
==========================================
  Files          43       43              
  Lines        6361     6356       -5     
  Branches     1063     1063              
==========================================
- Hits         4170     4165       -5     
  Misses       1808     1808              
  Partials      383      383              
| Files with missing lines | Coverage Δ |
|---|---|
| src/squidpy/tl/_var_by_distance.py | 65.89% <100.00%> (-1.28%) :arrow_down: |

codecov[bot], Nov 20 '25 10:11

Can you describe the issue you're solving here better and then also translate this into a test?

timtreis, Nov 22 '25 13:11

> Can you describe the issue you're solving here better and then also translate this into a test?

The previous normalization was useless, as it just changed the distance values to be within [0,1] without actually changing the scale. Now the distances from categories with a lower maximum are stretched so that they are comparable across categories. This doesn't affect the raw distances, which are still provided when running this function. I will add a test.
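As a minimal sketch of the difference (toy data only; the column names `slide`, `distance`, `global_norm` and `per_slide_norm` are made up for illustration and this is not the actual squidpy code), dividing by one global maximum leaves a slide with a smaller range compressed, while dividing by each slide's own maximum stretches it to the full [0,1] range:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: raw distances to an anchor point on two slides.
# Slide "B" has a smaller maximum distance than slide "A".
df = pd.DataFrame({
    "slide": ["A", "A", "A", "B", "B", "B"],
    "distance": [0.0, 5.0, 10.0, 0.0, 2.0, 4.0],
})

# One scale for everything: slide "B" only reaches 0.4.
df["global_norm"] = df["distance"] / df["distance"].max()

# One scale per slide: each slide's maximum is mapped to 1.0.
slide_max = df.groupby("slide")["distance"].transform("max")
df["per_slide_norm"] = df["distance"] / slide_max

print(df)
```

The raw `distance` column is left untouched, so both the original and the normalized values stay available.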

LLehner, Nov 23 '25 16:11

I don't fully understand the PR; the code changes seem equivalent? It's good to get rid of the dependency for a fairly simple calculation, but still:

import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

distances = np.array([0.0, 4.2, 10.0, np.nan, 3.3])
design_matrix = pd.DataFrame({
    "old": distances,
    "new": distances,
})

# old: min-max scaling via scikit-learn
anchor_point = "old"
scaler = MinMaxScaler()
scaler.fit(design_matrix[[anchor_point]].values)
design_matrix[anchor_point] = scaler.transform(design_matrix[[anchor_point]].values)

# new: divide by the column maximum
anchor_point = "new"
design_matrix[anchor_point] = design_matrix[anchor_point] / design_matrix[anchor_point].max()

design_matrix
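The equivalence can also be checked numerically rather than by eye. The sketch below is an illustration, not code from the PR: it writes min-max scaling out by hand as `(x - min) / (max - min)`, the formula `MinMaxScaler` uses for its default range, and the helper names `min_max` and `divide_by_max` are made up. It suggests the two approaches agree whenever the smallest distance is 0, as in the example above, and diverge once the minimum is nonzero:

```python
import numpy as np

def min_max(x):
    # Hand-written min-max scaling; NaNs are ignored when finding min and max.
    lo, hi = np.nanmin(x), np.nanmax(x)
    return (x - lo) / (hi - lo)

def divide_by_max(x):
    # Scaling by the maximum only; NaNs are ignored when finding the max.
    return x / np.nanmax(x)

# Same values as in the comparison above: the minimum is exactly 0.
with_zero = np.array([0.0, 4.2, 10.0, np.nan, 3.3])
# Shifted copy whose minimum is 2.0 instead of 0.
shifted = with_zero + 2.0

same_with_zero = np.allclose(
    min_max(with_zero), divide_by_max(with_zero), equal_nan=True
)
same_shifted = np.allclose(
    min_max(shifted), divide_by_max(shifted), equal_nan=True
)

print(same_with_zero, same_shifted)  # prints: True False
```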

timtreis, Dec 01 '25 11:12