update var by dist normalization
Distances to the anchor point are now actually normalized for each slide separately.
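Here is a minimal sketch of what per-slide normalization means. It assumes a long-format table with one row per observation, plus a hypothetical `slide` column and a hypothetical `distance` column. These names are placeholders, not squidpy's actual design matrix layout. Each distance is divided by the maximum distance within its own slide, so every slide ends up with a maximum of 1.

```python
import pandas as pd


def normalize_per_slide(df: pd.DataFrame, dist_col: str, slide_col: str) -> pd.Series:
    """Divide each distance by the maximum distance within its own slide.

    Hypothetical helper: column names are placeholders, not squidpy's API.
    NaN distances are skipped when computing the max and stay NaN.
    """
    # One maximum per slide, broadcast back to every row of that slide.
    slide_max = df.groupby(slide_col)[dist_col].transform("max")
    return df[dist_col] / slide_max


# Toy data: slide "B" has a much smaller maximum distance than slide "A".
df = pd.DataFrame({
    "slide": ["A", "A", "A", "B", "B", "B"],
    "distance": [0.0, 5.0, 10.0, 0.0, 1.0, 2.0],
})
# The raw "distance" column is kept; the normalized values go into a new column.
df["distance_norm"] = normalize_per_slide(df, "distance", "slide")
print(df)
```

Distances are non-negative, so dividing by the per-slide maximum maps each slide onto [0, 1]. If every distance in a slide is 0, the division yields NaN for that slide.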
Codecov Report
:white_check_mark: All modified and coverable lines are covered by tests.
:white_check_mark: Project coverage is 65.52%. Comparing base (5cd2640) to head (f7b8b2e).
Coverage Diff

| | main | #1065 | +/- |
|---|---|---|---|
| Coverage | 65.55% | 65.52% | -0.03% |
| Files | 43 | 43 | |
| Lines | 6361 | 6356 | -5 |
| Branches | 1063 | 1063 | |
| Hits | 4170 | 4165 | -5 |
| Misses | 1808 | 1808 | |
| Partials | 383 | 383 | |
| Files with missing lines | Coverage Δ | |
|---|---|---|
| src/squidpy/tl/_var_by_distance.py | 65.89% <100.00%> (-1.28%) | :arrow_down: |
Can you describe the issue you're solving here better and then also translate this into a test?
The previous normalization was ineffective: it only rescaled the distance values to [0, 1] without actually changing their relative scale. Now the distances from categories with a lower maximum are stretched so that they are comparable. This doesn't affect the raw distances, which are still provided when running this function. I will add a test.
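Following the request to turn this into a test, here is a sketch of one possible pytest-style test. It is not the test added in this PR. It assumes that the old behaviour scaled the whole distance column with a single global maximum, and it uses hypothetical toy columns `slide` and `distance`. It checks three properties. First, global scaling keeps the slide with the smaller maximum at a smaller scale. Second, per-slide scaling stretches that slide to 1. Third, the raw distances are unchanged.

```python
import pandas as pd


def test_per_slide_normalization_stretches_smaller_slide():
    # Hypothetical toy data: slide "B" has a smaller maximum distance than slide "A".
    df = pd.DataFrame({
        "slide": ["A", "A", "B", "B"],
        "distance": [0.0, 10.0, 0.0, 5.0],
    })

    # Global scaling (assumed old behaviour): one maximum for the whole column.
    global_norm = df["distance"] / df["distance"].max()
    # Per-slide scaling: each slide is divided by its own maximum.
    per_slide_norm = df["distance"] / df.groupby("slide")["distance"].transform("max")

    # Global scaling keeps B's maximum at half of A's, so the relative scale is unchanged.
    assert global_norm[df["slide"] == "B"].max() == 0.5
    # Per-slide scaling stretches B so that both slides reach 1.0.
    assert (per_slide_norm.groupby(df["slide"]).max() == 1.0).all()
    # The raw distances are left untouched.
    assert df["distance"].tolist() == [0.0, 10.0, 0.0, 5.0]
```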
I don't fully understand the PR; aren't the code changes equivalent? It's good to get rid of the dependency for such a simple calculation, but still:
```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Toy distances to an anchor point, including a missing value.
distances = np.array([0.0, 4.2, 10.0, np.nan, 3.3])
design_matrix = pd.DataFrame({
    "old": distances,
    "new": distances,
})

# old: scikit-learn min-max scaling, (x - min) / (max - min); NaNs are ignored in fit and kept
anchor_point = "old"
scaler = MinMaxScaler()
scaler.fit(design_matrix[[anchor_point]].values)
design_matrix[anchor_point] = scaler.transform(design_matrix[[anchor_point]].values)

# new: divide by the column maximum (pandas skips NaN), x / max
anchor_point = "new"
design_matrix[anchor_point] = design_matrix[anchor_point] / design_matrix[anchor_point].max()

print(design_matrix)
```
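The two variants in this snippet agree only because the smallest distance is 0.0. `MinMaxScaler` computes (x - min) / (max - min), which reduces to x / max exactly when min = 0. The small check below uses a hypothetical helper name, `minmax_equals_divide_by_max`, and includes a nonzero minimum to show where the two diverge.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler


def minmax_equals_divide_by_max(distances: np.ndarray) -> bool:
    """Check whether MinMaxScaler and x / max agree (NaNs compared as equal)."""
    col = distances.reshape(-1, 1)
    # MinMaxScaler: (x - min) / (max - min); NaNs are ignored in fit and kept as NaN.
    minmax = MinMaxScaler().fit_transform(col).ravel()
    # Division by the NaN-ignoring maximum: x / max.
    by_max = distances / np.nanmax(distances)
    # allclose tolerates floating-point rounding differences between the two paths.
    return bool(np.allclose(minmax, by_max, equal_nan=True))


# Minimum is 0.0, as in the snippet above: the two agree.
print(minmax_equals_divide_by_max(np.array([0.0, 4.2, 10.0, np.nan, 3.3])))  # True
# Minimum is 2.0: the two differ.
print(minmax_equals_divide_by_max(np.array([2.0, 4.2, 10.0, np.nan, 3.3])))  # False
```

Both variants in this toy example also use a single maximum for the whole column. The per-slide behaviour described in the PR would need the maximum to be computed per slide.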