Allow using the Minkowski metric with p < 1
Describe the bug
It seems that PHATE supports the Minkowski metric for both the MDS and kNN computations, so I would like to use this metric with p = 0.3 for my experiments. However, phate.PHATE does not recognize the argument p=0.3 when I call it.
Thanks for your help,
Ivan
To Reproduce
embedding = phate.PHATE(n_components=intrinsic_dim, knn=5, decay=None, n_landmark=2000,
                        t='auto', gamma=1.0, n_pca=input_data.shape[1], mds_solver='smacof',
                        knn_dist='minkowski', mds_dist='minkowski', mds='classic',
                        random_state=1969, n_jobs=cpu_count, verbose=False, p=0.3)
Expected behavior
The PHATE constructor should accept p=0.3 as one of its initialization parameters.
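For context, with 0 < p < 1 the Minkowski formula d_p(x, y) = (sum_i |x_i - y_i|^p)^(1/p) no longer satisfies the triangle inequality, so it is a dissimilarity rather than a true metric; this may be why some implementations only accept p >= 1. The minimal sketch below uses only NumPy, and `minkowski_p` is a hypothetical helper, not part of PHATE. It shows the violation on three points in the plane.

```python
import numpy as np


def minkowski_p(x, y, p):
    """Hypothetical helper: (sum_i |x_i - y_i| ** p) ** (1 / p) for p > 0."""
    diff = np.abs(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))
    return float(np.sum(diff ** p) ** (1.0 / p))


x = np.array([0.0, 0.0])
y = np.array([1.0, 1.0])
z = np.array([1.0, 0.0])  # intermediate point between x and y

p = 0.3
direct = minkowski_p(x, y, p)                        # 2 ** (1 / 0.3), about 10.08
via_z = minkowski_p(x, z, p) + minkowski_p(z, y, p)  # 1 + 1 = 2
print(direct, via_z)
print("triangle inequality holds:", direct <= via_z)  # False for p < 1
```

For p >= 1 the same check holds, which is the usual reason the Minkowski family is only called a metric in that range.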
Actual behavior
Traceback (most recent call last):
File "test_phenograph_clustering.py", line 94, in
System information:
Output of phate.__version__:
phate-1.0.7
Output of pd.show_versions():
INSTALLED VERSIONS
------------------
commit : None
python : 3.6.5.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 63 Stepping 2, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None
pandas : 0.25.0
numpy : 1.19.5
pytz : 2018.5
dateutil : 2.7.3
pip : 9.0.3
setuptools : 41.0.1
Cython : 0.29.14
pytest : 6.0.1
hypothesis : None
sphinx : 2.3.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : 0.9999999
pymysql : None
psycopg2 : None
jinja2 : 2.11.0
IPython : 7.11.1
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.2.2
numexpr : 2.7.3
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.5.4
sqlalchemy : None
tables : 3.6.1
xarray : None
xlrd : 1.2.0
xlwt : None
xlsxwriter : None
Additional context
Python 3.6.5 with Deprecated-1.2.12, graphtools-1.5.2, phate-1.0.7, pygsp-0.5.1, s-gd2-1.8, scprep-1.1.0, tasklogger-1.1.0
Hi @ivan-marroquin,
PHATE doesn't currently offer this functionality, though the new maintainers may choose to add it. In the meantime, you should be able to achieve your desired outcome with a custom metric function as follows:
import phate
from functools import partial
from scipy.spatial.distance import minkowski
dist_fn = partial(minkowski, p=0.3)
phate_op = phate.PHATE(knn_dist=dist_fn, mds_dist=dist_fn)
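One caveat worth checking: depending on the installed SciPy version, `scipy.spatial.distance.minkowski` may refuse p < 1 (some older releases validate that p is at least 1). A minimal sketch that sidesteps this, assuming only NumPy: `fractional_minkowski` is a hypothetical helper that spells out the formula, bound with `functools.partial` in the same way as the suggestion above. The PHATE call is left as a comment so the snippet runs without PHATE installed.

```python
from functools import partial

import numpy as np


def fractional_minkowski(u, v, p=0.3):
    """Hypothetical stand-in for scipy.spatial.distance.minkowski.

    Computes (sum_i |u_i - v_i| ** p) ** (1 / p) for any p > 0, without
    relying on SciPy's validation of p.
    """
    if p <= 0:
        raise ValueError("p must be positive")
    diff = np.abs(np.asarray(u, dtype=float) - np.asarray(v, dtype=float))
    return float(np.sum(diff ** p) ** (1.0 / p))


# Bind p once to get a two-argument callable f(x, y) -> d.
dist_fn = partial(fractional_minkowski, p=0.3)

# With PHATE installed, the callable would be passed as in the reply:
# phate_op = phate.PHATE(knn_dist=dist_fn, mds_dist=dist_fn)

a = np.array([0.0, 2.0, 5.0])
b = np.array([1.0, 2.0, 3.0])
print(dist_fn(a, b))
```

Because the function is plain Python, each pairwise evaluation goes through the interpreter, so it will be slower than a built-in metric on large datasets.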
I'm leaving this issue open as a feature request, but please let me know if the proposed alternative doesn't work and we can open a separate bug report.
Hi @scottgigante,
Many thanks for the tip! Following your advice, I decided to use numba to define a Minkowski distance function. Here is the code:
import numba

@numba.njit(fastmath=True)
def fractional_dist(p_vec, q_vec, fraction=0.1):
    # Sum |p_i - q_i| ** fraction over all features
    result = 0.0
    for isamp in range(p_vec.shape[0]):
        if p_vec[isamp] > q_vec[isamp]:
            result += (p_vec[isamp] - q_vec[isamp]) ** fraction
        else:
            result += (q_vec[isamp] - p_vec[isamp]) ** fraction
    # Take the (1 / fraction)-th root of the sum
    dist = result ** (1 / fraction)
    return dist
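Before comparing against the `partial` approach, it may help to cross-check the loop-based function against a vectorized NumPy version. This is a sketch under stated assumptions: `fractional_dist_py` is a hypothetical plain-Python equivalent of `fractional_dist` without the `@numba.njit` decorator (so it runs without Numba), `pairwise_fractional` is a hypothetical helper, and random data stands in for `input_data`.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def fractional_dist_py(p_vec, q_vec, fraction=0.1):
    # Equivalent to fractional_dist above (abs replaces the if/else), no numba.
    result = 0.0
    for isamp in range(p_vec.shape[0]):
        result += abs(p_vec[isamp] - q_vec[isamp]) ** fraction
    return result ** (1 / fraction)


def pairwise_fractional(X, fraction=0.1):
    # Broadcast to an (n, n, d) array of |x_i - x_j|, then reduce over features.
    diff = np.abs(X[:, None, :] - X[None, :, :])
    return np.sum(diff ** fraction, axis=-1) ** (1 / fraction)


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))

D_vec = pairwise_fractional(X, fraction=0.3)
# pdist calls the Python function once per pair; squareform expands to (n, n).
D_loop = squareform(pdist(X, metric=lambda u, v: fractional_dist_py(u, v, fraction=0.3)))
print(np.allclose(D_vec, D_loop))
```

The broadcast version needs O(n^2 d) memory, so for thousands of samples it would have to be computed in row chunks; the loop version trades that memory for per-pair call overhead, which Numba is meant to reduce.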
Next, I will compare its results with those from your proposed approach.
Ivan