Parafac2 als em
Enable missing data imputation via the mask argument as done in CP, as brought up in #543 .
The changes are outlined as follows (all lines refer to _parafac2.py):
- Added new argument
maskto theparafac2function. - Initial slice-wise missing data imputation (lines 480-501)
- Norm tensor is adjusted to include only the non-missing entries in the presence of missing data (lines 510-523)
- Update imputation after a full factor update circle (lines 601-611)
- Adjusted
_parafac2_reconstruction_errorto include themaskargument in casenorm_matricesis not given (lines 260, 297-306) - Adjusted
_BroThesisLineSearchto includemask, required when computing the reconstruction error. - Added the
maskdocstring whenever required.
Hi @cchatzis Great! I have my hands full with a paper deadline right now, but I can take a look when my plate clears a little 🙂
Thanks @cchatzis for opening the PR, this will be great to have!
For the failing tests, you can also try to run the tests locally using pytest to check things work as expected and quick debugging.
Regarding the masking, it would be more efficient to do the computation in a vectorized way, e.g. you can first multiply the who slice by the mask array of 0 and 1 and then take the mean over the resulting array rather than loop over individual elements.
Thanks @MarieRoald, will be great to get your feedback too!
Thanks again everyone for taking the time to look and review this.
In the latest commits, I updated the code to include your feedback. Relevant tests are passing now - I will create a small test for the _update_imputed function in the near future.
Thanks @cchatzis for the updates, this version looks good to me.
Codecov Report
Attention: Patch coverage is 98.75000% with 1 line in your changes missing coverage. Please review.
Please upload report for BASE (
main@29ad039). Learn more about missing BASE report. Report is 44 commits behind head on main.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| tensorly/decomposition/_parafac2.py | 97.77% | 1 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## main #563 +/- ##
=======================================
Coverage ? 88.07%
=======================================
Files ? 132
Lines ? 7915
Branches ? 0
=======================================
Hits ? 6971
Misses ? 944
Partials ? 0
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
:rocket: New features to boost your workflow:
- :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
Hi everyone and thank you very much for your kind feedback and help!
Maybe I am missing something but I cannot seem to reproduce the issue with the paddle backend because when i try to set it I set it for pytest (i.e. : TENSORLY_BACKEND=paddle pytest ...) I get:
UserWarning: TENSORLY_BACKEND should be one of 'numpy''pytorch''tensorflow''cupy''jax', got paddle. Defaulting to numpy'
I am sure I am missing something trivial here, any suggestion/help would be welcome :)
Hi, are there any updates on this?
@cchatzis thanks for the changes, feel free to mark the changes requested as done when it's ready!
Thank you for looking into this @aarmey! I added your suggestions and also made a small change for the paddle backend. Something in the lines of tl.nanmean is not available to compare the results with - I used the numpy version in the test_impute_results() test. Would that be acceptable?
The test results with black don't seem to make sense, two-line comments are marked as problematic:
--- /home/runner/work/tensorly/tensorly/tensorly/tenalg/__init__.py 2025-05-05 19:11:01.364542+00:00
+++ /home/runner/work/tensorly/tensorly/tensorly/tenalg/__init__.py 2025-05-05 19:11:08.018732+00:00
@@ -1,7 +1,7 @@
-"""
would reformat /home/runner/work/tensorly/tensorly/tensorly/tenalg/__init__.py
-The :mod:`tensorly.tenalg` module contains utilities for Tensor Algebra
+"""
+The :mod:`tensorly.tenalg` module contains utilities for Tensor Algebra
operations such as khatri-rao or kronecker product, n-mode product, etc.
"""
[...]
from .... import backend as tl
import pytest
import numpy as np
--- /home/runner/work/tensorly/tensorly/tensorly/decomposition/__init__.py 2025-05-05 19:11:01.360542+00:00
+++ /home/runner/work/tensorly/tensorly/tensorly/decomposition/__init__.py 2025-05-05 19:11:07.181622+00:00
@@ -1,8 +1,8 @@
"""
The :mod:`tensorly.decomposition` module includes utilities for performing
-tensor decomposition such as CANDECOMP-PARAFAC and Tucker.
+tensor decomposition such as CANDECOMP-PARAFAC and Tucker.
"""
We can ignore those errors, since they don't involve the code here. Also, yes, use of np.nanmean is reasonable within the tests. This all looks to be in order, so I'll merge. Thanks again for implementing this!