tensorly icon indicating copy to clipboard operation
tensorly copied to clipboard

Parafac2 als em

Open cchatzis opened this issue 1 year ago • 7 comments

Enable missing data imputation via the mask argument as done in CP, as brought up in #543 .

The changes are outlined as follows (all lines refer to _parafac2.py):

  1. Added new argument mask to the parafac2 function.
  2. Initial slice-wise missing data imputation (lines 480-501)
  3. Norm tensor is adjusted to include only the non-missing entries in the presence of missing data (lines 510-523)
  4. Update imputation after a full factor update circle (lines 601-611)
  5. Adjusted _parafac2_reconstruction_error to include the mask argument in case norm_matrices is not given (lines 260, 297-306)
  6. Adjusted _BroThesisLineSearch to include mask, required when computing the reconstruction error.
  7. Added the mask docstring whenever required.

cchatzis avatar Jul 02 '24 21:07 cchatzis

Hi @cchatzis Great! I have my hands full with a paper deadline right now, but I can take a look when my plate clears a little 🙂

MarieRoald avatar Jul 05 '24 13:07 MarieRoald

Thanks @cchatzis for opening the PR, this will be great to have!

For the failing tests, you can also try to run the tests locally using pytest to check things work as expected and quick debugging.

Regarding the masking, it would be more efficient to do the computation in a vectorized way, e.g. you can first multiply the who slice by the mask array of 0 and 1 and then take the mean over the resulting array rather than loop over individual elements.

Thanks @MarieRoald, will be great to get your feedback too!

JeanKossaifi avatar Jul 08 '24 08:07 JeanKossaifi

Thanks again everyone for taking the time to look and review this.

In the latest commits, I updated the code to include your feedback. Relevant tests are passing now - I will create a small test for the _update_imputed function in the near future.

cchatzis avatar Jul 19 '24 12:07 cchatzis

Thanks @cchatzis for the updates, this version looks good to me.

cohenjer avatar Jul 22 '24 09:07 cohenjer

Codecov Report

Attention: Patch coverage is 98.75000% with 1 line in your changes missing coverage. Please review.

Please upload report for BASE (main@29ad039). Learn more about missing BASE report. Report is 44 commits behind head on main.

Files with missing lines Patch % Lines
tensorly/decomposition/_parafac2.py 97.77% 1 Missing :warning:
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #563   +/-   ##
=======================================
  Coverage        ?   88.07%           
=======================================
  Files           ?      132           
  Lines           ?     7915           
  Branches        ?        0           
=======================================
  Hits            ?     6971           
  Misses          ?      944           
  Partials        ?        0           

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codecov[bot] avatar Aug 10 '24 21:08 codecov[bot]

Hi everyone and thank you very much for your kind feedback and help!

Maybe I am missing something but I cannot seem to reproduce the issue with the paddle backend because when i try to set it I set it for pytest (i.e. : TENSORLY_BACKEND=paddle pytest ...) I get:

UserWarning: TENSORLY_BACKEND should be one of 'numpy''pytorch''tensorflow''cupy''jax', got paddle. Defaulting to numpy'

I am sure I am missing something trivial here, any suggestion/help would be welcome :)

cchatzis avatar Aug 29 '24 08:08 cchatzis

Hi, are there any updates on this?

cchatzis avatar Oct 03 '24 10:10 cchatzis

@cchatzis thanks for the changes, feel free to mark the changes requested as done when it's ready!

JeanKossaifi avatar Oct 28 '24 03:10 JeanKossaifi

Thank you for looking into this @aarmey! I added your suggestions and also made a small change for the paddle backend. Something in the lines of tl.nanmean is not available to compare the results with - I used the numpy version in the test_impute_results() test. Would that be acceptable?

cchatzis avatar May 05 '25 18:05 cchatzis

The test results with black don't seem to make sense, two-line comments are marked as problematic:

--- /home/runner/work/tensorly/tensorly/tensorly/tenalg/__init__.py	2025-05-05 19:11:01.364542+00:00
+++ /home/runner/work/tensorly/tensorly/tensorly/tenalg/__init__.py	2025-05-05 19:11:08.018732+00:00
@@ -1,7 +1,7 @@
-""" 
would reformat /home/runner/work/tensorly/tensorly/tensorly/tenalg/__init__.py
-The :mod:`tensorly.tenalg` module contains utilities for Tensor Algebra 
+"""
+The :mod:`tensorly.tenalg` module contains utilities for Tensor Algebra
 operations such as khatri-rao or kronecker product, n-mode product, etc.
 """

[...]

  from .... import backend as tl
 
 import pytest
 import numpy as np
--- /home/runner/work/tensorly/tensorly/tensorly/decomposition/__init__.py	2025-05-05 19:11:01.360542+00:00
+++ /home/runner/work/tensorly/tensorly/tensorly/decomposition/__init__.py	2025-05-05 19:11:07.181622+00:00
@@ -1,8 +1,8 @@
 """
 The :mod:`tensorly.decomposition` module includes utilities for performing
-tensor decomposition such as CANDECOMP-PARAFAC and Tucker.                                                                                               
+tensor decomposition such as CANDECOMP-PARAFAC and Tucker.
 """

cchatzis avatar May 05 '25 20:05 cchatzis

We can ignore those errors, since they don't involve the code here. Also, yes, use of np.nanmean is reasonable within the tests. This all looks to be in order, so I'll merge. Thanks again for implementing this!

aarmey avatar May 05 '25 21:05 aarmey