pylops icon indicating copy to clipboard operation
pylops copied to clipboard

Can FISTA run on matrices of data?

Open eurunuela opened this issue 4 years ago • 35 comments

Hello,

I am not sure this is the right place to ask this. I would love to add pylops to my algorithms, but I mainly work with multivariate formulations, i.e., with the entire matrix of data at once.

Does pylops.optimization.sparsity.FISTA accept an np.ndarray of size 200,10 for instance? I have tried it but cannot make it work.

If I can use FISTA like that, I'll definitely want to contribute to add a new operator.

Thank you!

eurunuela avatar Nov 17 '21 10:11 eurunuela

Hi @eurunuela, currently it will not work as internally we call matvec and rmatvec but it should be an easy change to make it work for multiple right-hand sides at the same time :)

I'll take a look and let you know how it goes soon

mrava87 avatar Nov 19 '21 13:11 mrava87

Thank you @mrava87 !

I thought it would be straightforward too, just making sure that the matrix dimensions make sense for matcvec.

eurunuela avatar Nov 19 '21 13:11 eurunuela

Ok, this was easier than I expected.

I just pushed updated ISTA/FISTA on the master. Try:

nt = 61
nx = 5
dt = 0.004
t = np.arange(nt)*dt
x = np.zeros(nt)
x[10] = -.4
x[int(nt/2)] = 1
x[nt-20] = 0.5
x = np.outer(x, np.ones(nx))

h, th, hcenter = pylops.utils.wavelets.ricker(t[:101], f0=20)

Cop = pylops.signalprocessing.Convolve1D(nt, h=h, offset=hcenter,
                                         dtype='float32')
y = Cop*x
yn = y + np.random.normal(0, 0.1, y.shape)

xista, niteri, costi = \
    pylops.optimization.sparsity.ISTA(Cop, y, niter=400, eps=5e-1,
                                      tol=1e-8, returninfo=True)

xfista, niteri, costi = \
    pylops.optimization.sparsity.FISTA(Cop, y, niter=400, eps=5e-1,
                                       tol=1e-8, returninfo=True)

fig, axs = plt.subplots(1, 4, figsize=(14, 5))
axs[0].imshow(x)
axs[0].axis('tight')
axs[1].imshow(y)
axs[1].axis('tight')
axs[2].imshow(xista)
axs[2].axis('tight')
axs[3].imshow(xfista)
axs[3].axis('tight')

This is a simple example of deconvolution using multiple 1d signals as multiple RHS. Have a look and see if this is what you need for your problem :)

mrava87 avatar Nov 19 '21 14:11 mrava87

Yes, this is perfect. Thank you so much Matteo!

If you think it's useful I will implement a L_1 + L_2,1 mixed-norm proximal operator that allows for sparsity and grouping of coefficients.

eurunuela avatar Nov 19 '21 15:11 eurunuela

For sure, that would be great to have :)

I would however more likely see this fitting in https://github.com/pylops/pyproximal

Whilst it is true we have FISTA directly in PyLops, we decided that more advanced solvers which fit within the arena of Proximal operator would be better off living in their own project. This way if you implement a proximal operator you could then use it with ISTA/FISTA but also ADMM or Primal-dual

Makes sense?

mrava87 avatar Nov 19 '21 15:11 mrava87

Of course, that makes sense.

For the record, I'm referring to equation 3 in Gramfort, A., Strohmeier, D., Haueisen, J., Hamalainen, M., & Kowalski, M. (2011, July). Functional brain imaging with M/EEG using structured sparsity in time-frequency dictionaries. In Biennial International Conference on Information Processing in Medical Imaging (pp. 600-611). Springer, Berlin, Heidelberg.

Edit: thank you for you help @mrava87 !

eurunuela avatar Nov 19 '21 15:11 eurunuela

Ok, I planned to do this for a while but since it seems to fit nicely with your needs I just decided to spend a bit of time and get it done.

I have now modified the AcceleratedProximalGradient (https://pyproximal.readthedocs.io/en/latest/api/generated/pyproximal.optimization.primal.AcceleratedProximalGradient.html#pyproximal.optimization.primal.AcceleratedProximalGradient) to allow for 2 type of accelleration, one of them being the FISTA one. So now, whilst PyLops FISTA is purely for L2 + reg*L1, AcceleratedProximalGradient with acceleration='fista' is much more general as it can handle any pair of functionals, one with gradient and one with proximal.

It also works in multiple RHS mode, see here for an example (top part proves that ISTA/FISTA in PyLops are identical to ProximalGradient and AcceleratedProximalGradient when the latter used for L2 + reg*L1): https://github.com/PyLops/pylops_notebooks/blob/master/pyproximal/ISTA_FISTA_PyLops_PyProx.ipynb

Note: so far you would need to use both PyLops and PyProximal in master, I will soon make releases for both :)

mrava87 avatar Nov 20 '21 08:11 mrava87

Thank you Matteo. I see there's a typo in the documentation of the AcceleratedProximalGradient. You mention f(x) twice after the equation.

I will have a look at these next week. Thank you so much, this is all very helpful and it will definitely optimize my computational time.

eurunuela avatar Nov 20 '21 10:11 eurunuela

Thanks, I'll fix that :)

mrava87 avatar Nov 20 '21 10:11 mrava87

Hi @mrava87,

I finally found some time to look into this and I'm finding a dimension mismatch error when creating the L2 operator:

tau = 1 / (np.linalg.norm(hrf_matrix) ** 2)
sigma = np.repeat(np.array([0.73499711]), nv, axis=0)
l21_l1 = L21_plus_L1(sigma=sigma)
l2 = L2(Op=hrf, b=y_noise)
AcceleratedProximalGradient(l2, l21_l1, tau=tau, x0=np.zeros((nt, nv)), 
                                                  niter=400, acceleration='fista', show=True)

With hrf.shape = (300, 300) and y_noise.shape = (300, 10).

Here's the entire error message:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-81-d032dbb1e363> in <module>
      2 sigma = np.repeat(np.array([0.73499711]), nv, axis=0)
      3 l21_l1 = L21_plus_L1(sigma=sigma)
----> 4 l2 = L2(Op=hrf, b=y_noise)
      5 AcceleratedProximalGradient(l2, l21_l1, tau=tau, x0=np.zeros((nt, nv)), 
      6                             niter=400, acceleration='fista', show=True)

~/pyproximal/pyproximal/proximal/L2.py in __init__(self, Op, b, q, sigma, alpha, qgrad, niter, x0, warm)
     96         # create data term
     97         if self.Op is not None:
---> 98             self.OpTb = self.sigma * self.Op.rmatvec(self.b)
     99 
    100     def __call__(self, x):

~/miniconda3/lib/python3.8/site-packages/pylops/LinearOperator.py in rmatvec(self, x)
    159 
    160         if x.shape != (M,) and x.shape != (M, 1):
--> 161             raise ValueError('dimension mismatch')
    162 
    163         y = self._rmatvec(x)

ValueError: dimension mismatch

What am I doing wrong to make FISTA work on a matrix of data in AcceleratedProximalGradient?

Thank you!

eurunuela avatar May 26 '22 17:05 eurunuela

Hi @eurunuela, I am not sure as I cannot see the entire code. I tried to make something simple along the lines of your dimensions:

hrf_matrix = np.random.normal(0,1, (300, 300))
hrf = MatrixMult(hrf_matrix)
y_noise = np.random.normal(0,1, (300, 10))

l1 = L21_plus_L1(np.arange(10))
l2 = L2(Op=hrf, b=y_noise)
epsg = np.arange(10)+1

xpgfista = AcceleratedProximalGradient(l2, l1, tau=tau, x0=np.zeros((300,10)), 
                                       epsg=epsg, niter=400, acceleration='fista', show=False)

This seems to work no problem. Note that work means here that the code runs... as I don't have a real problem to look at I am not sure if things work as expected but at least for the L1 case in here https://github.com/PyLops/pylops_notebooks/blob/master/pyproximal/ISTA_FISTA_PyLops_PyProx.ipynb the results are also reasonable.

PS: just to be sure, which version are you using? I am on currently running this on main directly

mrava87 avatar May 27 '22 10:05 mrava87

I'm sorry, I wasn't on the main branch. However, now that I have tried on the main branch, I get the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-8-12f957443d1d> in <module>
      3 l21_l1 = L21_plus_L1(sigma=sigma)
      4 l2 = L2(Op=hrf, b=y_noise)
----> 5 fista_results = AcceleratedProximalGradient(l2, l21_l1, tau=tau, x0=np.zeros((nt, nv)), 
      6                             epsg=np.ones(nv), niter=400, acceleration='fista', show=True)

~/pyproximal/pyproximal/optimization/primal.py in AcceleratedProximalGradient(proxf, proxg, x0, tau, beta, epsg, niter, niterback, acceleration, callback, show)
    300     if show:
    301         tstart = time.time()
--> 302         print('Accelerated Proximal Gradient\n'
    303               '---------------------------------------------------------\n'
    304               'Proximal operator (f): %s\n'

TypeError: only size-1 arrays can be converted to Python scalars

eurunuela avatar May 27 '22 11:05 eurunuela

Strange, seems a printing error. Can you give me a code snipped with fake numbers but same dimensions to what you are tying to do, otherwise it is difficult for me to help

mrava87 avatar May 27 '22 17:05 mrava87

Sure thing, here's all the code I'm using:

from splora.deconvolution.hrf_matrix import hrf_linear

nt = 300
nv = 10
tr = 2

y_ideal = np.zeros((nt, nv))

y_ideal[50:51, :] = 1
y_ideal[200:205, :] = 1
y_ideal[250:252, :] =1

nt = 300
p = [6,16,1,1,6,0,32]
hrf_SPM = hrf_linear(2, p)

filler = np.zeros(nt - hrf_SPM.shape[0], dtype=int)
hrf_SPM = np.append(hrf_SPM, filler)

temp = hrf_SPM

for i in range(nt - 1):
    foo = np.append(np.zeros(i + 1), hrf_SPM[0 : (len(hrf_SPM) - i - 1)])
    temp = np.column_stack((temp, foo))
    
hrf_matrix = temp

hrf = pylops.MatrixMult(hrf_matrix)
#hrf = pylops.signalprocessing.Convolve1D(N=nt, h=hrf_SPM)

y = hrf.dot(y_ideal)

y_noise = y + np.repeat(np.random.normal(0, 0.2, y.shape[0])[:, np.newaxis], nv, axis=1)

plt.plot(y_noise[:, 0])

y = y_noise.copy()

tau = 1 / (np.linalg.norm(hrf_matrix) ** 2)
sigma = np.repeat(np.array([0.73499711]), nv, axis=0)
l21_l1 = L21_plus_L1(sigma=sigma)
l2 = L2(Op=hrf, b=y_noise)
fista_results = AcceleratedProximalGradient(l2, l21_l1, tau=tau, x0=np.zeros((nt, nv)), 
                            epsg=np.ones(nv), niter=400, acceleration='fista', show=True)

You can get the hrf_linear function from here.

Thank you Matteo!

eurunuela avatar May 27 '22 17:05 eurunuela

Alright, I see. This is a printing problem as epsg was initially not supposed to be multi-valued. You could have just switched show=False and the solver runs (again, not sure if what it gives is meaningful...). I made some changes now (https://github.com/PyLops/pyproximal/pull/85) so that also the printing works fine :)

Let me know when you get some results if they are meaningful, and perhaps at some point it would be good if you can contribute a tutorial on pyproximal

mrava87 avatar May 28 '22 08:05 mrava87

Thank you so much!

I can confirm it works and yields results that are very similar to my own implementation of FISTA with differences probably arising from faster convergence with one of the two methods.

eurunuela avatar May 28 '22 17:05 eurunuela

Fantastic! As I said, if you can make a simple example to show how you use this for a real life problem that would be very nice 😊

mrava87 avatar May 28 '22 18:05 mrava87

Fantastic! As I said, if you can make a simple example to show how you use this for a real life problem that would be very nice 😊

Will do once I add it to my package, because I will have to do it for myself too.

eurunuela avatar May 28 '22 23:05 eurunuela

PS: just to be sure, which version are you using? I am on currently running this on main directly

Have you cut a release with the changes we have discussed in this issue?

My unit tests fail with an error that makes me think these changes are not part of a release.

/opt/conda/envs/py37_env/lib/python3.7/site-packages/pyproximal/optimization/primal.py:318: in AcceleratedProximalGradient
    x = proxg.prox(y - tau * proxf.grad(y), epsg * tau)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

args = (<pyproximal.proximal.L21_plus_L1.L21_plus_L1 object at 0x7f1e74e2a510>, array([[-1.60410489e-05, -1.60410489e-05, -1....9, 0.00085759, 0.00085759, 0.00085759, 0.00085759,
       0.00085759, 0.00085759, 0.00085759, 0.00085759, 0.00085759]))
kwargs = {}

    def wrapper(*args, **kwargs):
>       if args[2] <= 0:
E       ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

/opt/conda/envs/py37_env/lib/python3.7/site-packages/pyproximal/ProxOperator.py:12: ValueError

It could also be that I'm not signaling the correct version of pyproximal and pylops.

eurunuela avatar Jun 02 '22 14:06 eurunuela

Currently the situation is a bit messy as we are doing a major new realease of pylops.

I suggest to use the latest pylops on PyPI and the main of pyproximal. I am planning to make a new release of pyproximal soon as there is quite a lot of new things but I’m not sure I will get around doing it before the end of the month

mrava87 avatar Jun 02 '22 16:06 mrava87

I suggest to use the latest pylops on PyPI and the main of pyproximal. I am planning to make a new release of pyproximal soon as there is quite a lot of new things but I’m not sure I will get around doing it before the end of the month

Sounds good. I will test locally until the new pyproximal release is out.

Thanks!

eurunuela avatar Jun 02 '22 16:06 eurunuela

Quick question @mrava87:

How long would you expect AcceleratedProximalGradient to take on a 2100x45 matrix with a maximum of 400 iterations on an HPC cluster?

Edit: the method should check for convergence and exit the loop when the criterion is met.

eurunuela avatar Jun 07 '22 16:06 eurunuela

Hey, It depends on many things because I assume for the proxf you use L2 and that calls an iterative solver (or a dense solver) so the main cost of the overall problem will be dominated by the cost of this part. What do you use? And how long does it take?

Also, when you say HPC cluster what do you mean, as there is no distribution over machines unless you add it yourself so I guess you just mean a good beefy machine?

For the moments our methods don’t do checks so they always stop only when they reach the max number of iterations, we should include that soon when we convert them to classes like we just did for pylops (so we also give the freedom to the user to choose the stopping criterion)

mrava87 avatar Jun 07 '22 20:06 mrava87

PS: we just released pyproximal 0.4.0 so all things we discussed are now in a stable release too :)

mrava87 avatar Jun 07 '22 20:06 mrava87

Hey, It depends on many things because I assume for the proxf you use L2 and that calls an iterative solver (or a dense solver) so the main cost of the overall problem will be dominated by the cost of this part. What do you use? And how long does it take?

I have reduced the number of maximum iterations to 200 just to make sure the entire pipeline runs correctly on the HPC cluster. Right now, solving the inverse problem for ~300 datasets of size 2100 x 45 takes around 5h. It could be slow because the regularization parameter is not high enough. But I still haven't been able to run the entire pipeline and see the results to assess if the regularization parameter is low or high.

Also, when you say HPC cluster what do you mean, as there is no distribution over machines unless you add it yourself so I guess you just mean a good beefy machine?

So, I'm spitting the computation into various jobs in the HPC cluster, with each job having access to 8GB of RAM memory and 16 CPUs. Each one of the jobs solves the inverse problem on 300 different matrices of size 2100 x 45.

For the moments our methods don’t do checks so they always stop only when they reach the max number of iterations, we should include that soon when we convert them to classes like we just did for pylops (so we also give the freedom to the user to choose the stopping criterion)

Sounds good!

On a related note, it looks like the scipy dependencies of the latest pylops and pyproximal versions are different. This raises an error when having these two versions as a dependency of my package.

pylops 1.18.2 depends on scipy>=1.4.0
pyproximal 0.4.0 depends on scipy>=1.8.0

eurunuela avatar Jun 09 '22 19:06 eurunuela

Alright I see. Then I think it makes sense as you do solve a lot of problems. Given that you use Fista the only heavy computation is the matrix multiplication and its adjoint which is done in numpy, so as long as you set the various env variables of OMP/MKL to use as many threads as possible (I guess you mean 16 threads not CPUs as otherwise you would need some sort of MPI parallelism to use them…) I don’t see any way to make it faster… other than managing to cut down iterations by finding the best regularization parameter.

For the dependencies, there is no conflict as both require a >= so if you want both packages the requirement of pyproximal should dominate… what error you get? Or just a warning?

mrava87 avatar Jun 09 '22 21:06 mrava87

I'm using Slurm, so the CPUs argument refers to a consumable resource offered by a node. It can refer to a socket, a core, or a hardware thread, based on the Slurm configuration.

I will check if numpy is already using the OMP variable.

Re dependencies: you're right, but it looks like it scipy>=1.8.0 is only compatible with Python 3.8 or higher. I will have to drop support for 3.6 and 3.7.

eurunuela avatar Jun 10 '22 00:06 eurunuela

Ok great! I expected you meant processes/threads, then make sure numpy is aware. Other than that, since you seem to use dense matrices you could set densesolver=True in L2 and see if that helps. Another thing I could look into is to provide an option if the operator is dense to create its inverse / Cholesky upfront and store it so that at every iteration the cost of solving the problem is very minimal. What do you think?

Regarding debs, that’s correct. Let me take a look why we had to push the dependency of scipy for pyproximal to 1.8.0

mrava87 avatar Jun 10 '22 06:06 mrava87

Other than that, since you seem to use dense matrices you could set densesolver=True in L2 and see if that helps.

The docs say densesolver should be a string.

Another thing I could look into is to provide an option if the operator is dense to create its inverse / Cholesky upfront and store it so that at every iteration the cost of solving the problem is very minimal. What do you think?

That sounds like a great idea!

Regarding debs, that’s correct. Let me take a look why we had to push the dependency of scipy for pyproximal to 1.8.0

Don't worry about it too much. I'm ok with dropping support for Python 3.6 and 3.7.

eurunuela avatar Jun 10 '22 12:06 eurunuela

Sorry, it is indeed a string to choose numpy/scipy when the operator is explicit.

Ok, let me take a look at how we can implement the factorization option, I get back to you soon :)

mrava87 avatar Jun 10 '22 14:06 mrava87