stdlib icon indicating copy to clipboard operation
stdlib copied to clipboard

linalg: least squares

Open perazz opened this issue 1 year ago • 4 comments

Least squares solution to to system $Ax=b$, i.e. such that the 2-norm $||b-Ax||$ is minimized., based on LAPACK *GELSD functions.

  • [x] base implementation
  • [x] tests
  • [x] exclude unsupported xdp
  • [x] documentation
  • [x] submodule
  • [x] examples
  • [x] subroutine interface

Prior art:

  • NumPy: lstsq(a, b, rcond='warn')
  • Scipy: lstsq(a, b, cond=None, overwrite_a=False, overwrite_b=False, check_finite=True, lapack_driver=None)
  • IMSL: Result = IMSL_QRSOL(B, [A] [, AUXQR] [, BASIS] [, /DOUBLE] [, QR] [, PIVOT] [, RESIDUAL] [, TOLERANCE])

Proposed implementation: generic interface for either 1 RHS or many RHS's

  • x = lstsq(a,b) -> simplest call
  • x = lstsq(a,b,cond,overwrite_a,rank,err) -> expert call:
    • real(kind(a)), intent(in), optional :: cond = optional cutoff for rank evaluation: singular values s(i)<=cond*maxval(s) are considered 0. Default: 0.0
    • logical(lk), optional, intent(in) :: overwrite_a = option to avoid internal allocation, but destroy input matrix a. Default: .false.
    • integer(ilp), optional, intent(out) :: rank = return rank of A, if requested
    • type(linalg_state_type), optional, intent(out) :: err = return state variable

cc: @jvdp1 @jalvesz @fortran-lang/stdlib I believe this is ready for consideration.

perazz avatar Apr 21 '24 13:04 perazz

Thank you all @jvdp1 @gnikit @jalvesz for the comments! I believe it's polished enough now. Regarding the allocation question @jalvesz, please let me know. Because there are 3/4 allocations per case, the subroutine interface would look similar to gelsd: something like call lstsq_solve(A,b,x,rwork,iwork,cwork,rank,err) so if one wants to handle the solution fully controlling it, maybe (thinking aloud) they could just use the gelsd LAPACK interface that we also provide? Otherwise, I'd say let's open a separate issue for this, so we can discuss further and find a suitable common ground

perazz avatar Apr 26 '24 06:04 perazz

Regarding the allocation question @jalvesz, please let me know. Because there are 3/4 allocations per case, the subroutine interface would look similar to gelsd: something like call lstsq_solve(A,b,x,rwork,iwork,cwork,rank,err) so if one wants to handle the solution fully controlling it, maybe (thinking aloud) they could just use the gelsd LAPACK interface that we also provide?

Indeed, full control on that regard can be done using the gelsd interface directly.

I'd say let's open a separate issue for this, so we can discuss further and find a suitable common ground

Yes, let's keep this idea in mind for latter on.

jalvesz avatar Apr 26 '24 12:04 jalvesz

@jvdp1 @jalvesz @gnikit I added the subroutine interface. This interface is now far more complicated, so I would like to ask your help deciding if the expert user interface has now too many control knobs? We now also have to export an interface to let the user compute what's the minimum working array space for all cases.

Here is the new interfaces I've prepared:

https://github.com/fortran-lang/stdlib/blob/2e370f45438a637c90b608609f6edc0ddbdc6ef6/src/stdlib_linalg.fypp#L344-L351

https://github.com/fortran-lang/stdlib/blob/2e370f45438a637c90b608609f6edc0ddbdc6ef6/src/stdlib_linalg.fypp#L295-L321

perazz avatar Apr 29 '24 08:04 perazz

I'll try to take a close look at this soon @perazz, great work!! This makes me think about the issues when one wants to solve sparse linear systems with different preconditioners, one also needs to create a work space for the preconditioner. So I think it is worth defining this notion properly now.

jalvesz avatar Apr 29 '24 10:04 jalvesz

Thank you @perazz . It sounds ready to be merged IMO. Do you agree? @jalvesz @gnikit ?

jvdp1 avatar May 07 '24 06:05 jvdp1

Yes, I think this PR is ready. The subroutine & DT discussion can be continued down the road in a new PR, which are not show stoppers IMO ;)

jalvesz avatar May 08 '24 06:05 jalvesz

LGTM, however some CI MacOS tests are failing. Do we know what is causing that?

gnikit avatar May 08 '24 09:05 gnikit

Those are fails from before this merged PR https://github.com/fortran-lang/stdlib/pull/807 by @perazz, I think running the tests again would be successful.

jalvesz avatar May 08 '24 09:05 jalvesz

Thank you @perazz /all. I will merge this PR now, and I suggest to wait on #806 to create a new release.

jvdp1 avatar May 08 '24 18:05 jvdp1

Great! Thank you @jvdp1 @jalvesz @gnikit. For the new release, how about waiting until most of the linear algebra functions are in? But, your proposal is also ok to me.

perazz avatar May 09 '24 06:05 perazz

For the new release, how about waiting until most of the linear algebra functions are in? But, your proposal is also ok to me.

Which other linear algebra functions do you plan to add? Could they be grouped based on some theme?

jvdp1 avatar May 09 '24 07:05 jvdp1

If we look at NumPy.linalg for example, good LAPACK-based functions would be:

  • eigenvalues (eig, eigh, eigvals, eigvalsh)
  • condition number (cond)
  • matrix decompositions (svd, cholesky, qr)
  • pseudo-inverse (based on svd)

perazz avatar May 09 '24 07:05 perazz

We could follow the same subsections ("Matrix and vector products", "decompositions", "matrix eigenvalues", "norms", "solving equations") as in Numpy. Each section could be related with a new release. However, this would need some orders of implementation and impact your plans, which I don't want. So, as you prefer regarding releases.

jvdp1 avatar May 09 '24 07:05 jvdp1

For the new release, how about waiting until most of the linear algebra functions are in? But, your proposal is also ok to me.

For me the answer depends on how long it will take for all the linear algebra functions to be completed and the final release to be issued. My original thought was to aim for quick feedback loops from the users/community to inform future linalg API PRs. The way I see it, incremental releases for high quality, self-contained work, come at a small cost to development. This is how I would classify these last few PRs.

However, the main objective is to deliver the linear algebra API not to follow release best-practices, so I leave this entirely up to you @perazz.

gnikit avatar May 09 '24 12:05 gnikit

Makes a lot of sense @gnikit. If you'd like to prepare a new release already @jvdp1, I'm in for that, do let me know if/what inputs do you need from my side. For example, could be 0.5.1 so the community can respond to the API on Discourse. Thanks!

perazz avatar May 09 '24 12:05 perazz