Sparse algebra support with OOP API
PR related to: https://github.com/fortran-lang/stdlib/issues/38 https://github.com/fortran-lang/stdlib/issues/749 https://github.com/fortran-lang/stdlib/pull/189
Here I propose the OOP API upfront as a means to centralize the data-containment formats within stdlib, which I feel would be a (the) big added value. I'm migrating the proposal started in FSPARSE and adapting it to stdlib.
Current proposal design:
- Derived Types:
sparse_type (abstract base type)
COO_type, CSR_type, CSC_type, ELL_type, SELLC_type (DTs with no data buffer)
COO_?p_type, CSR_?p_type, CSC_?p_type, ELL_?p_type, SELLC_?p_type (DTs with data buffer, where ?p stands for the kind/precision, including complex values)
- Matrix-vector product:
call spmv( matrix, x, y [,alpha,beta,op] ) !> y = alpha * op(matrix) * x + beta * y
op: sparse_op_none='N' (default), sparse_op_transpose='T' or sparse_op_hermitian='H'
- Data accessors:
val = matrix%at( i , j ) !> get the value at position (i,j); if (i,j) is not in the sparse pattern, val = 0; if out-of-bounds, val = NaN
call matrix%add( i , j, val ) !> set the value at position (i,j); if (i,j) is not in the current structure, skip
!> OR add a data block
call matrix%add( dofs_i(:), dofs_j(:), mat(:,:) ) !> set the values of the block mat at the (dofs_i, dofs_j) positions
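To make the intended accessor semantics concrete, here is a toy pure-Python analogue (the class name and the `insert` helper are hypothetical, not part of the proposal; `at`/`add` mirror the behavior described above: 0 for positions outside the sparsity pattern, NaN out of bounds, silent skip when adding outside the current structure):

```python
import math

class ToyCooMatrix:
    """Toy COO-like container mimicking the proposed at/add semantics."""
    def __init__(self, nrows, ncols):
        self.nrows, self.ncols = nrows, ncols
        self.entries = {}  # (i, j) -> value; the keys form the sparsity pattern

    def insert(self, i, j, val):
        # Hypothetical helper to build the pattern (not part of the proposal).
        self.entries[(i, j)] = val

    def add(self, i, j, val):
        # Set the value at (i, j); if (i, j) is not in the structure, skip.
        if (i, j) in self.entries:
            self.entries[(i, j)] = val

    def at(self, i, j):
        # Out-of-bounds access returns NaN; positions outside the pattern return 0.
        if not (1 <= i <= self.nrows and 1 <= j <= self.ncols):
            return math.nan
        return self.entries.get((i, j), 0.0)
```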
- Conversions :
call coo2ordered(COO [, sort_data]) !> sort and remove duplicates from the COO data structure.
call dense2coo( dense , COO ) !> dense is a plain 2d array
call coo2dense( COO , dense )
call coo2csr( COO , CSR ) !> assumes ordered and unique indexes for COO ... a sorting algorithm is yet to be implemented before transferring data
call csr2coo( CSR , COO ) !> assumes ordered and unique indexes for CSR
call coo2csc( COO , CSC ) !> assumes ordered and unique indexes for COO
call csc2coo( CSC , COO ) !> assumes ordered and unique indexes for CSC
call csr2ell( CSR , ELL [, num_nz_rows] )
call csr2sellc( CSR , SELLC [, chunk ] ) !> chunk sizes for spmv [4, 8, 16]
call diag( <sparse> , diag ) !> extract the diagonal components of the sparse matrix
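As an illustration of the `coo2csr` logic under its stated assumption of ordered, duplicate-free indices, here is a pure-Python sketch (1-based indices and pointer values, mirroring the Fortran convention; function and argument names are mine):

```python
def coo2csr(nrows, row, col, data):
    """Convert ordered, duplicate-free COO triplets to CSR (1-based indices).

    Row i's nonzeros end up in data[rowptr[i-1]-1 : rowptr[i]-1]
    (Python slice, since the pointer values themselves are 1-based).
    """
    rowptr = [0] * (nrows + 1)
    for r in row:                  # count nonzeros per row
        rowptr[r] += 1
    rowptr[0] = 1                  # first row starts at position 1
    for i in range(1, nrows + 1):  # prefix sum -> row start pointers
        rowptr[i] += rowptr[i - 1]
    # the triplets are already row-ordered, so col/data transfer directly
    return rowptr, list(col), list(data)
```

Because the COO input is already sorted, only the row pointer array has to be built; this is why `coo2ordered` is a prerequisite for the conversion.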
Wouldn't it be more useful if sparse_t had interfaces like mat%to_coo|csr|... etc. instead of using a function call? Since this is going OOP, let's keep it like that.
This could be made even shorter with a %to(...) interface, as the types are declared before calling the conversion. I've done that before but did not propose it here upfront, as I wanted to hear some opinions on what the OOP API in stdlib should look like.
Agreed, %to would be even better, or %convert for clarity? hmm...
Is it a good idea to have matvec products that do not initialize the result vector? I have to say that I spent a good hour debugging a code, and the source of divergence in the solver was the fact that I was not initializing the result vector before passing it to the subroutine. While I understand that this is a great way to perform an Axpy operation, I would advise having separate matvec (with initialization) and Axpy (without) routines.
subroutine matvec_csr_1d_sp(vec_y,matrix,vec_x)
    type(CSR_sp), intent(in) :: matrix
    real(sp), intent(in)    :: vec_x(:)
    real(sp), intent(inout) :: vec_y(:)
    integer :: i, j
    real(sp) :: aux
    associate( data => matrix%data, col => matrix%col, rowptr => matrix%rowptr, &
        & nnz => matrix%nnz, nrows => matrix%nrows, ncols => matrix%ncols, sym => matrix%sym )
    if( sym == k_NOSYMMETRY ) then
        do concurrent(i=1:nrows)
            do j = rowptr(i), rowptr(i+1)-1
                vec_y(i) = vec_y(i) + data(j) * vec_x(col(j))
            end do
        end do
    else if( sym == k_SYMTRIINF )then
        do i = 1 , nrows
            aux = 0._sp
            do j = rowptr(i), rowptr(i+1)-2 ! off-diagonal entries of the stored lower triangle
                aux = aux + data(j) * vec_x(col(j))
                vec_y(col(j)) = vec_y(col(j)) + data(j) * vec_x(i) ! symmetric contribution
            end do
            aux = aux + data(j) * vec_x(i)  ! diagonal entry (j = rowptr(i+1)-1 after the loop)
            vec_y(i) = vec_y(i) + aux
        end do
    else if( sym == k_SYMTRISUP )then
        do i = 1 , nrows
            aux = vec_x(i) * data(rowptr(i)) ! diagonal entry stored first in the row
            do j = rowptr(i)+1, rowptr(i+1)-1
                aux = aux + data(j) * vec_x(col(j))
                vec_y(col(j)) = vec_y(col(j)) + data(j) * vec_x(i) ! symmetric contribution
            end do
            vec_y(i) = vec_y(i) + aux
        end do
    end if
    end associate
end subroutine
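To illustrate why the `k_SYMTRIINF` branch both accumulates into `aux` and scatters into `vec_y(col(j))`: with only the lower triangle stored, each off-diagonal entry contributes once above and once below the diagonal. A pure-Python analogue of that branch (1-based CSR arrays with the diagonal stored last in each row; names are mine, and `y` is assumed pre-initialized as in the proposal):

```python
def spmv_sym_lower(nrows, rowptr, col, data, x, y):
    """y += A*x where only the lower triangle of symmetric A is stored in CSR
    (diagonal last in each row), mirroring the k_SYMTRIINF branch above."""
    for i in range(1, nrows + 1):
        aux = 0.0
        # off-diagonal entries: contribute to row i and, by symmetry, to row col(j)
        for j in range(rowptr[i - 1], rowptr[i] - 1):   # skip the diagonal (last)
            aux += data[j - 1] * x[col[j - 1] - 1]
            y[col[j - 1] - 1] += data[j - 1] * x[i - 1]
        jd = rowptr[i] - 1                               # diagonal entry
        aux += data[jd - 1] * x[i - 1]
        y[i - 1] += aux
    return y
```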
Hi @ftucciarone, thanks for checking out the current draft and sorry for the inconvenience.
Yes, I intentionally designed the API to not initialize the vector; in the current version within FSPARSE https://github.com/jalvesz/FSPARSE?tab=readme-ov-file#sparse-matrix-vector-product I updated the README to be clear about it but forgot to update it here.
The reason for that choice is that it allows maximum flexibility when preprocessing linear systems that need operations with the same matrix operator that will be used in the solver. It basically places the "responsibility" on the user to add a y = 0 just before, if one does indeed need a clean vector.
This is of course totally open to debate; I just proposed it following my experience reworking solvers. My feeling is that this can avoid having to manage different implementations or optionals, when the solution can be to explicitly state that the interface is updating (y = y + M * x) instead of overwriting (y = M * x).
Hi @jalvesz, it was actually a pleasure to look into this draft; I am learning a lot and I look forward to discussing it with you if possible.
I have to say that I have particularly strong feelings about the "non-initialization" problem, mostly twofold. First, if y = y + M*x is the chosen way to go, then it must be written very large, at the very beginning of the documentation, to make sure everyone sees it. Test examples should also take care of that, showing that the result is wrong if y is not initialized beforehand. Second, I have the feeling that doing y = 0_dp first and then call matvec(y, A, x) might add an overhead due to the initialization of y for very large problems, while initializing the component y(i) while doing the matvec might be faster, but I have to test this because I'm not 100% sure about it.
However, I think that separating matvec and Axpy could be feasible at this stage (I can take care of that with the correct guidance) and potentially doable by preprocessing as well (I'm thinking of a fypp if).
Cheers, Francesco
Edit: example of splitting matvec and matAxpy with fypp
#:include "../include/common.fypp"
#:set RANKS = range(1, 2+1)
#:set KINDS_TYPES = REAL_KINDS_TYPES
#:set OPERATIONS = ['matvec', 'matAxpy'] <--- DEFINE THE NAMES
#! define ranks without parentheses
#:def rksfx2(rank)
#{if rank > 0}#${":," + ":," * (rank - 1)}$#{endif}#
#:enddef
!! matvec_csr
#:for k1, t1 in (KINDS_TYPES)
#:for rank in RANKS
#:for idx, opname in enumerate(OPERATIONS) <--- ITERATE OVER THE NAMES
    subroutine ${opname}$_csr_${rank}$d_${k1}$(vec_y,matrix,vec_x)
        type(CSR_${k1}$), intent(in) :: matrix
        ${t1}$, intent(in)    :: vec_x${ranksuffix(rank)}$
        ${t1}$, intent(inout) :: vec_y${ranksuffix(rank)}$
        integer :: i, j
        #:if rank == 1
        ${t1}$ :: aux
        #:else
        ${t1}$ :: aux(size(vec_x,dim=1))
        #:endif
        associate( data => matrix%data, col => matrix%col, rowptr => matrix%rowptr, &
            & nnz => matrix%nnz, nrows => matrix%nrows, ncols => matrix%ncols, sym => matrix%sym )
        if( sym == k_NOSYMMETRY) then
            do concurrent(i=1:nrows)
                #:if idx==0 <--- IF MATVEC: overwrite, so zero the row result before accumulating
                vec_y(${rksfx2(rank-1)}$i) = 0._${k1}$
                #:endif
                do j = rowptr(i), rowptr(i+1)-1
                    vec_y(${rksfx2(rank-1)}$i) = vec_y(${rksfx2(rank-1)}$i) + data(j) * vec_x(${rksfx2(rank-1)}$col(j))
                end do
            end do
        else if( sym == k_SYMTRIINF )then
            do i = 1 , nrows
                aux = 0._${k1}$
                do j = rowptr(i), rowptr(i+1)-2
                    aux = aux + data(j) * vec_x(${rksfx2(rank-1)}$col(j))
                    vec_y(${rksfx2(rank-1)}$col(j)) = vec_y(${rksfx2(rank-1)}$col(j)) + data(j) * vec_x(${rksfx2(rank-1)}$i)
                end do
                aux = aux + data(j) * vec_x(${rksfx2(rank-1)}$i)
                #:if idx==0 <--- IF MATVEC (scatters into row i only come from later rows, so assignment is safe)
                vec_y(${rksfx2(rank-1)}$i) = aux
                #:else <--- IF AXPY
                vec_y(${rksfx2(rank-1)}$i) = vec_y(${rksfx2(rank-1)}$i) + aux
                #:endif
            end do
        else if( sym == k_SYMTRISUP )then
            #:if idx==0 <--- IF MATVEC: the scatter below targets rows not yet visited, so zero the result upfront
            vec_y = 0._${k1}$
            #:endif
            do i = 1 , nrows
                aux = vec_x(${rksfx2(rank-1)}$i) * data(rowptr(i))
                do j = rowptr(i)+1, rowptr(i+1)-1
                    aux = aux + data(j) * vec_x(${rksfx2(rank-1)}$col(j))
                    vec_y(${rksfx2(rank-1)}$col(j)) = vec_y(${rksfx2(rank-1)}$col(j)) + data(j) * vec_x(${rksfx2(rank-1)}$i)
                end do
                vec_y(${rksfx2(rank-1)}$i) = vec_y(${rksfx2(rank-1)}$i) + aux
            end do
        end if
        end associate
    end subroutine
#:endfor
#:endfor
#:endfor
@ftucciarone thanks for the idea! I will also bring parts of a discussion I had with @ivan-pi on exactly this topic, where he brought to my attention the following:
The MKL library offers y := beta*y + alpha*A*x, and you then need to pick either beta = 0 or beta = 1, depending on what you want.
In Apple's Accelerate Framework on the other hand, they offer two functions:
SparseMultiply(A,x,y) // y = A x
SparseMultiplyAdd(A,x,y) // y += A x
In PSBLAS, spmm covers both vectors and matrices (rank-2 dense arrays):
call psb_spmm(alpha, a, x, beta, y, desc_a, info)
call psb_spmm(alpha, a, x, beta, y, desc_a, info, trans, work)
https://psctoolkit.github.io/psblasguide/userhtmlse4.html#x18-550004
Taking all those ideas together, I could imagine using optionals to cover:
call matvec( A , vec_x, vec_y ) !> vec_y = vec_y + A * vec_x
call matvec( A , vec_x, vec_y , overwrite = .true. ) !> vec_y = A * vec_x
call matvec( A , vec_x, vec_y , alpha=value, beta=value ) !> vec_y = beta * vec_y + alpha * A * vec_x
Or the default could be to overwrite, with an optional argument enabling addition; or simply the presence of beta (i.e., if(present(beta))) could automatically determine the behavior.
Let me know your thoughts on that
Taking all those ideas together, I could imagine using optionals to cover:
call matvec( A , vec_x, vec_y ) !> vec_y = vec_y + A * vec_x
call matvec( A , vec_x, vec_y , overwrite = .true. ) !> vec_y = A * vec_x
call matvec( A , vec_x, vec_y , alpha=value, beta=value ) !> vec_y = beta * vec_y + alpha * A * vec_x
To follow how BLAS handles things: if beta=0, the result is overwritten. There should be no need to explicitly add overwrite=.true.; it could be difficult for end users to understand matvec(..., beta=2, overwrite=.true.), whereas using beta=0 is simpler and unambiguous.
Done in the last commit. I added the defaults beta=0 and alpha=1.
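A pure-Python sketch of these semantics (CSR arrays with 1-based indices, names are mine): with the defaults beta=0 and alpha=1, a plain call overwrites y, while beta=1 recovers the accumulate/Axpy behavior.

```python
def spmv(nrows, rowptr, col, data, x, y, alpha=1.0, beta=0.0):
    """y := beta*y + alpha*A*x for CSR A (1-based index arrays).

    beta=0 overwrites y (no explicit 'overwrite' flag needed);
    beta=1 gives the accumulate/Axpy behavior.
    """
    for i in range(1, nrows + 1):
        acc = 0.0
        for j in range(rowptr[i - 1], rowptr[i]):  # nonzeros of row i
            acc += data[j - 1] * x[col[j - 1] - 1]
        y[i - 1] = beta * y[i - 1] + alpha * acc
    return y
```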
Dear @jalvesz
I think that the very next step would be that of defining sparse triangular matrices and a forward-backward solve procedure. I have sketched a possible implementation for CSR (disclaimer: not compiled and not tested yet)
module stdlib_sparse_solve
    use stdlib_kinds, only: int8, int16, int32, int64, sp, dp, xdp, qp
    use stdlib_sparse_kinds
    implicit none
    private
    public :: fsolve
    public :: bsolve
    public :: solve
    interface fsolve ! call fsolve(vec_y,matrix,vec_x) => L * Y = X or (U^T) * Y = X
        module procedure fsolve_csr_sp
    end interface
    interface bsolve ! call bsolve(vec_y,matrix,vec_x) => U * Y = X or (L^T) * Y = X
        module procedure bsolve_csr_sp
    end interface
    interface solve
        module procedure solve_csr_sp
    end interface
contains
    subroutine fsolve_csr_sp(vec_y,matrix,vec_x,lower)
        type(CSR_sp), intent(in) :: matrix
        real(sp), intent(in)    :: vec_x(:)
        real(sp), intent(inout) :: vec_y(:)
        logical, intent(in) :: lower ! Inspiration from scipy: if true, the matrix is lower triangular
        integer :: i, j
        real(sp) :: aux
        associate( data => matrix%data, col => matrix%col, rowptr => matrix%rowptr, &
            & nnz => matrix%nnz, nrows => matrix%nrows, ncols => matrix%ncols, sym => matrix%sym )
        if( lower ) then ! L * Y = X
            do i = 1, nrows
                aux = vec_x(i)
                do j = rowptr(i), rowptr(i + 1) - 2
                    aux = aux - data(j) * vec_y( col(j) )
                end do
                vec_y(i) = aux / data( rowptr(i + 1) - 1 ) ! divide by the diagonal (stored last in the row)
            end do
        else ! (U^T) * Y = X
            vec_y = 0._sp
            do i = 1, nrows
                vec_y(i) = ( vec_x(i) - vec_y(i) ) / data( rowptr(i) ) ! diagonal stored first in the row
                do j = rowptr(i) + 1, rowptr(i + 1) - 1
                    vec_y( col(j) ) = vec_y( col(j) ) + data(j) * vec_y( i )
                end do
            end do
        end if
        end associate
    end subroutine
    subroutine bsolve_csr_sp(vec_y,matrix,vec_x,lower)
        type(CSR_sp), intent(in) :: matrix
        real(sp), intent(in)    :: vec_x(:)
        real(sp), intent(inout) :: vec_y(:)
        logical, intent(in) :: lower ! Inspiration from scipy: if true, the matrix is lower triangular
        integer :: i, j
        real(sp) :: aux
        associate( data => matrix%data, col => matrix%col, rowptr => matrix%rowptr, &
            & nnz => matrix%nnz, nrows => matrix%nrows, ncols => matrix%ncols, sym => matrix%sym )
        if( lower ) then ! (L^T) * Y = X
            vec_y = 0._sp
            do i = nrows, 1, -1
                vec_y(i) = ( vec_x(i) - vec_y(i) ) / data( rowptr(i + 1) - 1 ) ! diagonal stored last in the row
                do j = rowptr(i), rowptr(i + 1) - 2
                    vec_y( col(j) ) = vec_y( col(j) ) + data(j) * vec_y( i )
                end do
            end do
        else ! U * Y = X
            vec_y(nrows) = vec_x(nrows) / data( nnz )
            do i = nrows - 1, 1, -1
                aux = 0._sp
                do j = rowptr(i) + 1, rowptr(i + 1) - 1 ! off-diagonal entries (diagonal stored first)
                    aux = aux + data(j) * vec_y( col(j) )
                end do
                vec_y(i) = ( vec_x(i) - aux ) / data( rowptr(i) )
            end do
        end if
        end associate
    end subroutine
    subroutine solve_csr_sp(vec_b, vec_x, factor_one, factor_two)
        type(CSR_sp), intent(in ) :: factor_one
        type(CSR_sp), optional, intent(in ) :: factor_two
        real(sp), intent(in   ) :: vec_b(:)
        real(sp), intent(inout) :: vec_x(:)
        real(sp) :: vec_y(size(vec_b)) ! intermediate vector for the two-pass solve
        if ( present(factor_two) ) then
            call fsolve_csr_sp(vec_y, factor_one, vec_b, factor_one%islower)
            call bsolve_csr_sp(vec_x, factor_two, vec_y, factor_two%islower)
        else
            call fsolve_csr_sp(vec_y, factor_one, vec_b, factor_one%islower)
            call bsolve_csr_sp(vec_x, factor_one, vec_y, .not.factor_one%islower)
        end if
    end subroutine
end module stdlib_sparse_solve
Here, I have implemented two different forward substitutions ( $L y = x$ and $U^T y = x$ ) and two different backward substitutions ( $U y = x$ and $L^T y = x$ ). $L y = x$ and $U y = x$ are useful to apply an LU decomposition. $U^T y = x$ is implemented because I have a routine that computes the Cholesky factor but, for some reason unknown to me, stores it as $U^T$ instead of $L$. The person who wrote that routine is a well-known researcher in the numerical linear algebra field, so I have no reason to doubt his choices. $L^T y = x$ was implemented as an exercise and for symmetry reasons 😅
To allow this flexibility, at this stage I assumed that the class CSR_sp has a logical descriptor called islower. This is one possibility, the other being to extend the class CSR_sp into something like CSR_sp_triangular and then define islower there. Which way do you think would be better? The first one makes little sense (as a CSR matrix need not be a triangular factor), but the second one would force the user to declare the matrix as triangular, even though this latter choice would be neater in terms of code management, as a module with all triangular-matrix operations could be defined.
Let me know what you (this is actually open to everyone) think. I plan to implement this operation for CSR, CSC and COO soon (then I need to study the others), and I would like to have feedback beforehand. Cheers, and thanks for the great work.
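For cross-checking the substitution logic, here is a pure-Python sketch of the two classic passes (lower-triangular CSR with the diagonal stored last in each row, upper-triangular with it stored first; 1-based index arrays, names are mine; this is an independent sketch under those storage assumptions, not a translation of the untested module above):

```python
def fsolve_lower_csr(nrows, rowptr, col, data, b):
    """Solve L*y = b for lower-triangular CSR L (1-based arrays, diagonal last in each row)."""
    y = [0.0] * nrows
    for i in range(1, nrows + 1):
        acc = b[i - 1]
        for j in range(rowptr[i - 1], rowptr[i] - 1):  # off-diagonal entries
            acc -= data[j - 1] * y[col[j - 1] - 1]
        y[i - 1] = acc / data[rowptr[i] - 2]           # divide by the diagonal
    return y

def bsolve_upper_csr(nrows, rowptr, col, data, b):
    """Solve U*y = b for upper-triangular CSR U (1-based arrays, diagonal first in each row)."""
    y = [0.0] * nrows
    for i in range(nrows, 0, -1):
        acc = b[i - 1]
        for j in range(rowptr[i - 1] + 1, rowptr[i]):  # skip the diagonal (first entry)
            acc -= data[j - 1] * y[col[j - 1] - 1]
        y[i - 1] = acc / data[rowptr[i - 1] - 1]
    return y
```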
@ftucciarone thanks for the suggestion! This is indeed a needed feature, though some comments:
- I would say that the very next step would be to get the current PR improved in terms of code structure and specs so that it can be merged, such that these kinds of proposals can be worked on incrementally on top of it. Any help on that would honestly be very welcome. If we add more features before polishing and merging the PR, it will become unmanageable and I fear it won't get merged any time soon (smaller PRs are easier to review and merge :) and, considering the specs are not ready, this one will get too big soon).
- Regarding the triangular solver, I had added the symmetry definition in the DTs for this kind of usage, the idea being that the symmetry defines the presupposed storage scheme. So, if sym == k_SYMTRIINF (SYMmetric TRIangular INFerior), we imply the matrix represents a lower triangular storage, and so on and so forth. Maybe a different notation can be used, let me know.
- It is quite common when preconditioning iterative solvers to factorize the matrix as $A=LDL^T$ (including the diagonal term), meaning that the preconditioning step $LDL^T x = b$ needs a forward pass $L z = b$, a diagonal pass $y = D^{-1}z$, and a backward pass $L^T x = y$. So I would suggest keeping them as explicit individual steps in order to compose them as needed depending on the factorization scheme.
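The three explicit passes compose as follows (a dense pure-Python sketch just to show the composition; the function name and the inputs `L`, a unit lower factor, and `d`, the diagonal of D, are hypothetical):

```python
def apply_ldlt(L, d, b):
    """Apply (L D L^T)^{-1} to b in three explicit passes.

    L is a dense unit lower-triangular factor (list of rows), d the diagonal
    of D; dense storage is used here only to keep the sketch short.
    """
    n = len(b)
    # forward pass: L z = b
    z = [0.0] * n
    for i in range(n):
        z[i] = b[i] - sum(L[i][k] * z[k] for k in range(i))
    # diagonal pass: y = D^{-1} z
    y = [z[i] / d[i] for i in range(n)]
    # backward pass: L^T x = y
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))
    return x
```

Keeping the passes separate, as suggested, means each factorization scheme can reuse exactly the subset it needs.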
Does the author here, and others interested, know about the GraphBLAS initiative? https://graphblas.org/
It seems it would be valuable to adhere to the API there for greater overlap. There could be some inspiration at least.
@perazz I tried to address all your comments as best as possible in the latest commits:
Added a from_ijv interface subroutine to initialize the COO, CSR, ELL and SELLC types. It might not be the most performant implementation, but I think it gives a base interface that can be improved.
The COO matrix does have an attribute to indicate whether or not the data is sorted. I agree that for some operations it should be checked; initialization using the from_ijv interface does enforce sorting the matrix. But given that sorting operations can be expensive, I also think that the user should have the control and responsibility of calling it only when strictly necessary.
The data-accessor type-bound procedures are now called add (it can take an ijv scalar triplet or a v(:,:) block matrix with i(:), j(:) indices) and at (only a scalar ij tuple).
Looks great @jalvesz. If I'm not asking too much, I think that it would be valuable to have a second example program, that instead of starting from a dense matrix, starts from i, j, v data and showcases matrix initialization and the at / add interfaces. Otherwise, LGTM!
Added both examples @perazz, let me know what you think.
Thanks for the PR! I think it looks in the right direction.
In order for me to start using these routines (for parallel sparse matmul, various conversions, etc.), I would need a version that does not use stdlib's custom derived type, since I have my own derived type in my code. Is there a way to do that?
The only way I know is to split the API into two parts:
- low level that does not use derived types
- high level, OO style (this PR)
Maybe there is another way, but the above way seems robust. Then I can start using the low level API, and just give it the CSR arrays that I have in my code from my own derived type.
Why not structure this PR in this low level / high level style? It's almost there, but not quite.
@certik Thank you for your comment. I agree with you that these features should be split into 2 parts (I also have my own sparse library, and I might be interested in calling some of the stdlib features):
- low level procedures (e.g, spmv)
- high level for OO
However, @jalvesz started with the OO approach, and I think it is quite close to being merged as is, pending a few "minor" changes. Restructuring this PR might be a huge task.
Therefore, I advise reviewing and merging this PR as is, maybe after implementing some suggestions where we think the API level could already be lowered a bit. The merged procedures could then be reviewed in a second stage based on users' feedback and with this 2-level approach in mind. The high-level OO approach would not be modified for the user; only the low-level approach would be added based on what already exists.
I am afraid that if we go for the 2-API-level approach in one go, we will have nothing in stdlib in the end, due to too many discussions, lack of time, loss of @jalvesz's motivation, etc. This topic of sparse matrices has been on the table for a long time, with many unsuccessful attempts. I think @jalvesz's implementation is the closest one to being merged. At this stage, I prefer to have something usable in stdlib, even if it only includes the OO API. Users may start to use it, provide feedback, and hopefully contribute to stdlib.
Such an approach has been used for other stdlib features (e.g., sorting, hash maps, loggers) with some success.
I will be happy to hear your feedback.
@certik I understand and agree that a low-level API is also desirable. Ideally, we should eventually be able to link against MKL following their interfaces, which are quite similar, though not identical, to the Sparse BLAS syntax: https://www.intel.com/content/www/us/en/docs/onemkl/developer-reference-fortran/2024-0/sparse-blas-level-2-and-level-3-routines-002.html
Now, this is a huge amount of work. My feeling is that it is doable by changing the low-level back-ends without having to modify the high-level API (or with minimal breaking changes). I brought this proposal forward to kick off the interest and see if that could happen eventually and progressively once the DTs are in shape. Also, the idea of the DTs is to have sparse objects at the same level as a dense array, such that higher-level interfaces can then be designed to work on either dense or sparse matrices.