graspologic Graph matching code structure discussion

Is your feature request related to a problem? Please describe.

Graph matching code is great as is if just following FAQ/SGM exactly
I'd like to generalize the code to a few very similar but distinct use cases, for example:
- Be able to handle multigraphs
- Bilateral graph matching (use connections between the two subgraphs in addition to the subgraphs themselves)
- Hard grouped graph matching (restrict matching to only permute nodes within a pre-defined labeled group)
- Soft grouped graph matching (initialize GM based on the "groupycenter" defined by some known labels, and/or use a label-defined similarity matrix)
  - This one requires the least modification to the current code but I thought I'd include while we're talking about use cases
- GOAT
- I'm sure we will come up with more in the future...

Describe the solution you'd like

I think the code could be more readable and easier to modify for these special cases if we did the following.

Rewrite the main Franke-Wolfe algorithm loop to operate on a private object (defined below)
Write an class for each type of graph matching as appropriate. These classes should likely stay private, but I think they'd help organize the code. An example of the kinds of methods _BaseFAQSolver would have:
- .initialize()
- .compute_gradient()
- .compute_step_size()
- .check_if_converged()
- .project_to_feasible()

I think there are several advantages to doing things this way. The main one is that for many algorithms, most things will be identical, but there will be slight changes that you want to drop in. As an example, if you want to implement GOAT, all you'd have to do is write a new class that inherits from an old one, and just modify the compute_gradient() function. If you want to implement bilateral, you just need to change .compute_gradient() and compute_step_size(). For soft grouped matching, just change initialize(). Hard grouped matching would probably just be a for loop around one of the existing methods.

Note that these are all internal implementation decisions the user never needs to see/care about

Additional context

Very related to #762, again, to implement that you could mostly modify the compute_gradient and compute_step_size in the base class.

@asaadeldin11 thoughts?

May 23 '21 17:05 bdpedigo

Problems that GM is solving:

Vanilla GM (part of GraphMatch)
Soft seeds (probably part of GraphMatch)
Soft group seeds (probably part of GraphMatch)
divide, merge, and conquer (probably it's own class)
bilateral GM (probably it's own class)
multi-layer GM?

Jun 03 '21 17:06 asaadeldin11

Tracking progress

Main code

[x] new solver class
[x] bilateral
[x] multilayer
[x] seeds
[x] similarity
[x] option to turn off numba/use scipy?
[ ] padding/size swapping
[ ] fix reference frame for output permutations
[ ] type hinting
[ ] multiple inits
[ ] wrapper function?
[ ] fix logging functions
[ ] edge case handling
[ ] tests
[ ] documentation
[ ] tutorials

Utility

[ ] compute match ratio when correct is known
[ ] compute alignment strength metric

Reach / later PR

[ ] groupycenter/soft seeding
[ ] soft matching in general (just don't project to permutation matrices at the end and use GOAT methinks)

May 03 '22 18:05 bdpedigo

Benchmarking

[ ] Timing
- [ ] new vs old
- [ ] sparse vs non-sparse
- [ ] parallel vs non-parallel
[ ] Performance
- [ ] make sure matching on hemisphere data all looks the same accuracy-wise
[ ] Reproducibility

May 17 '22 15:05 bdpedigo