Speed up ranking of candidates
Nearly all of the runtime (> 90%) is now spent ranking move candidates. This is because we currently rank each move candidate by the BIC score we get after performing the move and optimizing the 1-neighborhood of the branches affected by it. This can be seen in this call graph:

We are only interested in the best-ranking move candidate, as this is the one we will apply.
All optimization efforts should therefore focus on the candidate ranking. Ideas:
- Filter the candidates to rank by doing a fast pre-ranking before the actual ranking (-> optimize even fewer branches? use some pseudo-likelihood function?).
- Parallelize the ranking of move candidates.
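
A rough sketch of how the two ideas above could fit together, not taken from the NetRAX code base: a cheap pre-score keeps only the `keep` most promising candidates, and the expensive score (the BIC after the move plus local branch optimization) is then computed for the survivors in parallel. `MoveCandidate`, `ScoreFn`, `best_candidate`, `cheap_score`, and `full_score` are hypothetical names for illustration. The sketch assumes that lower scores are better (as with BIC) and that `full_score` is safe to call concurrently, which in practice would probably mean each task works on its own copy of the network.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <limits>
#include <numeric>
#include <vector>

// Hypothetical stand-in for a move candidate; the real move types differ.
struct MoveCandidate {
    int id;
};

// Hypothetical scoring callback; lower is better (as with BIC).
using ScoreFn = std::function<double(const MoveCandidate&)>;

// Returns the index of the candidate with the lowest full score among the
// `keep` candidates with the lowest cheap score.
// Returns candidates.size() if there are no candidates.
std::size_t best_candidate(const std::vector<MoveCandidate>& candidates,
                           const ScoreFn& cheap_score,
                           const ScoreFn& full_score,
                           std::size_t keep) {
    assert(keep > 0);
    if (candidates.empty()) return candidates.size();

    // Stage 1: cheap pre-ranking of all candidates (sequential).
    std::vector<double> cheap(candidates.size());
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        cheap[i] = cheap_score(candidates[i]);
    }
    std::vector<std::size_t> order(candidates.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    keep = std::min(keep, order.size());
    std::partial_sort(order.begin(),
                      order.begin() + static_cast<std::ptrdiff_t>(keep),
                      order.end(),
                      [&cheap](std::size_t a, std::size_t b) {
                          return cheap[a] < cheap[b];
                      });
    order.resize(keep);

    // Stage 2: expensive scoring of the survivors, one async task each.
    // ASSUMPTION: full_score is thread-safe.
    std::vector<std::future<double>> futures;
    futures.reserve(keep);
    for (std::size_t idx : order) {
        futures.push_back(std::async(std::launch::async,
            [&full_score, &candidates, idx] {
                return full_score(candidates[idx]);
            }));
    }

    // Collect the results and keep the lowest-scoring candidate.
    std::size_t best = order[0];
    double best_score = std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < keep; ++i) {
        const double score = futures[i].get();
        if (score < best_score) {
            best_score = score;
            best = order[i];
        }
    }
    return best;
}
```

Note that the pre-filter is a heuristic: a candidate with the best full score is lost if its cheap score does not make the cut, so `keep` trades speed against the risk of missing the best move. One task per surviving candidate is only reasonable for a small `keep`; a fixed-size thread pool would be the more realistic choice for larger values.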
There is also still an open idea for speeding up brlen optimization even more: https://github.com/lutteropp/NetRAX/issues/43#issuecomment-798883538
For pre-filtering arc insertion moves, there is still the ancestral states idea from https://github.com/lutteropp/NetRAX/issues/41