Still unresolved and worsening problems
A problem the AI cannot solve at all, arising from a real professional game.
The cause is its inability to read out loose ladders effectively, a long-standing issue I pointed out a year ago that still shows no sign of resolution. In fact, the situation has worsened: under the new value function, the policy weighting for loose ladders has effectively collapsed to zero, making them even less likely to be read out properly.
Where did you point it out before, and do you have an SGF for this and the other cases?
If there is very systematic or repeated type of shape that it occurs in, then it's very possible to fix.
Where did you point it out before, and do you have an SGF for this and the other cases?
I pointed it out in #956 and #960—it's the exact same sample.
If there is very systematic or repeated type of shape that it occurs in, then it's very possible to fix.
I assumed this was already understood, but just to clarify — loose ladders, unlike normal ladders, don’t follow systematic patterns and typically require deep reading. Given their extremely low success rate, they tend to be particularly challenging for policy and value networks to handle reliably.
Thanks! Are there any other games you know of with a similar problem, or is it only this game so far?
How do we tell the difference between "all the networks entirely miss the 'loose ladder' in just this game" and "all the networks entirely miss 'loose ladders' commonly all the time"?
At the moment I don’t have another game where the miss is as blatant as in #956 / #960, so we may be looking at a single concrete example so far.
That said, the evidence suggests the nets aren’t truly “understanding” loose ladders, but only stumble upon them when brute-forcing becomes feasible late in the sequence. My reasoning is inductive, based on these observations:
1. Current position: the net never identifies the loose ladder.
2. After the very first move in the intended sequence: the net can recognise it, but only if I crank up noise/temperature and run an almost exhaustive search.
3. Even 20 moves down that same forced sequence (where the capture should now be obvious because the human attacker has followed the loose-ladder line exactly), the net with its standard policy/value search still fails to notice that a loose ladder is in progress.
4. Given 3, it follows that the net can't see the tactic in any earlier branch either.
5. Repeating the same logic one move earlier reproduces the pattern, reinforcing the conclusion.
So for now it looks like chance discovery by sheer search cost rather than any built-in recognition of the shape.
I’m happy to share the exact position, command line, and principal variations, or to run any lightweight diagnostics you’d find helpful. Just let me know what sort of data or tests would be most useful, and I’ll do what I can on my side.
Hypothesis: Ladder-detection weighting may be suppressing loose-ladder exploration
- Moves that match the normal ladder template receive a strong boost from the detector, while the reverse-chase first move (the entry to a loose ladder) is given an extremely low policy prior.
- With a near-zero prior, that move loses every UCB selection race; even 4M playouts fail to visit it, so its value is never backed up.
- Because the move is never visited, it never appears in self-play data, and its prior stays near zero: a closed loop that may explain the blind spot.
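To make the selection-race point concrete, here is a minimal sketch of an AlphaZero-style PUCT selection score (illustrative Python with made-up numbers; KataGo's actual formula and constants differ). With a prior near zero, the exploration bonus for the reverse-chase move stays negligible, so it essentially never wins a selection step:

```python
import math

def puct_score(q_value, prior, parent_visits, child_visits, c_puct=1.5):
    """AlphaZero-style PUCT: exploitation (Q) plus a prior-weighted exploration
    bonus. A move whose prior is ~0 receives essentially no exploration bonus."""
    exploration = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q_value + exploration

# Hypothetical numbers: a well-liked move vs. the unvisited reverse-chase move.
# (Treating the unvisited move's Q as 0 is a simplification of first-play urgency.)
parent_visits = 4_000_000
ordinary = puct_score(q_value=0.48, prior=0.05, parent_visits=parent_visits, child_visits=200_000)
reverse_chase = puct_score(q_value=0.0, prior=1e-6, parent_visits=parent_visits, child_visits=0)
print(ordinary, reverse_chase)  # ~0.48 vs ~0.003: the near-zero-prior move never gets selected
```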
Quick sanity checks
- Slightly reduce the ladder-detection weight and re-run the search on the same position. If the reverse-chase move rises into the top candidates, the weighting is likely the blocker.
- Add a small set of loose-ladder failure positions to a short fine-tune run and see whether the move’s policy prior rises from 0 % to even a few tenths of a percent—enough for the search to start exploring it.
If either test changes the behavior, it would support the hypothesis that strong ladder emphasis is unintentionally hiding loose ladders.
Still no improvement.
I tested the newly-released network b28c512nbt-s9405490432-d4917882681. Even with that net, the critical reverse-chase move (loose-ladder line) remains fixed at 0.0 % policy prior; it never enters the top candidates.
In short, the blind spot is unchanged.
Of course, I haven't added this position as a training position.
It probably will remain unchanged in this position for a very long time unless the neural net starts to see some training positions that contain this type of tactic. Of course, I could add this exact position as a training position, but that's not so interesting because if I do that, it will soon get a lot better at this exact position but we won't be able to tell how much it is better in general at this kind of tactic.
That's why I would be interested if you have the ability to provide any other examples of the tactic, so that we can start to assemble a dataset of multiple positions with the common blind spot. Then we can see how much the net improves in general. If you can find literally no other game where this tactic and blind spot occurs, perhaps it is an extremely rare situation that only happens in this game, and therefore not a big deal?
- It happened in a top-level professional game. The position comes from a match in the Japanese domestic league (pro level), not a contrived test case.
- No current AI, including FineArt and KataGo itself, predicted it in advance.
- Any tactic that overturns a pro game is, by definition, practical. Dismissing it as “extremely rare” overlooks the fact that a single blind spot like this can decide game results.
- Rarity is itself a symptom of the bias. The scarcity should be treated as evidence of that bias, not proof that the tactic is unimportant.

If the goal is a robust engine that can match or exceed professional reading in all critical fights, leaving this blind spot unresolved undermines that objective.
Sorry, yes, perhaps I misworded things in a dismissive way. Yes, even if a blind spot only occurs in this single game position, it was still impactful to the analysis of that game.
We know it would be quite possible to fix this blind spot in this specific game, by manually adding repeated training data for this particular game position. In the worst case, the net will devote a tiny fraction of its capacity to memorizing the exact shape in this game, and will learn to judge it correctly. For years KataGo has generally done this, adding training data for blind spots as they were discovered, and so far it has had huge benefits in fixing extremely important errors like certain kinds of ladder breaker tactics, flying dagger and pincer josekis, and so on. But it's also pretty clear that this process has diminishing returns.
Before too long there will be another pro game that comes with another different blind spot, right? And after that, eventually there will be another game, with another one, and another one, and another one, right? You've been following the development for years, and you would also agree that we should want to also prevent those blind spots too, right?
We can consider two possibilities:
1. Perhaps this blind spot is part of a larger consistent category of mistakes that can occur in future games. Then, even if fixing the analysis in this game by itself would be impactful, it's even more important to fix the issue in general. To do that we would want multiple examples of that larger category for training, and to use this blind spot as a way to test or confirm that the net is learning the tactic in general, rather than training on this one game by itself, where the neural net could merely memorize this exact shape and fool us into thinking the problem is fixed even though it could still be vulnerable to all the broader instances of this tactic.
2. Perhaps this blind spot is an isolated mistake. Then we can fix it by manually adding data for the net, it will learn to memorize the exact shape in this game well enough to recognize it and get it correct, and maybe that's the best that can be done. However, even if fixing it would be hugely impactful for this game, it may take time and effort away from experiments that improve playing strength across many future games in general rather than for only the position in this game, which may be even more impactful.
Additionally, if we don't know whether possibility (1) or (2) is the case, it seems valuable to determine that first. It seems like you care a lot about making the bot better and reporting and fixing these issues. That's already very useful! If you're interested in helping even further, though, figuring this out would be very interesting.
For example, in an earlier post, you suggest
Add a small set of loose-ladder failure positions
as part of a possible experiment / testing. Do you think you'd be able to construct such positions? It takes more Go expertise than programming expertise to construct realistic whole-board positions with a given tactic somewhere on the board (indeed, it's almost the same skill as composing high-level large-scale tsumego problems), and you have more Go expertise than me. It would be especially fascinating if, in the process, you found a way to construct more positions that share similar blind spots.
Proposal: add a small policy floor
Assign every legal move a minimum prior of ε = 0.1 %, then renormalize (e.g. P = max(P, ε); renorm()).
Why this helps
- Moves currently forced to P = 0 never enter the tree, so any “unknown” pattern reinforces itself in training—a closed loop that caused the loose-ladder blind spot.
- An ε-floor lets rare moves surface with a few visits, giving the net a chance to evaluate and learn them, while win-rate maximization is virtually unaffected.
- Implementation cost is one post-softmax line; no network retraining is required to test the effect.
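As a rough illustration of what that one post-softmax line could look like (a minimal sketch in Python over a plain prior array, not KataGo's actual C++ search code):

```python
import numpy as np

def apply_policy_floor(priors: np.ndarray, legal_mask: np.ndarray, eps: float = 0.001) -> np.ndarray:
    """Clamp every legal move's prior to at least eps, then renormalize so the
    priors sum to 1 again. Illegal moves stay at zero."""
    floored = np.where(legal_mask, np.maximum(priors, eps), 0.0)
    return floored / floored.sum()

# Tiny hypothetical example: three legal moves, one of which has a prior of exactly 0.
priors = np.array([0.7, 0.3, 0.0])
legal = np.array([True, True, True])
print(apply_policy_floor(priors, legal))  # ~[0.6993, 0.2997, 0.001]: the zero-prior move can now be visited
```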
With today's network (b28c512nbt-s9435149568-d4923088660), Q2 at move 111 now shows a ~0.1 % prior. At very high playouts the search finally follows the correct loose-ladder sequence and evaluates it as best.
This seems to confirm the benefit of giving rare moves a non-zero floor: once the prior rises even a fraction above 0 %, the tree can discover the tactic. A systematic ε-floor (e.g. 0.1 %) would guarantee the same coverage without relying on accidental fluctuations in future nets.
Proposal: add a small policy floor
Assign every legal move a minimum prior of ε = 0.1 %, then renormalize (e.g. P = max(P, ε); renorm()).
Why this helps
- Moves currently forced to P = 0 never enter the tree, so any “unknown” pattern reinforces itself in training—a closed loop that caused the loose-ladder blind spot.
- An ε-floor lets rare moves surface with a few visits, giving the net a chance to evaluate and learn them, while win-rate maximization is virtually unaffected.
- Implementation cost is one post-softmax line; no network retraining is required to test the effect.
Unfortunately I don't think this proposal will work:
Using a larger wideRootNoise setting, we can encourage more visits onto other moves. You can see here that the move has received 305 visits, yet is still not discovered as a good move.
A policy prior of 0.1% could quite possibly be too small for a move to reliably receive even one visit during training. We would need it to be more like 0.2% or 0.4%. And receiving one visit during training will not fix it - as you can see here, receiving 305 visits (more than 100x more than the 1-3 visits that a minimally noised move might get during training) was not enough for the search to recognize the move as good.
A policy prior floor of 0.1% followed by renormalizing might, however, distort the effective c_puct for other moves: if there are ~300 legal moves, this would add ~30% mass to the policy, and renormalizing would then shrink the existing policy weights by that fraction, which in the MCTS formula shows up as an effective c_puct reduction. This might harm playing strength during training and weaken the quality of the training data, unless we significantly increased the total number of visits to compensate, in which case it would increase the cost a lot. Or we could perhaps retune c_puct, but that would come with its own challenges.
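To put rough numbers on that (a back-of-the-envelope sketch with assumed values, not a measurement of KataGo's actual training settings):

```python
# Assume ~300 legal moves currently sitting near 0 prior, each raised to eps = 0.1%.
num_low_prior_moves = 300
eps = 0.001
added_mass = num_low_prior_moves * eps   # ~0.30 of extra probability mass
scale = 1.0 / (1.0 + added_mass)         # renormalization shrinks the existing priors
print(added_mass, round(scale, 3))       # 0.3 and ~0.769

# In the PUCT term c_puct * P * sqrt(N) / (1 + n), shrinking every existing prior P
# by ~23% acts like shrinking c_puct by ~23% for the moves the net already likes.
```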
So the result would be that moves like this would still not be discovered. It quite likely would be either a weakening or a cost increase without much benefit, since giving this move a few visits still falls short of what is needed to recognize such moves as good by a factor of at least 100x.
It's good to brainstorm proposals like this, because maybe there is a creative idea that will work. But I think the idea will need to be very clever and selective, finding a way to surface such moves far more sharply and effectively. In general, methods that amount to "always search every move a little bit" I would tend to think are unpromising unless there is some new clever idea in addition to them. When you look at model-based reinforcement learning papers, it generally hasn't been my impression that anyone publishes such methods, presumably because they haven't worked as well. The published methods tend to at least try to do some sort of selective sampling (i.e. probabilistically try a random subset of moves using many visits, rather than trying every move with a very small number of visits). For example, this is true of the landmark papers: the original AlphaZero paper (via the method of Dirichlet noise) and the follow-up Gumbel MuZero work (via Gumbel sampling).
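For reference, a minimal sketch of the AlphaZero-style root Dirichlet noise mentioned above (illustrative Python with typical published constants, not KataGo's actual settings). The point is that it boosts a small random subset of moves substantially, rather than boosting every move a tiny amount:

```python
import numpy as np

def add_root_dirichlet_noise(priors: np.ndarray, alpha: float = 0.03,
                             weight: float = 0.25, rng=None) -> np.ndarray:
    """Mix the root policy with a Dirichlet sample. With a small alpha the sample
    is spiky, so a few random moves receive a large boost (enough to earn many
    visits) while most moves stay near their original prior."""
    rng = rng or np.random.default_rng()
    noise = rng.dirichlet([alpha] * len(priors))
    return (1 - weight) * priors + weight * noise

priors = np.full(361, 1 / 361)     # purely illustrative flat prior over a 19x19 board
noisy = add_root_dirichlet_noise(priors)
print(noisy.max())                 # a handful of moves end up boosted to several percent
```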
I think the crux is how steep the “visit curve” must be before the net starts valuing the reverse-chase move.
- One or two visits are never enough. In practice a move often needs orders of magnitude more visits before its value estimate crosses the threshold where the search promotes it further. Many pro-level blind spots (ladder-breaker lines, flying-dagger joseki, and now this loose-ladder case) show the same pattern.
- That visit requirement is unreachable when the prior is strictly zero. Even aggressive wide-root settings only allocate visits in proportion to the prior. If the prior is zero, the curve never gets a chance to start.
- A tiny non-zero floor (or any similarly selective boost) is simply a mechanism to break that dead loop. As soon as the move’s prior is non-zero, the search can explore it enough to discover whether it is good or bad. The floor need not be large; it only has to be “non-zero.”
- Cost and balance concerns are real, but in limited tests a small floor did not measurably degrade overall strength; its effect was comparable to other wide-root tweaks we already accept for robustness.
So the point isn’t to make the net memorize this one position.
It’s to ensure that any move the net currently suppresses entirely can at least enter the visit curve and prove (or disprove) itself. Without such a mechanism, similar blind spots will continue to slip through.
Did you mis-type something, or can you clarify something? You write:
A tiny non-zero floor (or any similarly selective boost) is simply a mechanism to break that dead loop. As soon as the move’s prior is non-zero, the search can explore it enough to discover whether it is good or bad. The floor need not be large; it only has to be “non-zero.”
But you also write:
One or two visits are never enough.
Suppose the move's prior is non-zero, but only non-zero enough for the search to give it one or two visits with the settings used during training. Then, because "One or two visits are never enough", the small non-zero prior will not be enough for the search to explore it and discover whether it is good or bad.
(KataGo already has a tiny nonzero floor in expectation, through Dirichlet noise + root softmax temperature. So we know that a tiny nonzero floor isn't enough. On 9x9 rather than 19x19, this noise is even enough that it is common for most moves on the board to get searched a little. But blind spots are still relatively common on 9x9, because this small amount of search during training is never enough to reinforce the moves properly. So this also shows that a tiny nonzero floor isn't enough.)
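To make the mismatch concrete, a rough back-of-the-envelope with assumed training settings (not KataGo's exact numbers), using the approximation that a root move's visit share roughly tracks its policy prior:

```python
# Assume ~1500 root playouts per self-play move, and visits roughly proportional to prior.
root_playouts = 1500
floor_prior = 0.001                       # the proposed 0.1% floor
expected_visits = root_playouts * floor_prior
print(expected_visits)                    # ~1.5 visits, i.e. "one or two visits"

# Compare with the observation above: even ~305 visits were not enough for the
# search to recognize the reverse-chase move as good.
print(round(305 / expected_visits))       # ~200x more visits than the floor would provide
```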
All reproducible logs and my conclusions have been posted. I’ll leave any further action to you.