
bosh-nats-sync fails as long as no UAA is available

Open max-soe opened this issue 3 years ago • 0 comments

Describe the bug

Since the new NATS version in BOSH 274, we have a problem deploying BOSH. Sometimes the deploy fails with:

Task 12439 | 13:26:49 | L starting jobs: bosh/ad445f92-5fab-4070-bdd4-1071258ba02d (0) (canary)
Updating deployment: Expected task '12439' to succeed but state is 'error' (00:08:12)
L Error: 'bosh/ad445f92-5fab-4070-bdd4-1071258ba02d (0)' is not running after update. Review logs for failed jobs: health_monitor
Task 12439 | 13:32:51 | Error: 'bosh/ad445f92-5fab-4070-bdd4-1071258ba02d (0)' is not running after update. Review logs for failed jobs: health_monitor

We found that the bosh-nats-sync job cannot authenticate as long as the co-deployed UAA is not running:

[2022-10-13T14:12:56.206749 #647762] INFO : Nats Sync starting...
[2022-10-13T14:13:06.290402 #647762] INFO : Executing NATS Users Synchronization
[2022-10-13T14:13:06.522845 #647762] ERROR : Failed to obtain token from UAA: #<CF::UAA::BadTarget: error: Failed to open TCP connection to 192.168.1.11:8443 (Connection refused - connect(2) for 192.168.1.11:8443)>
[2022-10-13T14:13:06.602752 #647762] FATAL : 401 Unauthorized

As a result, the health_monitor cannot use NATS. Once UAA had started, about 5 minutes later, everything worked fine.

Expected behavior

The bosh-nats-sync job waits until UAA has started. All jobs that depend on NATS, such as the health_monitor, wait until bosh-nats-sync has started.
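To illustrate the kind of waiting described here, below is a minimal sketch in Python. It is not how bosh-nats-sync is implemented (that job is written in Ruby), and the function names `tcp_port_open` and `wait_until_ready`, the 10-minute deadline, and the 5-second interval are all assumptions chosen for illustration. The idea is simply to poll the UAA port until a TCP connection succeeds instead of failing on the first "Connection refused".

```python
import socket
import time


def tcp_port_open(host, port, connect_timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=connect_timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts, and unreachable hosts.
        return False


def wait_until_ready(check, deadline_seconds=600.0, interval_seconds=5.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Call check() until it returns True or the deadline passes.

    Returns True if the dependency became ready, False on timeout.
    The sleep and clock parameters exist only to make the loop testable.
    """
    deadline = clock() + deadline_seconds
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_seconds)


# Hypothetical usage, with the UAA address taken from the log above:
#   ready = wait_until_ready(lambda: tcp_port_open("192.168.1.11", 8443))
#   if not ready:
#       raise SystemExit("gave up waiting for UAA")
```

A successful TCP connection only shows that something is listening on the port. A real fix would probably also need to retry the token request itself, since UAA may accept connections before it can issue tokens.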

Versions:

  • Infrastructure: AWS
  • BOSH version 274.4
  • Stemcell version [e.g. ubuntu-jammy/1.18]

max-soe, Oct 13 '22 15:10