Upgrade TTS OS from 3.34.2 to 3.43.3 issues
Summary
Using the OS docker-compose.yml that specifies latest for Postgres, Redis and TTS, an attempted upgrade goes wrong at Postgres because latest now pulls v18. Fixed that. Then 3.43.3 can't connect to the database.
Also, the docs have docker compose up/pull/whatever rather than docker-compose. Makes copy & pasta a PitA
Steps to Reproduce
docker-compose pull
docker-compose up
Current Result
Migration of NS and IS appears to work
The stack starts but shows connection errors in the logs for the gateway that's connected to that instance and login is not possible.
Expected Result
Connect to new database, all working
Relevant Logs
Can reproduce once I've got the tests back on track
URL
No response
Deployment
The Things Stack Open Source (self-hosted)
The Things Stack Version
3.34.2
Client Name and Version
Other Information
Using TTS OS to test utility apps I create for client. These can mess stuff up if/when I get the API calls wrong!
Proposed Fix
No response
Contributing
- [x] I can help by doing more research.
- [ ] I can help by implementing a fix after the proposal above is approved.
- [x] I can help by testing the fix before it's released.
Validation
- [ ] The fix is tested in a staging environment.
- [ ] The fix is documented in The Things Stack Documentation
Code of Conduct
- [x] I agree to follow TTN's Community Code of Conduct.
@nicholaspcr: Do we support Postgres v18? I believe so but check. If not, we need to pin the Postgres version in the docs.
We should be compatible with PostgreSQL 18, as we are only using standard features which have not been deprecated.
This seems to be a case where we should ask for more information, such as the logs from the PostgreSQL initialization.
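A minimal sketch of how those logs could be collected, assuming the services are named `postgres` and `stack` in docker-compose.yml (check your own file), and using a hypothetical helper name `collect_logs`:

```shell
# Assumption: the compose file defines services called "postgres" and "stack".
# collect_logs is a hypothetical helper, not part of The Things Stack tooling.
collect_logs() {
  # First argument is the Compose command; defaults to the legacy V1 binary.
  compose_cmd="${1:-docker-compose}"
  # Left unquoted on purpose so "docker compose" splits into two words.
  $compose_cmd ps
  $compose_cmd logs --no-color --tail=200 postgres stack
}

# Example: capture container state and recent logs to attach to the issue.
# collect_logs > tts-upgrade-logs.txt 2>&1
# collect_logs "docker compose" > tts-upgrade-logs.txt 2>&1
```

Review the output for secrets or hostnames before posting it publicly.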
> Also, the docs have docker compose up/pull/whatever rather than docker-compose. Makes copy & pasta a PitA

docker compose is Compose V2, which is what the docs should reference as it is the most recent version.
> Migration of NS and IS appears to work
> The stack starts but shows connection errors in the logs for the gateway that's connected to that instance and login is not possible.

This leads me to believe that the connection configuration is wrong, but I don't believe that would change as a result of a PostgreSQL upgrade.
The overall conclusion is that this issue just lacks information.
@HeadBoffin if possible, could you provide more information about the steps you are taking that lead to this bug?
I have an Ubuntu 24.04.3 LTS (GNU/Linux 6.8.0-86-generic x86_64) server with default apt repositories using docker.io from the mainstream repo. This does not automagically provide v2 of compose.
It is a totally vanilla install, just changing the URL/addresses to the ones for the server and generating keys / passwords etc. All paths and ports remain as per the generic YAML files.
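To make the documented commands work on either variant, a small wrapper can pick whichever Compose command is present. This is only a sketch: `detect_compose` is a hypothetical helper name, and it assumes nothing beyond the `docker compose version` subcommand and the legacy `docker-compose` binary.

```shell
# Prefer the Compose V2 plugin ("docker compose"); fall back to legacy V1.
detect_compose() {
  if docker compose version >/dev/null 2>&1; then
    echo "docker compose"
  elif command -v docker-compose >/dev/null 2>&1; then
    echo "docker-compose"
  else
    # Neither variant is installed.
    return 1
  fi
}

COMPOSE="$(detect_compose)" || echo "no Compose command found" >&2

# Left unquoted on purpose so "docker compose" splits into two words.
# $COMPOSE pull
# $COMPOSE up -d
```

With this in place, commands copied from the docs only need `docker compose` swapped for `$COMPOSE`.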
> We should be compatible with PostgreSQL 18, as we are only using standard features which have not been deprecated.
The log reports a driver / connection issue between the stack and Postgres that goes away when I roll back to PG v17. It doesn't appear to get as far as trying to use any features, standard or otherwise.
> This leads me to believe that the connection configuration is wrong, but I don't believe that would change as a result of a PostgreSQL upgrade.

I ONLY did a PULL, no settings were changed. When I rolled back by explicitly referencing the images for PG 17.6 and stack 3.34.2 in docker-compose.yml, it worked.
BTW, the move to PG v18 requires a change to the volumes setting to get PG v18 to start up / find the database. But that still doesn't get the stack to connect to it.
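Since the whole problem started with a bare pull against `latest`, it helps to check which tags the compose file actually references before pulling. A rough sketch, where `list_images` is a hypothetical helper and the file name is assumed to be docker-compose.yml in the current directory:

```shell
# Print every "image:" line of a compose file with its line number,
# so unpinned tags such as ":latest" are easy to spot.
list_images() {
  grep -nE '^[[:space:]]*image:' "${1:-docker-compose.yml}"
}

# Example usage before an upgrade:
# list_images docker-compose.yml
# docker-compose images   # images used by the containers from this file
```

Any line still ending in `latest` will move to a new major version on the next pull, which is how PG v18 arrived here.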
> if possible, could you provide more information about the steps you are taking that lead to this bug?
It was literally doing a pull and then an up - as in ssh in to the server, backup per the TTS docs, type the commands as detailed above, look at the output, groan, hack about, find other similar reports on the inter webs, roll back to PG v17, general mayhem, rollback to TTS v3.34.2, it works, crack on with project.
As I'm very close behind Ben on the famous 10 minute install, when I've a minute (or 20), I'll do a fresh install on another server. I'll race you!
But I'm not in a position to upset the test instance as a client is currently using it for pre-deployment testing. That's because I won't let them use it on a TTI instance / TTN until it's passed engineering / code-review / UI / functionality tests, because that's how I roll.