Draft -- PostgreSQL db replaces Oracle db
This draft PR addresses two pending questions: 1) how much work still remains to move CMR from Oracle to Postgres, following the datastore prototype (wiki report here), and 2) how should we approach refactoring CMR so it can run with an external datastore that is different from the official one (discussion wiki here)?
"Why are there still oracle namespaces here?" -- These changes have been made 'in place' as much as possible, except for certain Postgres core aspects, mostly to help address question 2 -- much "oracle" code contains general CMR logic for working with external data stores in general (health check, etc.) or at least RDBMS (jdbc utils), and conversely much of CMR code is actually specific to Oracle (explicit SQL, assumption of receiving blob instead of bytes, etc.)
What's been done so far
- DB connection in the same style as before, with a connection pool and db-spec (used in the prototype and for `cmr setup db`); see the connection-pool sketch after this list
- `cmr setup db` -- creation of all users, schemas, and data migrations has been converted from Oracle to Postgres
- Job Scheduler -- the Java dependencies have been translated from Oracle to Postgres; most jobs are created successfully
- CRUD functionality brought over from the prototype work -- confirmed for providers, collections, and granules [1]
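For context on the connection item above, here is a minimal sketch of the shape this takes, assuming a clojure.java.jdbc-style db-spec and HikariCP for pooling; the names and config values are illustrative, not the prototype's actual code:

```clojure
(ns example.postgres-connection
  "Illustrative sketch only -- the real db-spec/pool setup lives in the prototype code."
  (:import (com.zaxxer.hikari HikariConfig HikariDataSource)))

(defn pooled-db-spec
  "Build a clojure.java.jdbc-compatible db-spec backed by a HikariCP pool."
  [{:keys [host port db user password]}]
  (let [config (doto (HikariConfig.)
                 (.setJdbcUrl (format "jdbc:postgresql://%s:%s/%s" host port db))
                 (.setUsername user)
                 (.setPassword password)
                 (.setMaximumPoolSize 10))]
    ;; clojure.java.jdbc treats any map with a :datasource key as a pooled connection
    {:datasource (HikariDataSource. config)}))
```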
What remains to be done
- (EDIT: apparently solved) ~~FIRST PRIORITY BUG -- proper start of the DB system component [2] -- (hard)~~
- `cmr setup db` -- the tear-down of users and migrations (e.g. `(drop)`, `(drop-user)`) (easy)
- debug the scheduled jobs that still fail -- might be helped by solving the First Priority Bug above
- test bulk update/bulk migration/catalog rest code (dev already started)
- add timezone-aware functionality to `db-timestamp->string` (sketch after this list)
- debug the `find-one` SQL utility -- currently functional thanks to a redundant `first`, but it should limit results in the SQL itself to reduce load on the db (sketch after this list)
- connection pool improvements: add failover and ONS functionality as in Oracle; add a manual close inside PostgresStore `stop`
- misc. DB utilities -- change Oracle-specific property names, values, etc. to Postgres equivalents, e.g. DB health checks, the ignore-existing-error wrapper (sketch after this list), etc.
- (optional, recommended) implement the AWS Aurora bells and whistles to take advantage of improved monitoring, performance, failover, etc.
- (optional, recommended) rename namespaces to just 'postgres' instead of 'aurora-postgres', to future-proof and reduce confusion, since all of this is specific to the Postgres engine [3]
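On the `db-timestamp->string` item, a minimal sketch of what 'timezone-aware' could mean, assuming the driver hands back `java.sql.Timestamp` values and we want output pinned to UTC rather than the JVM's default zone (the real function name and output format come from the existing Oracle path):

```clojure
(ns example.timestamps
  (:import (java.sql Timestamp)))

(defn db-timestamp->string
  "Sketch: render a Timestamp as an ISO-8601 UTC string so the result does not
  depend on the JVM default timezone or the Postgres server timezone."
  [^Timestamp ts]
  ;; Instant's toString is always ISO-8601 in UTC, e.g. 2024-05-01T12:00:00Z
  (str (.toInstant ts)))
```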
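On the `find-one` item, the idea is to push the single-row constraint into the SQL so Postgres stops after one match, instead of fetching every matching row and calling `first` on the Clojure side. A hypothetical sketch, not the actual CMR utility:

```clojure
(require '[clojure.java.jdbc :as jdbc])

(defn find-one
  "Run a query expected to match at most one row; the LIMIT keeps the work in the db."
  [db sql & params]
  (first (jdbc/query db (into [(str sql " LIMIT 1")] params))))
```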
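And on the ignore-existing-error wrapper, the Postgres flavor amounts to swallowing 'already exists' SQLSTATEs so setup commands stay idempotent. A sketch assuming the stock org.postgresql driver; the real wrapper's name and exact behavior come from the Oracle-side code:

```clojure
(defmacro ignore-already-exists
  "Run body, ignoring Postgres 'already exists' errors:
  42P06 duplicate_schema, 42P07 duplicate_table, 42710 duplicate_object."
  [& body]
  `(try
     ~@body
     (catch org.postgresql.util.PSQLException e#
       (when-not (contains? #{"42P06" "42P07" "42710"} (.getSQLState e#))
         (throw e#)))))
```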
Misc. dev notes
- Columns used as true/false flags, such as 'deleted', have been changed from INTEGER (1/0) or VARCHAR(1) ('1'/'0') to BOOLEAN. Oracle did not support a boolean data type and sometimes required manual conversion; Postgres does support it and will convert '1'/'0' to true/false automatically (but won't convert true/false back to a char/int, which is how the Job Scheduler was working). Nearly all CMR data objects now use true/false; see the migration sketch after this list.
- The temp table created in metadata-db-app migration no. 2 does not persist in Postgres as it does in Oracle. Manual table creation (`aurora/create-temp-table`) has been added where it is used.
- Postgres uses the BYTEA data type instead of Oracle's BLOB; therefore, where these columns are parsed, `gzip-blob->string` becomes `gzip-bytes->string` (see the sketch after this list)
- CMR's jdbc and clj-time libs are currently marked as deprecated/not actively maintained
- After a while you may see the following in your REPL and then need to do a full restart (likely connections not being released, exhausting Postgres's connection slots; still needs debugging): `org.postgresql.util.ServerErrorMessage "FATAL: remaining connection slots are reserved for non-replication superuser connections"`
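To make the flag-column change concrete, here is an illustrative (hypothetical) migration helper for rewriting an Oracle-style 1/0 integer column as a native BOOLEAN; the table and column names are placeholders, not the actual migrations:

```clojure
(require '[clojure.java.jdbc :as jdbc])

(defn flag-column->boolean
  "Convert an INTEGER 1/0 flag column to BOOLEAN; the USING clause tells
  Postgres how to rewrite the existing values."
  [db table column]
  (jdbc/db-do-commands db
    (format "ALTER TABLE %s ALTER COLUMN %s TYPE BOOLEAN USING (%s = 1)"
            table column column)))
```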
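For the BYTEA note, the helper simply gunzips a byte array rather than reading an Oracle BLOB first. A minimal sketch of the idea (CMR's actual `gzip-bytes->string` may differ):

```clojure
(ns example.gzip
  (:import (java.io ByteArrayInputStream)
           (java.util.zip GZIPInputStream)))

(defn gzip-bytes->string
  "Decompress a gzipped byte array (e.g. a BYTEA column value) into a string."
  [^bytes data]
  (with-open [in (GZIPInputStream. (ByteArrayInputStream. data))]
    (slurp in)))
```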
How to run this locally
This works much the same as using the Oracle container. Make sure your `profiles.clj` content is set as environment variables, then run `cmr setup dev`. Launch a container using the official postgres image with these configs (you can add a volume if you want):
```sh
docker run \
  --name cmr-postgres-db \
  -e POSTGRES_PASSWORD=admin \
  -e POSTGRES_DB=cmrcdb \
  -d \
  -p 5432:5432 postgres
```
After the container is 'running', run `cmr setup db` to create users, schemas, and migrations. Upon entering the REPL, run `(reset :db :external)`.
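If you want to sanity-check the container before touching CMR code, a quick query from any REPL with clojure.java.jdbc on the classpath works (credentials taken from the docker command above; `postgres` is the image's default superuser):

```clojure
(require '[clojure.java.jdbc :as jdbc])

(jdbc/query {:dbtype   "postgresql"
             :dbname   "cmrcdb"
             :host     "localhost"
             :port     5432
             :user     "postgres"
             :password "admin"}
            ["SELECT 1 AS ok"])
;; => ({:ok 1})
```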
Footnotes
- [0] The access control app's `(dev-start)` `bootstrap` is commented out not just because it fails with a connection error, but also because I have an M1 chip and the new Oracle ARM image does not work with this function either, so I couldn't even compare for debugging. This function might also be removed in a separate current ticket.
- [1] Search isn't confirmed locally because, again having an M1 chip, search is broken for my Oracle container as well, so I can't compare for debugging. Search was confirmed in the cloud during the prototype work.
- [2] (EDIT: solved) ~~The expected way that a CMR service launches its DB within its `create-system` is not working properly here. It appears the lifecycle `start` doesn't execute, because it returns with `:datasource nil`. Note that the PostgresStore record itself does successfully `start` when used in an atom instead of a component, as in `setup db`. This atom method has been added to the needed `create-system`s as a WORKAROUND ONLY.~~
- [3] Right now the JDBC driver is the only 'Aurora' thing, but it should just be a wrapper around the Postgres JDBC driver. The connection pool lib, HikariCP, is recommended by the AWS Aurora JDBC Wrapper docs, but the quartzite Job Scheduler uses C3P0.
- This driver will log 'WARNING: Expecting a dialect that supports a cluster topology' when run locally; that's normal.