
OpenMetadata ingestion and ORM profiler need to be fixed to extract all metadata from z/OS DB2

Open upenbendre opened this issue 3 years ago • 0 comments

While making ingestion work for a z/OS DB2 database, I had to make changes to reflection.py and base.py in ibm_db_sa; otherwise it would skip over most of the tables and views in the database. It does not look like the existing ibm_db_sa has been tested thoroughly against z/OS DB2. Here are the changes I made:

reflection.py in ibm_db_sa

1.) In the declaration that begins sys_columns = Table("SYSCOLUMNS", ischema, replace 'Column("TYPENAME", CoerceUnicode, key="typename"),' with 'Column("COLTYPE", CoerceUnicode, key="typename"),'. If you refer to the query generated for get_columns, you will see why this is needed.

2.) In def get_columns, within the loop 'for r in connection.execute(query)', add a strip() to the first statement: coltype = r[1].upper().strip(). The column type from z/OS is returned padded with spaces, which need to be removed.

3.) In def get_unique_constraints, replace sysconst.c.type == 'U' with sysconst.c.type != '', or, instead of checking only for 'U', check for 'U' or 'P'. These type codes are tricky. I suspect different mainframe types may have different values, so to me it doesn't make sense to filter on them at this stage in the code. Having said that, the current metadata/ingestion code does not seem to call get_unique_constraints at all, so that needs to be looked at as well.

4.) Similarly, the get_indexes function does not get invoked by metadata. If that is fixed, then I can test how the loop 'for r in connection.execute(query):' inside get_indexes works. If this is not a metadata issue, then it needs more research.

5.) In a different post, I have also mentioned that get_incoming_foreign_keys is also invoked by metadata, and that is for z/OS DB2 as well as Linux DB2. If this is not a metadata issue, then it needs more research.

base.py in ibm_db_sa

Under the ischema_names dictionary, in addition to 'TIMESTAMP': TIMESTAMP, I also added 'TIMESTMP': TIMESTAMP. (TIMESTMP is how the timestamp data type is spelled in some mainframe installations, where the names have to be 8 characters.)
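
To make changes 2.) and 3.) and the base.py addition concrete, here is a minimal standalone sketch. It is not the actual ibm_db_sa code: ISCHEMA_NAMES is a stand-in for the ischema_names dictionary with plain strings instead of SQLAlchemy type classes, and resolve_coltype and is_key_constraint are hypothetical helper names used only for illustration.

```python
# Stand-in for ibm_db_sa's ischema_names dictionary (assumption: the real one
# maps names to SQLAlchemy type classes; plain strings keep this runnable alone).
ISCHEMA_NAMES = {
    "INTEGER": "INTEGER",
    "VARCHAR": "VARCHAR",
    "TIMESTAMP": "TIMESTAMP",
    "TIMESTMP": "TIMESTAMP",  # 8-character spelling reported on z/OS
}


def resolve_coltype(raw_coltype):
    """Normalize a COLTYPE value as read from SYSCOLUMNS and look it up.

    Hypothetical helper: mirrors coltype = r[1].upper().strip() from change 2.
    """
    coltype = raw_coltype.upper().strip()  # z/OS pads the value with spaces
    return ISCHEMA_NAMES.get(coltype)  # None when the type name is unknown


def is_key_constraint(type_code):
    """Hypothetical filter that accepts 'U' or 'P' instead of only 'U'."""
    return type_code.strip().upper() in {"U", "P"}


if __name__ == "__main__":
    for raw in ("TIMESTMP", "INTEGER ", "timestamp"):
        print(repr(raw), "->", resolve_coltype(raw))
```

Without the strip(), a padded value such as "INTEGER " would miss the dictionary lookup, which is one way columns can end up with an unrecognized type.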

Version: CentOS 7 Python 3.10 OM 0.11.5

  • OpenMetadata Ingestion package version: [e.g. openmetadata-ingestion[docker]==XYZ]

I have also mentioned orm_profiler in the issue title, but I haven't tried everything there. I would assume it has similar issues. In some cases the fix belongs in ibm_db_sa's reflection.py / base.py (most of those are identified above), and for other issues it may require a change to the metadata code itself.

upenbendre, Sep 21 '22 07:09