Recursion issue when parsing advanced Avro Schemas

Open mgorsk1 opened this issue 1 year ago • 1 comments

Affected module Ingestion

Describe the bug

When parsing a more complex Avro schema in which a record 'Item' (see the example below) has a field 'itemList' that is itself an array of 'Item' records, the parser recurses infinitely and the whole ingestion fails.

Ideally such a schema should be parsed properly; failing that, it would be acceptable to skip it when a RecursionError is raised. That is not what happens: the whole Python process exits with SIGABRT. The suspected culprit is the C library used for Avro parsing.
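Since a SIGABRT cannot be caught with try/except, one way to keep a single bad schema from taking down the whole run is to do the parse in a child interpreter and inspect its exit status. This is only a sketch of that idea under stated assumptions, not how the ingestion framework works: `run_isolated` is a hypothetical helper, and the negative return code convention for signals applies to POSIX systems.

```python
import signal
import subprocess
import sys


def run_isolated(code, timeout=60):
    """Run `code` in a fresh interpreter and return (ok, detail).

    Hypothetical helper: a crash in the child (for example SIGABRT)
    is reported to the caller instead of killing the parent process.
    """
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode == 0:
        return True, proc.stdout
    if proc.returncode < 0:
        # On POSIX a negative return code means "terminated by that signal".
        return False, "killed by " + signal.Signals(-proc.returncode).name
    return False, proc.stderr.strip()


# Simulate a hard abort in the child; the parent survives and can skip.
# For the real case, `code` would import parse_avro_schema and call it.
ok, detail = run_isolated("import os; os.abort()")
print(ok, detail)
```

The cost is one interpreter start per schema, so this is more of a diagnostic or fallback than something to run on every topic.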

To Reproduce

from metadata.parsers.avro_parser import parse_avro_schema


data = """
{
  "type": "record",
  "name": "RecursionIssue",
  "namespace": "com.issue.recursion",
  "doc": "Schema with recursion issue",
  "fields": [
    {
      "name": "issue",
      "type": {
        "type": "record",
        "name": "Issue",
        "doc": "Global Schema Name",
        "fields": [
          {
            "name": "itemList",
            "default": null,
            "type": [
              "null",
              {
                "type": "array",
                "items": {
                  "type": "record",
                  "name": "Item",
                  "doc": "Item List  - Array of Sub Schema",
                  "fields": [
                    {
                      "name": "itemList",
                      "type": [
                        "null",
                        {
                          "type": "array",
                          "items": "Item"
                        }
                      ],
                      "default": null
                    }
                  ]
                }
              }
            ]
          }
        ]
      }
    }
  ]
}

"""
try:
    print(parse_avro_schema(data))
except RecursionError as e:
    print(e.args)

Expected behavior Schema is parsed properly (or Python code doesn't fail with SIGABRT).
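To illustrate what "parsed properly" could look like for a self-referencing record, here is a minimal pure-Python sketch that walks the schema JSON and stops expanding a named type when it is referenced from inside its own definition. It is not the OpenMetadata parser and does not use any Avro library; `walk` and the `recursive_ref` marker are hypothetical names, and namespaces are ignored for brevity (types are matched on their short name only).

```python
import json

PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def walk(schema, named=None, active=None):
    """Return a nested dict for `schema`, cutting off recursive references."""
    named = {} if named is None else named        # short name -> definition
    active = set() if active is None else active  # records being expanded now

    if isinstance(schema, str):
        if schema in PRIMITIVES:
            return {"type": schema}
        if schema in active:
            # Reference back to a record we are still inside: do not expand.
            return {"type": "record", "name": schema, "recursive_ref": True}
        if schema in named:
            return walk(named[schema], named, active)
        return {"type": schema}  # unknown name, left unresolved

    if isinstance(schema, list):  # union
        return {"type": "union", "children": [walk(s, named, active) for s in schema]}

    kind = schema["type"]
    if not isinstance(kind, str):  # nested type definition
        return walk(kind, named, active)
    if kind == "record":
        name = schema["name"]
        named[name] = schema
        active.add(name)
        try:
            fields = [
                {"name": f["name"], "type": walk(f["type"], named, active)}
                for f in schema["fields"]
            ]
        finally:
            active.discard(name)
        return {"type": "record", "name": name, "fields": fields}
    if kind == "array":
        return {"type": "array", "items": walk(schema["items"], named, active)}
    if kind == "map":
        return {"type": "map", "values": walk(schema["values"], named, active)}
    return {"type": kind}


# Reduced version of the 'Item' record from the reproduction above.
SCHEMA = json.loads("""
{
  "type": "record",
  "name": "Item",
  "fields": [
    {"name": "itemList",
     "type": ["null", {"type": "array", "items": "Item"}],
     "default": null}
  ]
}
""")

tree = walk(SCHEMA)
print(json.dumps(tree, indent=2))
```

Tracking only the records currently being expanded (rather than every record seen so far) means a type that is reused in two sibling fields is still expanded in both places; only true self-reference is cut off.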

Version:

  • OS: [e.g. iOS]
  • Python version: 3.9
  • OpenMetadata version: 1.3.1
  • OpenMetadata Ingestion package version: 1.3.1

Additional context This is different from https://github.com/open-metadata/OpenMetadata/issues/13371

mgorsk1 avatar Mar 14 '24 10:03 mgorsk1

I'll be working on this

SumanMaharana avatar May 02 '24 07:05 SumanMaharana

any update on this @SumanMaharana ?

mgorsk1 avatar Jun 23 '24 18:06 mgorsk1

@SumanMaharana did we fix this?

harshach avatar Aug 10 '24 15:08 harshach