pgsync icon indicating copy to clipboard operation
pgsync copied to clipboard

Cannot assign transformation for children columns of type - nested.

Open cedzz opened this issue 4 years ago • 13 comments

PGSync version: 2.0.0 Postgres version: 13.3 Elasticsearch version: 7.4.0 Redis version: 6.2.3 Python version: 3.9.2

Problem Description: I am trying to set explicit mapping for columns that are of children node. The children node is of type "nested". I have transformed the children node at root level. After transformation, explicit mappings are not working.

Eg: Consider example in docs -

[
  {
    "database": "book",
    "index": "book",
    "nodes": {
      "table": "book",
      "children": [
        {
          "table": "author",
          "columns": [
            "id",
            "name"
          ],
          "relationship": {
            "variant": "object",
            "type": "one_to_many"
          },
          "transform": {
            "mapping": {
              "name": {
                "type": "keyword"
              }
            }
          }
        }
      ],
      "transform": {
        "mapping": {
          "author": {
            "type": "nested"
          }
        }
      }
    }
  }
]

If I remove the root level transformation on author of type "nested", internal mapping for "author.name" works. I hope I am using the right syntax to define the schema. I couldn't find any examples of using "nested" type in docs.

cedzz avatar Jul 05 '21 18:07 cedzz

I will add some documentation for this. A mapping basically has a field and a type. The field should be one of the columns in the current node.

e.g

"transform": {
     "mapping": {
         "myfield": {
             "type": "keyword"
        }
    }
}

This means you want myfield to be of type keyword. So you can only apply mappings to fields.

In the example provided, author is a table

toluaina avatar Jul 06 '21 21:07 toluaina

Thanks for the speedy reply. Applying type to author as 'nested' is working, as 'nested' is applied on children documents rather than fields. Reference - https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html

cedzz avatar Jul 07 '21 07:07 cedzz

Sorry about the delay. Do you have a working example with pgsync schema.json and database schema? A transform node should be attached to a node you want to perform the operation on. e.g

[
    {
        "database": "book",
        "index": "book",
        "nodes": {
            "table": "book",
            "transform": {
                "mapping": {
                    "isbn": {
                        "type": "nested"
                    }
                }
            }
        }
    }
]

So the author transform in your example need to be within the author node/table

toluaina avatar Jul 14 '21 20:07 toluaina

any update on this?

toluaina avatar Jul 21 '21 13:07 toluaina

@toluaina Apologies for the late reply, had to take care of other engagements. I tried putting the author transformation within the author table. It didn't work as mapping generated for author encapsulates author of object type.

So, if I add author within the author node/table as you shared, the mapping will look like this -

[
  {
    "database": "book",
    "index": "book",
    "nodes": {
      "table": "book",
      "children": [
        {
          "table": "author",
          "columns": [
            "id",
            "name"
          ],
          "relationship": {
            "variant": "object",
            "type": "one_to_many"
          },
          "transform": {
            "mapping": {
              "name": {
                "type": "keyword"
              }, 
              "author": {
                 "type": "nested"
              }
            }
          }
        }
      ]
    }
  }
]

It will return something like this -

{
  "book" : {
    "mappings" : {
      "properties" : {
        "author" : {
          "properties" : {
            "author" : {
              "type" : "nested"
            },
            "price" : {
              "type" : "double"
            }
          }
        }
      }
    }
  }
}

What I wanted was this -

{
  "book" : {
    "mappings" : {
      "properties" : {
        "author" : {
          "type" : "nested",
          "properties" : {
            "price" : {
              "type" : "double"
            }
          }
        }
      }
    }
  }
}

Please note, if I remove transformation on price from the author node, and add transformation on root level for author of type nested, it works. It doesn't work only if I add the price transformation.

As a workaround for now, I have manually updated the index mapping after bootstrapping via pgsync.

cedzz avatar Jul 28 '21 17:07 cedzz

@toluaina Any updates on this? Is there a workaround to create a child table with the type "nested" like @cedzz mentioned above?

joshuafernandes1996 avatar Nov 14 '21 08:11 joshuafernandes1996

I feel this would entail a bit of re-working and introduce too much complexity. I can add a option to the schema.json to specify exact mapping you require and child mapping will be ignored? So you would add your own mapping as you would expect in the schema

toluaina avatar Nov 14 '21 20:11 toluaina

I feel this would entail a bit of re-working and introduce too much complexity. I can add a option to the schema.json to specify exact mapping you require and child mapping will be ignored? So you would add your own mapping as you would expect in the schema

Hi @toluaina , yes please, if I could add my own mapping that would help. I am working with generic search pattern, so it looks for nested objects, with its path and key. So I really need to specify child table as nested, otherwise the search fails.

joshuafernandes1996 avatar Nov 16 '21 21:11 joshuafernandes1996

@toluaina any update on this? option to specify exact mapping and ignore child mapping.

nkblitz avatar Nov 22 '21 11:11 nkblitz

I added an option to allow you to specify your own mapping in the schema.json. Here is an example

toluaina avatar Dec 02 '21 13:12 toluaina

@toluaina , I tried putting all my mapping under the "mapping" property liked you mentioned. So under nodes I am only mentioning the columns, child tables and relationships. But it not identifying my "analyzers" even though I have put it under the "settings" property.

[
  {
    "database": "postgres",
    "index": "customers",
    "settings": {
      "analysis": {
        "analyzer": {
          "analyzer_keyword": {
            "tokenizer": "whitespace",
            "filter": [
              "trim",
              "lowercase"
            ]
          },
          "case_insensitive_analyzer": {
            "type": "custom",
            "filter": [
              "lowercase"
            ],
            "tokenizer": "keyword"
          }
        },
        "normalizer": {
          "sorting": {
            "type": "custom",
            "char_filter": [],
            "filter": [
              "lowercase",
              "asciifolding"
            ]
          }
        }
      }
    },
    "mapping": {
      "Id": {
        "type": "text",
        "fields": {
          "raw": {
            "type": "keyword",
            "normalizer": "sorting"
          }
        }
      },
      "CreatedDateTime": {
        "type": "date",
        "format": "dd-MM-yyyy HH:mm:ss||epoch_millis",
        "fields": {
          "raw": {
            "type": "date"
          }
        }
      },
      "UpdatedDateTime": {
        "type": "date",
        "format": "dd-MM-yyyy HH:mm:ss||epoch_millis",
        "fields": {
          "raw": {
            "type": "date"
          }
        }
      },
      "FirstName": {
        "type": "text",
        "analyzer": "analyzer_keyword",
        "fields": {
          "raw": {
            "type": "keyword",
            "normalizer": "sorting"
          }
        }
      },
      "LastName": {
        "type": "text",
        "analyzer": "analyzer_keyword",
        "fields": {
          "raw": {
            "type": "keyword",
            "normalizer": "sorting"
          }
        }
      },
      "Name": {
        "type": "text",
        "analyzer": "case_insensitive_analyzer",
        "fields": {
          "raw": {
            "type": "keyword",
            "normalizer": "sorting"
          }
        }
      },
      "TaxExemptId": {
        "type": "text",
        "fields": {
          "raw": {
            "type": "keyword",
            "normalizer": "sorting"
          }
        }
      },
      "DateOfBirth": {
        "type": "text",
        "analyzer": "analyzer_keyword",
        "fields": {
          "raw": {
            "type": "keyword",
            "normalizer": "sorting"
          }
        }
      },
      "CustomerType": {
        "type": "text",
        "fields": {
          "raw": {
            "type": "keyword",
            "normalizer": "sorting"
          }
        }
      },
      "NotificationPreferences": {
        "type": "text",
        "fields": {
          "raw": {
            "type": "keyword",
            "normalizer": "sorting"
          }
        }
      },
      "Status": {
        "type": "text",
        "fields": {
          "raw": {
            "type": "keyword",
            "normalizer": "sorting"
          }
        }
      },
      "CompanyName": {
        "type": "text",
        "analyzer": "analyzer_keyword",
        "fields": {
          "raw": {
            "type": "keyword",
            "normalizer": "sorting"
          }
        }
      },
      "CustomerAddress": {
        "type": "nested",
        "properties": {
          "Id": {
            "type": "text",
            "fields": {
              "raw": {
                "type": "keyword",
                "normalizer": "sorting"
              }
            }
          },
          "ZipCode": {
            "type": "text",
            "analyzer": "analyzer_keyword",
            "fields": {
              "raw": {
                "type": "keyword",
                "normalizer": "sorting"
              }
            }
          },
          "AddressType": {
            "type": "text",
            "analyzer": "analyzer_keyword",
            "fields": {
              "raw": {
                "type": "keyword",
                "normalizer": "sorting"
              }
            }
          },
          "Street": {
            "type": "text",
            "analyzer": "analyzer_keyword",
            "fields": {
              "raw": {
                "type": "keyword",
                "normalizer": "sorting"
              }
            }
          },
          "Street2": {
            "type": "text",
            "analyzer": "analyzer_keyword",
            "fields": {
              "raw": {
                "type": "keyword",
                "normalizer": "sorting"
              }
            }
          },
          "City": {
            "type": "text",
            "analyzer": "analyzer_keyword",
            "fields": {
              "raw": {
                "type": "keyword",
                "normalizer": "sorting"
              }
            }
          },
          "State": {
            "type": "text",
            "analyzer": "analyzer_keyword",
            "fields": {
              "raw": {
                "type": "keyword",
                "normalizer": "sorting"
              }
            }
          },
          "Country": {
            "type": "text",
            "analyzer": "analyzer_keyword",
            "fields": {
              "raw": {
                "type": "keyword",
                "normalizer": "sorting"
              }
            }
          }
        }
      },
      "CustomerContact": {
        "type": "nested",
        "properties": {
          "Id": {
            "type": "text",
            "fields": {
              "raw": {
                "type": "keyword",
                "normalizer": "sorting"
              }
            }
          },
          "ContactType": {
            "type": "text",
            "analyzer": "analyzer_keyword",
            "fields": {
              "raw": {
                "type": "keyword",
                "normalizer": "sorting"
              }
            }
          },
          "ContactDetail": {
            "type": "text",
            "analyzer": "analyzer_keyword",
            "fields": {
              "raw": {
                "type": "keyword",
                "normalizer": "sorting"
              }
            }
          }
        }
      },
      "AlternateContactDetails": {
        "type": "nested",
        "properties": {
          "Id": {
            "type": "text",
            "fields": {
              "raw": {
                "type": "keyword",
                "normalizer": "sorting"
              }
            }
          },
          "FirstName": {
            "type": "text",
            "analyzer": "analyzer_keyword",
            "fields": {
              "raw": {
                "type": "keyword",
                "normalizer": "sorting"
              }
            }
          },
          "LastName": {
            "type": "text",
            "analyzer": "analyzer_keyword",
            "fields": {
              "raw": {
                "type": "keyword",
                "normalizer": "sorting"
              }
            }
          },
          "Email": {
            "type": "text",
            "analyzer": "analyzer_keyword",
            "fields": {
              "raw": {
                "type": "keyword",
                "normalizer": "sorting"
              }
            }
          },
          "Phone": {
            "type": "text",
            "analyzer": "analyzer_keyword",
            "fields": {
              "raw": {
                "type": "keyword",
                "normalizer": "sorting"
              }
            }
          }
        }
      },
      "EquipmentInfo": {
        "type": "nested",
        "properties": {
          "EquipmentId": {
            "type": "text",
            "fields": {
              "raw": {
                "type": "keyword",
                "normalizer": "sorting"
              }
            }
          },
          "Title": {
            "type": "text",
            "analyzer": "analyzer_keyword",
            "fields": {
              "raw": {
                "type": "keyword",
                "normalizer": "sorting"
              }
            }
          },
          "Year": {
            "type": "text",
            "analyzer": "analyzer_keyword",
            "fields": {
              "raw": {
                "type": "keyword",
                "normalizer": "sorting"
              }
            }
          },
          "Make": {
            "type": "text",
            "analyzer": "analyzer_keyword",
            "fields": {
              "raw": {
                "type": "keyword",
                "normalizer": "sorting"
              }
            }
          },
          "Model": {
            "type": "text",
            "analyzer": "analyzer_keyword",
            "fields": {
              "raw": {
                "type": "keyword",
                "normalizer": "sorting"
              }
            }
          }
        }
      }
    },
    "nodes": {
      "table": "Customers",
      "schema": "public",
      "columns": [
        "Id",
        "FirstName",
        "LastName",
        "CompanyName",
        "Name",
        "CreatedDateTime",
        "UpdatedDateTime",
        "DateOfBirth",
        "IsTaxExempt",
        "TaxExemptId",
        "DoNotMail",
        "DoNotService",
        "CustomerType",
        "NotificationPreferences",
        "Status",
        "LastActivityDate",
        "IsDeleted"
      ],
      "children": [
        {
          "table": "CustomerAddress",
          "columns": [
            "Id",
            "CustomerId",
            "Name",
            "AddressType",
            "Street",
            "Street2",
            "City",
            "State",
            "Country",
            "Latitude",
            "Longitude",
            "ZipCode",
            "IsDefault",
            "IsDeleted",
            "IsLtr",
            "IsVerified"
          ],
          "label": "CustomerAddress",
          "relationship": {
            "variant": "object",
            "type": "one_to_many",
            "foreign_key": {
              "child": [
                "CustomerId"
              ],
              "parent": [
                "Id"
              ]
            }
          }
        },
        {
          "table": "CustomerContacts",
          "columns": [
            "Id",
            "CustomerId",
            "IsPrimary",
            "AllowVoice",
            "ContactDetail",
            "AllowMail",
            "AllowText",
            "ContactType",
            "IsDeleted"
          ],
          "label": "CustomerContact",
          "relationship": {
            "variant": "object",
            "type": "one_to_many",
            "foreign_key": {
              "child": [
                "CustomerId"
              ],
              "parent": [
                "Id"
              ]
            }
          }
        },
        {
          "table": "AlternateContactDetails",
          "columns": [
            "Id",
            "CustomerId",
            "FirstName",
            "LastName",
            "Description",
            "Email",
            "Phone",
            "IsDeleted"
          ],
          "label": "AlternateContactDetails",
          "relationship": {
            "variant": "object",
            "type": "one_to_many",
            "foreign_key": {
              "child": [
                "CustomerId"
              ],
              "parent": [
                "Id"
              ]
            }
          }
        },
        {
          "table": "EquipmentInfo",
          "columns": [
            "EquipmentId",
            "CustomerId",
            "Title",
            "Year",
            "Make",
            "Model",
            "CreatedOn",
            "UpdatedOn"
          ],
          "label": "EquipmentInfo",
          "relationship": {
            "variant": "object",
            "type": "one_to_many",
            "foreign_key": {
              "child": [
                "CustomerId"
              ],
              "parent": [
                "Id"
              ]
            }
          }
        },
        {
          "table": "Notes",
          "columns": [
            "Id",
            "CustomerId",
            "NoteType",
            "Note",
            "IsDeleted",
            "CreatedBy",
            "CreatedOn",
            "UpdatedBy",
            "UpdatedOn"
          ],
          "label": "Notes",
          "relationship": {
            "variant": "object",
            "type": "one_to_many",
            "foreign_key": {
              "child": [
                "CustomerId"
              ],
              "parent": [
                "Id"
              ]
            }
          }
        }
      ]
    }
  }
]

joshuafernandes1996 avatar Dec 11 '21 16:12 joshuafernandes1996

@joshuafernandes1996 Your key is named settings and should be named setting.

voyc-jean avatar Jan 13 '22 12:01 voyc-jean

Well spotted @voyc-jean Thanks for that. Yes it should be setting singular

toluaina avatar Jan 27 '22 10:01 toluaina