
List of relationship.ids should likely be a set.

Open cmyers009 opened this issue 2 years ago • 0 comments

The list relationship.ids contains no duplicates. On the line relationship.ids.extend(x for x in ids if x not in relationship.ids), every membership test scans the entire list to confirm the id is not already present before it is appended. With Python's built-in set datatype, you can get the same behavior in O(1) average time per added item instead of O(n), where n = len(relationship.ids).

https://github.com/aws-samples/amazon-textract-response-parser/blob/dd1ce01d5c63b394af26510d7df72d58e80d136c/src-python/trp/trp2.py#LL354C85-L354C85

def add_ids_to_relationships(self, ids: List[str], relationships_type: str = "CHILD"):
    """Only adds id if not already existing"""
    relationship = self.get_relationships_for_type(relationship_type=relationships_type)
    if relationship:
        if not relationship.ids:
            relationship.ids = list()
            relationship.ids.extend(ids)
        else:
            relationship.ids.extend(x for x in ids if x not in relationship.ids)
    else:
        # empty, set base
        if not self.relationships:
            self.relationships = list()
        self.relationships.append(TRelationship(type=relationships_type, ids=ids))
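
A minimal sketch of the suggestion, not the library's code: `extend_unique` is a hypothetical helper introduced here for illustration. It assumes the order of the ids should be preserved (the issue does not say either way), so it keeps the list and uses a temporary set only for the membership test. Building the set costs one O(n) pass per call, so adding m ids takes roughly O(n + m) instead of O(n * m).

```python
from typing import Iterable, List


def extend_unique(existing: List[str], new_ids: Iterable[str]) -> List[str]:
    """Append ids from new_ids that are not already in existing (in place).

    Hypothetical helper, not part of trp2.py.
    """
    seen = set(existing)  # one O(n) pass to build the lookup set
    for new_id in new_ids:
        if new_id not in seen:  # average O(1) membership test
            seen.add(new_id)
            existing.append(new_id)  # list keeps the original order
    return existing


ids = ["a", "b"]
extend_unique(ids, ["b", "c", "c", "d"])
print(ids)  # ['a', 'b', 'c', 'd']
```

If order does not matter, storing the ids directly in a set would remove the extra pass, but any code that relies on a list (indexing, serialization order) would need to be checked first.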
        
        
   

cmyers009 • May 04 '23 14:05