
ffi: Add support for auto/user-generated key-value pairs and fix duplicated key detection issues on non-leaf nodes in `KeyValuePairLogEvent`.

Open LinZhihao-723 opened this issue 1 year ago • 1 comments

Description

As planned in #556, we want to split key-value pairs into two categories: auto-generated and user-generated. This PR adds support for both categories in KeyValuePairLogEvent, where each category is stored as a set of node-ID-value pairs with a reference to its associated schema tree. However, this PR does not yet support serialization/deserialization of auto/user-generated kv pairs. The deserializer will output new KeyValuePairLogEvent instances with all kv pairs treated as user-generated, leaving the auto-generated pairs empty.

This PR also fixes a bug where duplicated keys are not correctly detected. Consider the following schema tree:

 <0:root:Obj>
      |
      |------------> <1:a:Obj>
      |                  |
      |--> <2:a:Str>     |--> <3:b:Str>

Node annotation: <nodeID:keyName:valueType>

Before this PR, the node-ID-value pairs [<2:"Value0">, <3:"Value1">] were considered valid against the schema tree above. They are actually invalid, since there is an implicit duplicated key under the root node: both node 1 and node 2 have the key "a". The previous implementation didn't check for key duplication among the siblings of non-leaf nodes. This PR fixes the bug by checking for key duplication among siblings for every node on the path from each leaf up to the root.
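The leaf-to-root check described above can be sketched as follows. This is a minimal illustration, not the code in this PR: `Node`, `cRootId`, and `keys_are_unique_among_siblings` are hypothetical names, and the tree is modelled as a plain vector indexed by node ID rather than with the real SchemaTree class.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a schema tree node (not the clp::ffi API).
struct Node {
    std::size_t parent_id;  // The root lists itself as its parent.
    std::string key;
};

constexpr std::size_t cRootId{0};

// Walks from every given leaf up to the root and returns false as soon as two
// distinct nodes on those paths share both a parent and a key name.
bool keys_are_unique_among_siblings(
        std::vector<Node> const& tree,
        std::vector<std::size_t> const& leaf_ids
) {
    // parent ID -> key names already claimed by one of its children
    std::map<std::size_t, std::set<std::string>> claimed_keys;
    // Nodes whose path to the root has already been checked
    std::set<std::size_t> visited;

    for (auto const leaf_id : leaf_ids) {
        auto id = leaf_id;
        // Stop at the root, or at a node already handled via another leaf.
        while (id != cRootId && visited.insert(id).second) {
            assert(id < tree.size());
            auto const& node = tree[id];
            bool const inserted = claimed_keys[node.parent_id].insert(node.key).second;
            if (false == inserted) {
                // A different sibling under the same parent already uses this key.
                return false;
            }
            id = node.parent_id;
        }
    }
    return true;
}
```

With the example tree above encoded as `{{0, "root"}, {0, "a"}, {0, "a"}, {1, "b"}}`, passing leaf IDs 2 and 3 reports a duplicate, because walking up from node 3 reaches node 1, whose key "a" collides with node 2 under the root. Passing only leaf 3, or only leaf 2, passes the check.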

Validation performed

  • Ensured all workflows passed.
  • Ensured all existing unit tests passed with the new KeyValuePairLogEvent that supports auto/user-generated kv pairs.
  • Enhanced the current unit tests to verify:
    • Valid inputs can successfully construct an instance of KeyValuePairLogEvent
    • Invalid auto-generated or user-generated node-ID-value pair inputs will be properly captured
    • Both auto-generated and user-generated kv pairs can be correctly serialized
    • The bug mentioned in the Description section is correctly fixed

Summary by CodeRabbit

  • New Features

    • Enhanced serialization of key-value pairs into JSON format.
    • Added validation for uniqueness of node keys among siblings.
    • Improved handling of auto-generated and user-generated schema trees.
  • Bug Fixes

    • Resolved issues with key duplication checks in ancestor nodes.
  • Tests

    • Updated test cases to validate new functionalities and error handling for log event creation.

LinZhihao-723 commented on Oct 14 '24 03:10

Walkthrough

The pull request introduces significant changes to the KeyValuePairLogEvent class and related components to enhance the handling of key-value pairs, specifically in terms of serialization and validation. Key updates include the addition of new methods for managing auto-generated and user-generated schema trees, modifications to existing methods to accommodate new parameters, and updates to the test files to ensure proper functionality and validation of these changes.

Changes

| File | Change Summary |
| --- | --- |
| components/core/src/clp/ffi/KeyValuePairLogEvent.cpp | Added methods: serialize_node_id_value_pairs_to_json, check_key_uniqueness_among_sibling_nodes, get_auto_gen_keys_schema_subtree_bitmap, and get_user_gen_keys_schema_subtree_bitmap. Updated create and serialize_to_json methods for new parameters and return types. |
| components/core/src/clp/ffi/KeyValuePairLogEvent.hpp | Updated constructor and create method to accept two schema trees and two collections of node-ID-value pairs. Renamed methods to distinguish between auto-generated and user-generated pairs. Updated documentation comments. |
| components/core/tests/test-ffi_KeyValuePairLogEvent.cpp | Added assert_kv_pair_log_event_creation_failure method. Updated test cases for JSON serialization and key duplication checks. |
| components/core/src/clp/ffi/ir_stream/Deserializer.hpp | Added member variables: m_auto_gen_keys_schema_tree and m_user_gen_keys_schema_tree. Removed the original m_schema_tree variable. |
| components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp | Updated deserialize_ir_unit_kv_pair_log_event function to accept two schema trees instead of one. |
| components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.hpp | Updated the signature of deserialize_ir_unit_kv_pair_log_event to reflect new parameters. |
| components/core/tests/test-ir_encoding_methods.cpp | Enhanced serialization and deserialization functions to handle log events with timestamps and UTC offsets. Expanded test cases for various scenarios, including protocol version validation and edge cases in timestamp serialization. |
| components/core/src/clp/ffi/SchemaTree.hpp | Added equality operators for Node and SchemaTree classes. |
| components/core/tests/test-ffi_IrUnitHandlerInterface.cpp | Updated KeyValuePairLogEvent::create method call to accept two schema trees. Adjusted assertions to reflect changes in expected data structure. |

Possibly related PRs

  • #540: The changes in the main PR regarding the KeyValuePairLogEvent class and its methods for handling key-value pairs are related to the updates in the IrUnitHandlerInterface, which also deals with handling log events, indicating a connection in functionality.
  • #573: The updates to the IR stream protocol version handling in this PR are relevant as they involve the deserialization of key-value pair log events, which is a core aspect of the main PR's changes.
  • #581: The refactoring of the get_schema_subtree_bitmap method in the KeyValuePairLogEvent class directly relates to the changes made in this PR, which also focuses on enhancing the functionality of the KeyValuePairLogEvent class.
  • #557: The introduction of support for serializing and deserializing schema tree node IDs is relevant as it aligns with the main PR's focus on improving the handling of key-value pairs and their associated schema trees.
  • #599: The refactoring and modernization of the streaming compression interface may indirectly relate to the overall improvements in the handling of data structures, including key-value pairs, as part of the broader codebase enhancements.

Suggested reviewers

  • kirkrodrigues
  • gibber9809


coderabbitai[bot] commented on Oct 14 '24 03:10

Can you resolve the conflicts?

kirkrodrigues commented on Nov 13 '24 12:11