ffi: Add support for auto/user-generated key-value pairs and fix duplicated key detection issues on non-leaf nodes in `KeyValuePairLogEvent`.
Description
As planned in #556, we want to differentiate key-value pairs into two categories: auto-generated key-value pairs and user-generated key-value pairs. This PR adds support for auto/user-generated key-value pairs in KeyValuePairLogEvent, where each set of key-value pairs is stored as a set of node-ID-value pairs with a reference to the associated schema tree.
However, this PR does not yet support serializing or deserializing auto-generated and user-generated kv pairs separately. The deserializer outputs the new KeyValuePairLogEvent instances, but treats all kv pairs as user-generated ones, leaving the auto-generated pairs empty.
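As a rough illustration of the layout described above, here is a minimal sketch with two schema-tree references and two sets of node-ID-value pairs. All names (`SchemaTreeSketch`, `KvPairLogEventSketch`, `make_event_as_deserialized`) are hypothetical stand-ins rather than the actual `clp::ffi` API, and values are simplified to strings.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical placeholder for a schema tree; the real class carries nodes.
struct SchemaTreeSketch {};

// Simplified: node ID -> value, with values reduced to plain strings.
using NodeIdValuePairs = std::unordered_map<std::size_t, std::string>;

// Each category keeps its own pairs plus a reference to its own schema tree.
struct KvPairLogEventSketch {
    std::shared_ptr<SchemaTreeSketch const> auto_gen_keys_schema_tree;
    std::shared_ptr<SchemaTreeSketch const> user_gen_keys_schema_tree;
    NodeIdValuePairs auto_gen_node_id_value_pairs;
    NodeIdValuePairs user_gen_node_id_value_pairs;
};

// Mirrors the interim deserializer behaviour described in this PR: every
// deserialized pair is treated as user-generated and the auto-generated set
// stays empty.
KvPairLogEventSketch make_event_as_deserialized(
        std::shared_ptr<SchemaTreeSketch const> auto_gen_tree,
        std::shared_ptr<SchemaTreeSketch const> user_gen_tree,
        NodeIdValuePairs deserialized_pairs
) {
    assert(auto_gen_tree && user_gen_tree);
    return KvPairLogEventSketch{
            std::move(auto_gen_tree),
            std::move(user_gen_tree),
            NodeIdValuePairs{},
            std::move(deserialized_pairs)
    };
}
```

Keeping the two categories in separate trees means node IDs in one set never need to be disambiguated from node IDs in the other.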
This PR also fixes a bug where duplicated keys are not correctly detected. Consider the following schema tree:
    <0:root:Obj>
         |
         |------------> <1:a:Obj>
         |                  |
         |--> <2:a:Str>     |--> <3:b:Str>

Node annotation: <nodeID:keyName:valueType>
Before this PR, the node-ID-value pairs [<2:"Value0">, <3:"Value1">] were considered valid against the schema tree above. However, they are actually invalid since there is an implicit duplicated key under the root node: node 1 (the parent of node 3) and node 2 both have the key `"a"`. The previous implementation didn't check key duplications among the siblings of non-leaf nodes. This PR fixes the bug by checking key duplications among siblings for every node on the path from each leaf up to the root.
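The leaf-to-root check can be sketched as follows. This is a simplified illustration, not the code in this PR: `Node` and `keys_unique_among_siblings` are hypothetical names, the tree is a plain vector indexed by node ID, and only keys (not value types) are compared.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical, simplified schema tree node (not the real SchemaTree::Node).
struct Node {
    std::size_t parent_id;  // Ignored when `is_root` is true
    bool is_root;
    std::string key;
};

// Returns true if, among all nodes on the paths from the given leaves up to
// the root, no two nodes share both a parent and a key.
bool keys_unique_among_siblings(
        std::vector<Node> const& tree,
        std::vector<std::size_t> const& leaf_ids
) {
    // parent ID -> keys already used by that parent's visited children
    std::unordered_map<std::size_t, std::unordered_set<std::string>> used_keys;
    std::unordered_set<std::size_t> visited;
    for (std::size_t const leaf_id : leaf_ids) {
        std::size_t id = leaf_id;
        while (true) {
            assert(id < tree.size());
            if (!visited.insert(id).second) {
                break;  // This node and its ancestors were already checked
            }
            Node const& node = tree[id];
            if (node.is_root) {
                break;
            }
            if (!used_keys[node.parent_id].insert(node.key).second) {
                return false;  // A sibling already uses this key
            }
            id = node.parent_id;
        }
    }
    return true;
}
```

With the schema tree above, leaves 2 and 3 together are rejected: walking up from node 3 reaches node 1, whose key `"a"` is already taken under the root by node 2. Either leaf on its own passes the check.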
Validation performed
- Ensure all workflows passed.
- Ensure all existing unit tests passed with the new `KeyValuePairLogEvent` that supports auto/user-generated kv pairs.
- Enhance the current unit tests to verify:
  - Valid inputs can successfully construct an instance of `KeyValuePairLogEvent`
  - Invalid auto-generated or user-generated node-ID-value pair inputs will be properly captured
  - Both auto-generated and user-generated kv pairs can be correctly serialized
  - The bug mentioned in the Description section is correctly fixed
Summary by CodeRabbit
- New Features
  - Enhanced serialization of key-value pairs into JSON format.
  - Added validation for uniqueness of node keys among siblings.
  - Improved handling of auto-generated and user-generated schema trees.
- Bug Fixes
  - Resolved issues with key duplication checks in ancestor nodes.
- Tests
  - Updated test cases to validate new functionalities and error handling for log event creation.
Walkthrough
The pull request introduces significant changes to the KeyValuePairLogEvent class and related components to enhance the handling of key-value pairs, specifically in terms of serialization and validation. Key updates include the addition of new methods for managing auto-generated and user-generated schema trees, modifications to existing methods to accommodate new parameters, and updates to the test files to ensure proper functionality and validation of these changes.
Changes
| File | Change Summary |
|---|---|
| components/core/src/clp/ffi/KeyValuePairLogEvent.cpp | - Added methods: serialize_node_id_value_pairs_to_json, check_key_uniqueness_among_sibling_nodes, get_auto_gen_keys_schema_subtree_bitmap, and get_user_gen_keys_schema_subtree_bitmap. - Updated create and serialize_to_json methods for new parameters and return types. |
| components/core/src/clp/ffi/KeyValuePairLogEvent.hpp | - Updated constructor and create method to accept two schema trees and two collections of node-ID value pairs. - Renamed methods to distinguish between auto-generated and user-generated pairs. - Updated documentation comments. |
| components/core/tests/test-ffi_KeyValuePairLogEvent.cpp | - Added assert_kv_pair_log_event_creation_failure method. - Updated test cases for JSON serialization and key duplication checks. |
| components/core/src/clp/ffi/ir_stream/Deserializer.hpp | - Added member variables: m_auto_gen_keys_schema_tree and m_user_gen_keys_schema_tree. - Removed the original m_schema_tree variable. |
| components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.cpp | - Updated deserialize_ir_unit_kv_pair_log_event function to accept two schema trees instead of one. |
| components/core/src/clp/ffi/ir_stream/ir_unit_deserialization_methods.hpp | - Updated the signature of deserialize_ir_unit_kv_pair_log_event to reflect new parameters. |
| components/core/tests/test-ir_encoding_methods.cpp | - Enhanced serialization and deserialization functions to handle log events with timestamps and UTC offsets. - Expanded test cases for various scenarios, including protocol version validation and edge cases in timestamp serialization. |
| components/core/src/clp/ffi/SchemaTree.hpp | - Added equality operators for Node and SchemaTree classes. |
| components/core/tests/test-ffi_IrUnitHandlerInterface.cpp | - Updated KeyValuePairLogEvent::create method call to accept two schema trees. - Adjusted assertions to reflect changes in expected data structure. |
Possibly related PRs
- #540: The changes in the main PR regarding the `KeyValuePairLogEvent` class and its methods for handling key-value pairs are related to the updates in the `IrUnitHandlerInterface`, which also deals with handling log events, indicating a connection in functionality.
- #573: The updates to the IR stream protocol version handling in this PR are relevant as they involve the deserialization of key-value pair log events, which is a core aspect of the main PR's changes.
- #581: The refactoring of the `get_schema_subtree_bitmap` method in the `KeyValuePairLogEvent` class directly relates to the changes made in this PR, which also focuses on enhancing the functionality of the `KeyValuePairLogEvent` class.
- #557: The introduction of support for serializing and deserializing schema tree node IDs is relevant as it aligns with the main PR's focus on improving the handling of key-value pairs and their associated schema trees.
- #599: The refactoring and modernization of the streaming compression interface may indirectly relate to the overall improvements in the handling of data structures, including key-value pairs, as part of the broader codebase enhancements.
Suggested reviewers
- kirkrodrigues
- gibber9809
Can you resolve the conflicts?