clp-s: Add support for escaping characters in KQL key names
Description
This PR adds support for using a backslash to escape characters in KQL key names. The change is mostly contained in the StringUtils::tokenize_column_descriptor function, but it also involves a small change to the Kql.g4 grammar to allow the '.' character to be escaped.
For example, the key consisting of the tokens {"com.bnn", "uuid"} can now be specified as "com\.bnn.uuid" in KQL.
This PR also adds error handling for invalid KQL key names. Invalid key names currently include keys ending in a trailing unescaped '\' or '.'.
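To make the escaping rules concrete, here is a minimal sketch of a tokenizer that follows the behaviour described above. It is not the actual StringUtils::tokenize_column_descriptor implementation: the name tokenize_key, its signature, and the choice to copy any escaped character literally are assumptions made for illustration. It only rejects the two invalid cases named in this description and performs no other validation.

```cpp
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper for illustration only; NOT the real
// StringUtils::tokenize_column_descriptor from this PR.
// Splits `key` on unescaped '.' characters. A backslash escapes the
// character that follows it (assumption: any escaped character is copied
// literally). Returns false if the key ends in an unescaped '\' or '.'.
[[nodiscard]] bool tokenize_key(std::string_view key, std::vector<std::string>& tokens) {
    tokens.clear();
    std::string current;
    bool escaped = false;              // previous character was an unescaped '\'
    bool ends_with_separator = false;  // last character was an unescaped '.'

    for (char const c : key) {
        if (escaped) {
            // Escaped character: keep it as part of the current token.
            current.push_back(c);
            escaped = false;
            ends_with_separator = false;
        } else if ('\\' == c) {
            escaped = true;
            ends_with_separator = false;
        } else if ('.' == c) {
            // Unescaped '.' terminates the current token.
            tokens.push_back(current);
            current.clear();
            ends_with_separator = true;
        } else {
            current.push_back(c);
            ends_with_separator = false;
        }
    }

    if (escaped || ends_with_separator) {
        // Trailing unescaped '\' or '.': report the key as invalid.
        tokens.clear();
        return false;
    }
    tokens.push_back(current);
    return true;
}
```

With this sketch, calling tokenize_key("com\\.bnn.uuid", tokens) (the backslash is doubled only because of C++ string literal rules) fills tokens with "com.bnn" and "uuid", while "a.b\\" and "a.b." are both rejected. Returning bool with [[nodiscard]] mirrors the signature change listed for Utils.hpp, so callers cannot silently ignore a failed tokenization.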
Validation performed
- Added unit tests for parsing keys with escaped characters
- Confirmed that sensible errors are shown on the command line when entering invalid KQL queries
Summary by CodeRabbit
- New Features
  - Enhanced error handling in JSON parsing and KQL expression parsing.
  - Added support for escape sequences in column names within KQL.
  - Updated KQL grammar to recognize periods as special characters.
- Bug Fixes
  - Improved robustness of tokenization methods to prevent failures from propagating.
- Tests
  - Introduced new test cases for validating the parsing of queries with escaped characters in column names.
  - Added tests for handling illegal escape sequences in column names.
Walkthrough
The changes across multiple files enhance error handling and validation processes in the parsing and processing of JSON and KQL data. Key modifications include improved error logging and exception handling in the JsonParser and TimestampDictionaryReader. The tokenize_column_descriptor method's return type was changed to bool in Utils, allowing for better error feedback. Additionally, the KQL grammar was updated to recognize periods as special characters, and new tests were added to validate parsing with escape sequences. Overall, the changes focus on robustness and error management.
Changes
| File Path | Change Summary |
|---|---|
| components/core/src/clp_s/JsonParser.cpp | Enhanced error handling in constructor and parse method; added logging and exception handling. |
| components/core/src/clp_s/TimestampDictionaryReader.cpp | Added error handling in read_new_entries method for tokenization failures. |
| components/core/src/clp_s/Utils.cpp | Changed return type of tokenize_column_descriptor from void to bool and improved functionality. |
| components/core/src/clp_s/Utils.hpp | Updated tokenize_column_descriptor method signature to return bool with [[nodiscard]] attribute. |
| components/core/src/clp_s/clp-s.cpp | Improved error handling in search_archive function; checks tokenization result and logs errors. |
| components/core/src/clp_s/search/kql/Kql.g4 | Modified SPECIAL_CHARACTER fragment to include a period (.) as a valid character. |
| components/core/src/clp_s/search/kql/kql.cpp | Enhanced error handling in visitColumn and parse_kql_expression methods; added exception handling. |
| components/core/tests/test-kql.cpp | Added new test cases for handling escape sequences in column names in KQL parsing. |
Sequence Diagram(s)
sequenceDiagram
participant User
participant KQLParser
participant StringUtils
participant Logger
User->>KQLParser: Parse KQL Expression
KQLParser->>StringUtils: tokenize_column_descriptor(column)
alt Tokenization Success
StringUtils-->>KQLParser: Tokens
KQLParser->>User: Return Parsed Expression
else Tokenization Failure
StringUtils-->>KQLParser: false
KQLParser->>Logger: Log Error
KQLParser-->>User: Return nullptr
end
Possibly related PRs
- #573: The changes in this PR involve enhancing error handling and validation related to protocol version management, which aligns with the error handling improvements made in the main PR's JsonParser.cpp, particularly in how errors are logged and managed during parsing processes.
Suggested reviewers
- kirkrodrigues