
clp-s: Add the read path for single-file archive

Open: wraymo opened this pull request (2 comments)

Description

Validation performed

wraymo, Oct 24 '24 14:10

Walkthrough

This pull request adds support for single-file archives, mainly in ArchiveWriter and related classes. It adds new member variables and methods, changes several method signatures, and adds a command-line option for producing single-file archives. TimestampDictionaryWriter is restructured so that it writes into an in-memory stream instead of opening and closing its own file. A new header, SingleFileArchiveDefs.hpp, defines the structures for single-file archives, and the related classes and methods are updated to use them.

Changes

  • components/core/src/clp_s/ArchiveWriter.cpp: Added member variable m_single_file_archive, modified the close method to differentiate between single-file and multi-file archives, added a write_timestamp_dict method, and changed the return type of store_tables to std::pair<size_t, size_t>.
  • components/core/src/clp_s/ArchiveWriter.hpp: Added bool single_file_archive to ArchiveWriterOption, updated the return type of store_tables, and added methods for handling single-file archives.
  • components/core/src/clp_s/CommandLineArguments.cpp: Added a single-file-archive option to command-line argument parsing.
  • components/core/src/clp_s/CommandLineArguments.hpp: Added member variable m_single_file_archive and the getter get_single_file_archive().
  • components/core/src/clp_s/JsonParser.cpp: Added single_file_archive to the m_archive_options structure in the constructor.
  • components/core/src/clp_s/JsonParser.hpp: Added bool single_file_archive to the JsonParserOption struct.
  • components/core/src/clp_s/SingleFileArchiveDefs.hpp: New file with the definitions and structures for single-file archives, including ArchiveHeader, ArchiveCompressionType, and related structures.
  • components/core/src/clp_s/TimestampDictionaryWriter.cpp: Replaced write_and_flush_to_disk with write, added a clear method, and removed the open and close methods.
  • components/core/src/clp_s/TimestampDictionaryWriter.hpp: Updated the constructor and method signatures, removed the file-management methods, and added write and clear methods.
  • components/core/src/clp_s/archive_constants.hpp: Added the constant cTmpPostfix, the postfix for temporary files.
  • components/core/src/clp_s/clp-s.cpp: Modified the compress function to pass the single_file_archive parameter.
  • components/core/src/clp_s/TimestampEntry.hpp: Renamed write_to_file to write_to_stream and changed its parameter type from ZstdCompressor& to std::stringstream&.
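The summary names ArchiveHeader and ArchiveCompressionType in SingleFileArchiveDefs.hpp but does not show their layout. The sketch below is a hypothetical fixed-width header with a matching read path, not the PR's actual definitions: the namespace `example`, the field list, the 64-byte total, the placeholder magic bytes, and the helpers `write_header`/`read_header` are all assumptions. It shows two ideas raised in this thread: a 4-byte magic number checked on read, and compile-time checks that catch a field whose size drifts from the spec. It also writes into a generic stream, in the spirit of the move to `std::stringstream` in TimestampEntry.hpp.

```cpp
// Hypothetical single-file archive header sketch; NOT the PR's actual definitions.
#include <cassert>
#include <cstdint>
#include <cstring>
#include <istream>
#include <ostream>
#include <sstream>
#include <type_traits>

namespace example {
// Placeholder 4-byte magic number ("EXAM"); the real value was still to be chosen.
constexpr std::uint8_t cMagicNumber[4] = {0x45, 0x58, 0x41, 0x4D};

// Assumed compression-type enum with a fixed-width underlying type.
enum class ArchiveCompressionType : std::uint16_t {
    Zstd = 0,
};

// Assumed layout: every field uses a fixed-width integer type, and the fields are
// ordered so that the compiler inserts no padding.
struct ArchiveHeader {
    std::uint8_t magic_number[4];
    std::uint32_t version;
    std::uint64_t uncompressed_size;
    std::uint64_t compressed_size;
    std::uint64_t reserved_padding[4];
    std::uint32_t metadata_section_size;
    std::uint16_t compression_type;
    std::uint16_t padding;
};

// Compile-time checks that fail the build if a field's size differs from the assumed layout.
static_assert(std::is_trivially_copyable_v<ArchiveHeader>);
static_assert(sizeof(ArchiveHeader::magic_number) == 4);
static_assert(sizeof(ArchiveHeader) == 64);

// Serialize the header's raw bytes (host byte order, an assumption) into any output
// stream, e.g. a std::stringstream buffer that is flushed to the archive file later.
inline void write_header(std::ostream& os, ArchiveHeader const& header) {
    os.write(reinterpret_cast<char const*>(&header), sizeof(header));
}

// Read path: load the header and accept it only if every byte arrived and the
// magic number matches.
inline bool read_header(std::istream& is, ArchiveHeader& header) {
    is.read(reinterpret_cast<char*>(&header), sizeof(header));
    if (is.gcount() != static_cast<std::streamsize>(sizeof(header))) {
        return false;
    }
    return 0 == std::memcmp(header.magic_number, cMagicNumber, sizeof(cMagicNumber));
}
}  // namespace example
```

Writing a header with `write_header` into a `std::stringstream` and then calling `read_header` on the same stream round-trips the 64 bytes. A header with the wrong magic number, or a stream shorter than 64 bytes, is rejected. Copying raw struct bytes ties the on-disk format to host endianness and struct layout, so a finished spec would probably pin both down explicitly.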

Possibly related PRs

  • #466: Related to the handling of archive formats and metadata in ArchiveWriter.cpp and ArchiveWriter.hpp; #466 changes ArchiveReader, which also reads metadata and schemas.
  • #600: Renames ordered_chunk_size to target_ordered_chunk_size in the CommandLineArguments class, which relates to the chunk-size handling in ArchiveWriter changed by this PR.

Suggested reviewers

  • wraymo

📜 Recent review details

Configuration used: CodeRabbit UI Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 4c7a50ffc69f054f8e4584ecffa0ce34be0ec88a and c6984876cfc63516470404945b35cab7526caf6d.

📒 Files selected for processing (1)
  • components/core/src/clp_s/Utils.hpp (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • components/core/src/clp_s/Utils.hpp

coderabbitai[bot], Oct 24 '24 14:10

Nice work! Seems mostly good for a draft implementation.

The main things we should change quickly are putting the archive header and metadata section into the regular multi-file archive as well, formally picking a magic number, and changing the magic number to 4 bytes.

There are other things we need to clean up or think through before actually merging this, but the changes above should be enough to build on for prototyping.

We also need to go through and fix every field whose size differs from what the spec specifies.

gibber9809, Oct 27 '24 17:10