clp icon indicating copy to clipboard operation
clp copied to clipboard

clp-core: Refactor the BufferedFileReader to wrap around a ReaderInterface.

Open haiqi96 opened this issue 1 year ago • 1 comments

Description

The BufferedFileReader was designed with the assumption that it always reads from the local file system. However, supporting compressing from a different input source, (s3 for example) would break this assumption. In this PR, we refactor the BufferedFileReader so it can takes in any abitrary reader interface as its data source. A more detailed list of changes:

  1. Refactored BufferedFileReader to use RAII model.
  2. Remove the internal FileDescriptor and C-style read/write from BufferedFileReader
  3. Update the instantiation of all BufferedFileReader instance to use a FileDescriptorReader (instead of FileReader) to avoid double buffering.
  4. Rewrite some BufferedFileReader apis using modern C++ features.

Note: this PR makes some modifications to the unit-test, but we decide to not update the unit-test to match the latest coding standard. In fact, the unit-test is very poorly written and would make more sense to refactor it in a separate PR.

Validation performed

1.Made sure exisitng Unit test passes. 2.Added a new unit test to cover the case where the BufferedFileReader takes in an reader interface with non-zero pos. 3. Compressed and decompressed a single file, verified the decompressed file matches original

Summary by CodeRabbit

  • New Features

    • Enhanced file reading logic in BufferedFileReader by transitioning to a ReaderInterface.
    • Improved flexibility in file handling through integration with FileDescriptorReader.
    • Added a new method in FileCompressor for parsing and encoding content from a reader.
  • Improvements

    • Streamlined error handling and buffer management in BufferedFileReader.
    • Clarified behavior in BufferReader methods to handle edge cases more effectively.
    • Updated try_compressing_as_archive method to allow flexible file reading during compression.
  • Tests

    • Updated test cases for BufferedFileReader to utilize ReaderInterface for improved file handling.

haiqi96 avatar Aug 22 '24 01:08 haiqi96

Walkthrough

Replaces file-descriptor-based buffering with a ReaderInterface-backed buffered layer: adds clp::BufferedReader and clp::FileDescriptorReader, updates BufferReader validation/behaviour, migrates FileCompressor APIs to accept/route through ReaderInterface via parse_and_encode, renames build/test artifacts, and updates tests to exercise the new BufferedReader flow.

Changes

Cohort / File(s) Summary
New buffered reader (ReaderInterface-backed)
components/core/src/clp/BufferedReader.hpp, components/core/src/clp/BufferedReader.cpp
Adds clp::BufferedReader that wraps a ReaderInterface with internal buffer, checkpointing, seek/read/to-delimiter APIs, position tracking, ErrorCode semantics, constructor validation and buffer management helpers.
BufferedFileReader → ReaderInterface refactor
components/core/src/clp/BufferedFileReader.hpp, components/core/src/clp/BufferedFileReader.cpp
Reworks BufferedFileReader to hold std::shared_ptr<ReaderInterface> and use BufferReader/std::span-based reads; removes file-open/close/path APIs and public destructor, renames internal position helpers, and routes reads/seeks through the ReaderInterface.
BufferReader adjustments
components/core/src/clp/BufferReader.hpp, components/core/src/clp/BufferReader.cpp
Adds default constructor; tightens constructor validation (null/data_size/pos checks); uses initializer list; peek_buffer now sets buf to nullptr when empty.
FileCompressor API & flow
components/core/src/clp/clp/FileCompressor.hpp, components/core/src/clp/clp/FileCompressor.cpp
Adds parse_and_encode(..., ReaderInterface&, use_heuristic) and updates try_compressing_as_archive(...) to accept a ReaderInterface&; consolidates heuristic/library encoding paths and routes reading via provided readers.
New FileDescriptorReader usage & tests
components/core/tests/test-BufferedReader.cpp
Tests migrated from BufferedFileReader to BufferedReader backed by FileDescriptorReader; updates constructors, constants (cMinBufferSize), checkpoint/delimiter tests and test generation; removes explicit close calls.
Build updates (rename sources and tests)
components/core/CMakeLists.txt, components/core/src/clp/clp/CMakeLists.txt
Replaces build/test source references from BufferedFileReader* to BufferedReader* and updates test filename entries.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Client
  participant BufferedReader
  participant UnderlyingReader as ReaderInterface

  Client->>BufferedReader: try_read(buf, n)
  alt buffer has data
    BufferedReader-->>Client: Success + bytes
  else buffer empty
    BufferedReader->>UnderlyingReader: try_read(span)
    alt bytes_read > 0
      BufferedReader->>BufferedReader: refill internal buffer
      BufferedReader-->>Client: Success + bytes
    else EndOfFile
      BufferedReader-->>Client: EndOfFile
    end
  end

  Client->>BufferedReader: try_seek_from_begin(pos)
  alt pos allowed (checkpoint/forward)
    BufferedReader-->>Client: Success
  else before checkpoint lower bound
    BufferedReader-->>Client: Unsupported
  end
sequenceDiagram
  autonumber
  participant Caller
  participant FileCompressor
  participant Reader as ReaderInterface
  participant Archive as ArchiveWriter

  Caller->>FileCompressor: parse_and_encode(..., path, gid, Archive, Reader, use_heuristic)
  alt use_heuristic == true
    FileCompressor->>FileCompressor: parse_and_encode_with_heuristic(...)
  else
    FileCompressor->>FileCompressor: parse_and_encode_with_library(...)
  end
  FileCompressor-->>Caller: return

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 45.45% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title succinctly summarises the primary refactor by stating that BufferedFileReader now wraps a ReaderInterface rather than a FileReader. It also clearly communicates the accompanying rename to BufferedReader. This gives reviewers a precise understanding of the core changes without unnecessary detail.
✨ Finishing touches
  • [ ] 📝 Generate Docstrings
🧪 Generate unit tests
  • [ ] Create PR with unit tests
  • [ ] Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai[bot] avatar Oct 08 '24 00:10 coderabbitai[bot]