clp-core: Refactor the BufferedFileReader to wrap around a ReaderInterface.
Description
The BufferedFileReader was designed with the assumption that it always reads from the local file system. However, supporting compressing from a different input source, (s3 for example) would break this assumption. In this PR, we refactor the BufferedFileReader so it can takes in any abitrary reader interface as its data source. A more detailed list of changes:
- Refactored BufferedFileReader to use RAII model.
- Remove the internal FileDescriptor and C-style read/write from BufferedFileReader
- Update the instantiation of all BufferedFileReader instance to use a FileDescriptorReader (instead of FileReader) to avoid double buffering.
- Rewrite some BufferedFileReader apis using modern C++ features.
Note: this PR makes some modifications to the unit-test, but we decide to not update the unit-test to match the latest coding standard. In fact, the unit-test is very poorly written and would make more sense to refactor it in a separate PR.
Validation performed
1.Made sure exisitng Unit test passes. 2.Added a new unit test to cover the case where the BufferedFileReader takes in an reader interface with non-zero pos. 3. Compressed and decompressed a single file, verified the decompressed file matches original
Summary by CodeRabbit
-
New Features
- Enhanced file reading logic in
BufferedFileReaderby transitioning to aReaderInterface. - Improved flexibility in file handling through integration with
FileDescriptorReader. - Added a new method in
FileCompressorfor parsing and encoding content from a reader.
- Enhanced file reading logic in
-
Improvements
- Streamlined error handling and buffer management in
BufferedFileReader. - Clarified behavior in
BufferReadermethods to handle edge cases more effectively. - Updated
try_compressing_as_archivemethod to allow flexible file reading during compression.
- Streamlined error handling and buffer management in
-
Tests
- Updated test cases for
BufferedFileReaderto utilizeReaderInterfacefor improved file handling.
- Updated test cases for
Walkthrough
Replaces file-descriptor-based buffering with a ReaderInterface-backed buffered layer: adds clp::BufferedReader and clp::FileDescriptorReader, updates BufferReader validation/behaviour, migrates FileCompressor APIs to accept/route through ReaderInterface via parse_and_encode, renames build/test artifacts, and updates tests to exercise the new BufferedReader flow.
Changes
| Cohort / File(s) | Summary |
|---|---|
New buffered reader (ReaderInterface-backed)components/core/src/clp/BufferedReader.hpp, components/core/src/clp/BufferedReader.cpp |
Adds clp::BufferedReader that wraps a ReaderInterface with internal buffer, checkpointing, seek/read/to-delimiter APIs, position tracking, ErrorCode semantics, constructor validation and buffer management helpers. |
BufferedFileReader → ReaderInterface refactorcomponents/core/src/clp/BufferedFileReader.hpp, components/core/src/clp/BufferedFileReader.cpp |
Reworks BufferedFileReader to hold std::shared_ptr<ReaderInterface> and use BufferReader/std::span-based reads; removes file-open/close/path APIs and public destructor, renames internal position helpers, and routes reads/seeks through the ReaderInterface. |
BufferReader adjustmentscomponents/core/src/clp/BufferReader.hpp, components/core/src/clp/BufferReader.cpp |
Adds default constructor; tightens constructor validation (null/data_size/pos checks); uses initializer list; peek_buffer now sets buf to nullptr when empty. |
FileCompressor API & flowcomponents/core/src/clp/clp/FileCompressor.hpp, components/core/src/clp/clp/FileCompressor.cpp |
Adds parse_and_encode(..., ReaderInterface&, use_heuristic) and updates try_compressing_as_archive(...) to accept a ReaderInterface&; consolidates heuristic/library encoding paths and routes reading via provided readers. |
New FileDescriptorReader usage & testscomponents/core/tests/test-BufferedReader.cpp |
Tests migrated from BufferedFileReader to BufferedReader backed by FileDescriptorReader; updates constructors, constants (cMinBufferSize), checkpoint/delimiter tests and test generation; removes explicit close calls. |
Build updates (rename sources and tests)components/core/CMakeLists.txt, components/core/src/clp/clp/CMakeLists.txt |
Replaces build/test source references from BufferedFileReader* to BufferedReader* and updates test filename entries. |
Sequence Diagram(s)
sequenceDiagram
autonumber
participant Client
participant BufferedReader
participant UnderlyingReader as ReaderInterface
Client->>BufferedReader: try_read(buf, n)
alt buffer has data
BufferedReader-->>Client: Success + bytes
else buffer empty
BufferedReader->>UnderlyingReader: try_read(span)
alt bytes_read > 0
BufferedReader->>BufferedReader: refill internal buffer
BufferedReader-->>Client: Success + bytes
else EndOfFile
BufferedReader-->>Client: EndOfFile
end
end
Client->>BufferedReader: try_seek_from_begin(pos)
alt pos allowed (checkpoint/forward)
BufferedReader-->>Client: Success
else before checkpoint lower bound
BufferedReader-->>Client: Unsupported
end
sequenceDiagram
autonumber
participant Caller
participant FileCompressor
participant Reader as ReaderInterface
participant Archive as ArchiveWriter
Caller->>FileCompressor: parse_and_encode(..., path, gid, Archive, Reader, use_heuristic)
alt use_heuristic == true
FileCompressor->>FileCompressor: parse_and_encode_with_heuristic(...)
else
FileCompressor->>FileCompressor: parse_and_encode_with_library(...)
end
FileCompressor-->>Caller: return
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~60 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 45.45% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The title succinctly summarises the primary refactor by stating that BufferedFileReader now wraps a ReaderInterface rather than a FileReader. It also clearly communicates the accompanying rename to BufferedReader. This gives reviewers a precise understanding of the core changes without unnecessary detail. |
✨ Finishing touches
- [ ] 📝 Generate Docstrings
🧪 Generate unit tests
- [ ] Create PR with unit tests
- [ ] Post copyable unit tests in a comment
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Comment @coderabbitai help to get the list of available commands and usage tips.