
feat(sdk): refactor TDF architecture with streaming support and segment-based writing

Open strantalis opened this issue 5 months ago • 3 comments

Proposed Changes

This commit introduces a comprehensive refactoring of the TDF SDK architecture to support streaming TDF creation with out-of-order segment writing, designed specifically for S3 multipart upload integration.

Key Changes:

Architecture Refactoring:

  • Restructure TDF components into dedicated packages (tdf/, tdf/keysplit/)
  • Move core TDF types (manifest, assertions, writer) into tdf/ package
  • Implement new keysplit package with XOR-based key splitting algorithm
  • Add segment-based caching and locator system for streaming operations

New Streaming Writer Implementation:

  • Add StreamingWriter API with 0-based segment indexing
  • Implement SegmentWriter for out-of-order segment writing
  • Support dynamic segment expansion beyond the initially expected segment count
  • Add comprehensive test coverage for sequential and out-of-order scenarios
  • Release memory held by segments once they have been uploaded
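A minimal sketch of how 0-based, out-of-order segment writes, dynamic expansion, and mutex protection might fit together. `segmentStore`, `Write`, and `Assemble` are hypothetical names, not the SDK's `StreamingWriter` or `SegmentWriter` API; rejecting a second write to the same index is an assumption, and memory cleanup for uploaded segments is not modeled.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// segmentStore (hypothetical) buffers segments that may arrive in any
// order, keyed by 0-based index.
type segmentStore struct {
	mu       sync.Mutex
	segments map[int][]byte
	expected int // grows when an index beyond the current count is written
}

func newSegmentStore(expected int) *segmentStore {
	return &segmentStore{segments: make(map[int][]byte), expected: expected}
}

// Write stores a copy of data at index; safe for concurrent use.
func (s *segmentStore) Write(index int, data []byte) error {
	if index < 0 {
		return errors.New("segment index must be >= 0")
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, dup := s.segments[index]; dup {
		return fmt.Errorf("segment %d already written", index)
	}
	buf := make([]byte, len(data))
	copy(buf, data)
	s.segments[index] = buf
	if index >= s.expected {
		s.expected = index + 1 // dynamic expansion past the initial count
	}
	return nil
}

// Assemble concatenates segments 0..expected-1 in index order, so the
// result does not depend on the order in which segments were written.
func (s *segmentStore) Assemble() ([]byte, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	var out []byte
	for i := 0; i < s.expected; i++ {
		seg, ok := s.segments[i]
		if !ok {
			return nil, fmt.Errorf("missing segment %d", i)
		}
		out = append(out, seg...)
	}
	return out, nil
}

func main() {
	parts := map[int]string{0: "a", 1: "b", 2: "c"}
	st := newSegmentStore(2) // writing index 2 expands the count to 3

	var wg sync.WaitGroup
	for _, i := range []int{2, 0, 1} {
		wg.Add(1)
		go func(idx int) {
			defer wg.Done()
			if err := st.Write(idx, []byte(parts[idx])); err != nil {
				panic(err)
			}
		}(i)
	}
	wg.Wait()

	out, err := st.Assemble()
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // abc
}
```

Keying by index rather than by arrival order is what makes the assembled output deterministic even when goroutines finish in an arbitrary order.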

Archive Package Cleanup:

  • Remove legacy TDF3Writer and associated test files
  • Consolidate to SegmentWriter as the primary archive implementation
  • Add ZIP64 support for large TDF files
  • Implement deterministic segment output (segment 0 includes the ZIP header)
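For the ZIP64 support, the point at which an archive must switch to ZIP64 records follows from the ZIP format's field widths: sizes and offsets are 32-bit fields with 0xFFFFFFFF reserved as a "see ZIP64 extra field" sentinel, and the end-of-central-directory entry count is 16-bit with 0xFFFF reserved the same way. The helpers below are hypothetical and only express those thresholds; they are not taken from the SDK.

```go
package main

import "fmt"

const (
	uint16max = (1 << 16) - 1
	uint32max = (1 << 32) - 1
)

// needsZip64Entry (hypothetical) reports whether an entry's sizes or local
// header offset overflow the classic 32-bit fields. The max value itself is
// the ZIP64 sentinel, hence >= rather than >.
func needsZip64Entry(compressed, uncompressed, headerOffset uint64) bool {
	return compressed >= uint32max || uncompressed >= uint32max || headerOffset >= uint32max
}

// needsZip64Directory (hypothetical) reports whether the end-of-central-
// directory record must be written in its ZIP64 form.
func needsZip64Directory(entries, cdSize, cdOffset uint64) bool {
	return entries >= uint16max || cdSize >= uint32max || cdOffset >= uint32max
}

func main() {
	const gib = uint64(1) << 30
	fmt.Println(needsZip64Entry(3*gib, 3*gib, 0))   // false
	fmt.Println(needsZip64Entry(5*gib, 5*gib, 0))   // true
	fmt.Println(needsZip64Directory(1, 100, 5*gib)) // true
}
```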

Key Features:

  • Out-of-order segment writing with deterministic assembly
  • S3 multipart upload compatibility with part-to-segment mapping
  • Streaming TDF creation without full payload buffering
  • Comprehensive error handling and validation
  • Thread-safe operations with proper mutex protection
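The part-to-segment mapping can be sketched using two documented properties of S3 multipart uploads: part numbers are 1-based and run from 1 to 10,000, and every part except the last must be at least 5 MiB. Mapping segment i to part i+1 is an assumption made for illustration; `partNumberForSegment` and `checkPartSize` are hypothetical helpers, not part of the SDK or the AWS SDK.

```go
package main

import "fmt"

const (
	maxParts    = 10000   // S3 allows part numbers 1..10000
	minPartSize = 5 << 20 // 5 MiB; applies to every part except the last
)

// partNumberForSegment (hypothetical) maps a 0-based segment index to a
// 1-based S3 part number, assuming one segment per part.
func partNumberForSegment(segment int) (int, error) {
	if segment < 0 || segment >= maxParts {
		return 0, fmt.Errorf("segment %d out of range [0, %d)", segment, maxParts)
	}
	return segment + 1, nil
}

// checkPartSize (hypothetical) enforces the S3 minimum size for non-final parts.
func checkPartSize(size int64, isLast bool) error {
	if !isLast && size < minPartSize {
		return fmt.Errorf("part of %d bytes is below the 5 MiB minimum", size)
	}
	return nil
}

func main() {
	for _, seg := range []int{0, 1, 2} {
		part, err := partNumberForSegment(seg)
		if err != nil {
			panic(err)
		}
		fmt.Printf("segment %d -> part %d\n", seg, part)
	}
	// A 1 MiB non-final part is rejected.
	fmt.Println(checkPartSize(1<<20, false) != nil) // true
}
```

If each segment is uploaded as exactly one part, every non-final segment (including segment 0 with its ZIP header) would need to reach the 5 MiB minimum.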

Test Coverage:

  • Fix streaming writer tests to use 0-based indexing consistently
  • Add validation tests for segment index boundaries
  • Include benchmarks for sequential vs out-of-order writing
  • Test context cancellation and error conditions

This refactoring enables efficient streaming TDF creation for large files while maintaining backward compatibility with existing TDF readers.

Checklist

  • [ ] I have added or updated unit tests
  • [ ] I have added or updated integration tests (if appropriate)
  • [ ] I have added or updated documentation

Testing Instructions

strantalis · Aug 25 '25 19:08

/gemini review

strantalis · Aug 25 '25 19:08

/gemini review

strantalis · Aug 27 '25 12:08

/gemini review

elizabethhealy · Oct 02 '25 16:10