@spkrka commented Nov 17, 2025

This PR adds internal bridge classes to enable experimental fluent API development for SMB operations. These helpers expose package-private methods needed for advanced SMB use cases without changing the public API.

Changes

New Files

  • SMBCollectionHelper.java (272 lines) - Bridge to access FileOperations and metadata extraction from SortedBucketIO.Read and TransformOutput
  • SmbValidationHelper.java (104 lines) - Bridge for source validation and metadata access

Modified Files

  • AvroFileOperations.java - Fixed codec serialization: the CodecFactory field is now transient and is written and restored through custom serialization, so the Avro codec survives serialization boundaries
  • SortedBucketTransform.java - Made BucketSource package-visible and added getSplitPointsConsumed/getSplitPointsRemaining overrides to prevent unnecessary split attempts
  • SortedBucketIO.java - Made KeyFn interface public to support custom key extraction
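
For reviewers unfamiliar with the transient-plus-custom-serialization pattern mentioned for AvroFileOperations, here is a minimal, self-contained sketch. It is not the code in this PR: `CodecHolder` and `FakeCodec` are hypothetical names, and `FakeCodec` is only a stand-in for a non-serializable codec type such as Avro's `CodecFactory`. How the real class encodes the codec on the wire is an assumption here (a plain string).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical holder class; illustrates the pattern only, not AvroFileOperations itself.
final class CodecHolder implements Serializable {
  private static final long serialVersionUID = 1L;

  // Stand-in for a codec type that does NOT implement Serializable (assumption).
  static final class FakeCodec {
    private final String name;

    private FakeCodec(String name) {
      this.name = name;
    }

    static FakeCodec fromString(String name) {
      return new FakeCodec(name);
    }

    @Override
    public String toString() {
      return name;
    }
  }

  // transient: default Java serialization skips this field entirely.
  private transient FakeCodec codec;

  CodecHolder(FakeCodec codec) {
    this.codec = codec;
  }

  String codecName() {
    return codec.toString();
  }

  // Custom hook: write the codec as a string after the default fields.
  private void writeObject(ObjectOutputStream out) throws IOException {
    out.defaultWriteObject();
    out.writeUTF(codec.toString());
  }

  // Custom hook: rebuild the transient codec from the string written above.
  private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
    in.defaultReadObject();
    codec = FakeCodec.fromString(in.readUTF());
  }

  // Serializes and deserializes in memory, as a runner would when shipping the object to workers.
  static CodecHolder roundTrip(CodecHolder holder) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(holder);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (CodecHolder) in.readObject();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

Without the two private hooks, the transient field would simply come back as `null` after deserialization, which is the kind of failure the fix is meant to avoid.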

Rationale

These changes enable experimentation with SMB-aware libraries and tooling (similar to how SortedBucketIOUtil provides testing helpers). All new classes are marked @Internal and explicitly documented as subject to change without notice.

Impact

  • Non-breaking: Zero impact on existing public API
  • Minimal scope: Only ~400 lines, all marked @Internal
  • Compile tested: Builds successfully with no new errors
  • Enables experimentation: Allows building advanced SMB libraries externally

This follows the precedent of existing internal helpers in Scio and enables users to build experimental SMB tooling without requiring changes to the main codebase.

This commit adds internal bridge classes to enable experimental fluent API
development for SMB operations. These helpers expose package-private methods
needed for advanced SMB use cases without changing the public API.

Changes:
- Add SMBCollectionHelper: bridge to access FileOperations and metadata
  extraction from SortedBucketIO.Read and TransformOutput
- Add SmbValidationHelper: bridge for source validation and metadata access
- Fix AvroFileOperations codec serialization: use transient CodecFactory with
  custom serialization to properly handle Avro codec across serialization
  boundaries
- Make SortedBucketTransform.BucketSource package-visible to enable custom
  readers
- Add getSplitPointsConsumed/getSplitPointsRemaining overrides to prevent
  unnecessary split attempts
- Make SortedBucketIO.KeyFn public to support custom key extraction

All new classes are marked @Internal and subject to change without notice.
These changes are non-breaking and enable experimentation with SMB-aware
libraries and tooling.

codecov bot commented Nov 17, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 61.45%. Comparing base (8e89281) to head (b956939).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #5814   +/-   ##
=======================================
  Coverage   61.44%   61.45%           
=======================================
  Files         314      314           
  Lines       11429    11429           
  Branches      812      812           
=======================================
+ Hits         7023     7024    +1     
+ Misses       4406     4405    -1     
