Problem Statement
Metaflow currently lacks native support for S3-based event sensors. In many real-world ML and data engineering workflows, triggering a flow in response to an S3 event (such as a new file upload) is a fundamental need.
Currently, users must stitch together external systems like:
AWS Lambda + EventBridge
Argo Events with custom S3 event sources
Manual cron jobs or polling loops
This adds infrastructure complexity, external dependencies, and friction in building reactive pipelines.
Proposed Solution
Introduce a lightweight pattern for "sensor-like" polling flows that:
Can run on a cadence via @schedule (e.g., a cron expression that fires every 10 minutes).
Check a data source (e.g., S3, a database, etc.).
Emit an event or trigger another flow when conditions are met (e.g., a new file appears).
This doesn't require a full sensor abstraction; the pattern could be supported via:
A documented pattern or utility base class like PollingSensorFlowSpec.
Best practices for storing state (last_seen_key) using artifacts or S3.
Built-in support for launching flows programmatically via metaflow.client.run() from within Metaflow itself.
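The new-key check itself reduces to a small, framework-free comparison against a stored watermark. A minimal sketch (find_new_keys is an illustrative helper, not part of Metaflow; it assumes object keys sort lexicographically in arrival order, e.g. timestamped prefixes):

```python
def find_new_keys(listed_keys, last_seen_key):
    """Return keys newer than the watermark, plus the updated watermark.

    Illustrative helper, not part of Metaflow. Assumes keys sort
    lexicographically in arrival order (e.g. '2024/06/01/12-00-00.csv').
    """
    if last_seen_key is None:
        new = sorted(listed_keys)  # first run: everything is new
    else:
        new = sorted(k for k in listed_keys if k > last_seen_key)
    watermark = new[-1] if new else last_seen_key
    return new, watermark
```

The polling flow would persist the returned watermark as an artifact (e.g. self.last_seen_key = watermark) so the next scheduled run picks up where this one left off.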
Example
from metaflow import FlowSpec, schedule, step

# Note: @schedule takes cron/hourly/daily/weekly arguments, not minutes.
@schedule(cron='*/10 * * * *')
class S3PollingFlow(FlowSpec):

    @step
    def start(self):
        # List the watched S3 prefix and compare against the last seen
        # state; if new files appear, trigger the downstream flow.
        self.next(self.end)

    @step
    def end(self):
        pass
This approach is conceptually similar to how sensors are implemented in other systems like Airflow (via polling).
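Reading the watermark back on the next scheduled run could use the existing Metaflow Client API. A hedged sketch (the flow name S3PollingFlow and artifact name last_seen_key are carried over from the example above; load_last_seen_key is an illustrative helper, not a Metaflow API):

```python
def load_last_seen_key(flow_name="S3PollingFlow", default=None):
    """Fetch the last_seen_key artifact from the latest successful run.

    Illustrative helper: falls back to `default` when Metaflow, its
    metadata service, or a previous successful run is unavailable.
    """
    try:
        from metaflow import Flow
        run = Flow(flow_name).latest_successful_run
    except Exception:
        # metaflow not importable, no metadata, or the flow never ran
        return default
    if run is None:
        return default
    return getattr(run.data, "last_seen_key", default)
```

The start step would call this helper, pass the result to the new-key comparison, and store the updated watermark as an artifact for the next run.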
I would love to contribute this feature 🎧