Implement an S3 sensor #2472

@Atharva-Phatak

Description

Problem Statement

Metaflow currently lacks native support for S3-based event sensors. In many real-world ML and data engineering workflows, triggering a flow in response to an S3 event (such as a new file upload) is a fundamental need.

Currently, users must stitch together external systems like:

AWS Lambda + EventBridge

Argo Events with custom S3 event sources

Manual cron jobs or polling loops

This adds infrastructure complexity, external dependencies, and friction in building reactive pipelines.

Proposed Solution

Introduce a lightweight pattern for "sensor-like" polling flows that:

Run on a cadence via @schedule (e.g., every 10 minutes).

Check a data source (e.g., S3, a database, etc.).

Emit an event or trigger another flow when a condition is met (e.g., a new file appears).

This doesn't require a full sensor abstraction; the pattern could be supported via:

A documented pattern or utility base class like PollingSensorFlowSpec.

Best practices for storing state (e.g., a last_seen_key watermark) using artifacts or S3.

Built-in support for launching flows programmatically via metaflow.client.run() from within Metaflow itself.
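The state-tracking idea above can be sketched as a small pure function. Everything here is illustrative, not a proposed Metaflow API: the watermark (last_seen_key) would be persisted as a flow artifact or an S3 object between runs, and it works as long as keys sort in arrival order (e.g., timestamp-prefixed keys).

```python
def find_new_keys(listed_keys, last_seen_key):
    """Diff a source listing against a high-water mark.

    Returns (new_keys, updated_watermark). last_seen_key is None on the
    very first run, in which case every listed key counts as new.
    """
    # Keys that sort strictly after the watermark are considered new.
    new_keys = sorted(
        k for k in listed_keys
        if last_seen_key is None or k > last_seen_key
    )
    # Advance the watermark only if something new actually appeared.
    watermark = new_keys[-1] if new_keys else last_seen_key
    return new_keys, watermark
```

A flow would load last_seen_key from the previous run, call this after listing the S3 prefix, and store the updated watermark for the next scheduled tick.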

Example

from metaflow import FlowSpec, schedule, step

@schedule(cron="*/10 * * * ? *")  # every 10 minutes; @schedule takes cron/hourly/daily, not minutes
class S3PollingFlow(FlowSpec):
    @step
    def start(self):
        # Check for new S3 files, compare with last seen state
        # If found, trigger downstream flow
        self.next(self.end)

    @step
    def end(self):
        pass

This approach is conceptually similar to how sensors are implemented in other systems like Airflow (via polling).
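One way a single polling tick could be wired together, with the listing and triggering sides injected as callables so the sketch stays storage-agnostic. All names here are hypothetical placeholders, not an existing or proposed Metaflow API:

```python
def run_sensor_tick(list_keys, last_seen_key, trigger):
    """One scheduled tick of the sensor pattern: list, diff, maybe trigger.

    list_keys() -> iterable of key strings (e.g. an S3 prefix listing);
    trigger(new_keys) would launch the downstream flow. Both are injected
    placeholders standing in for real I/O.
    """
    keys = sorted(list_keys())
    new_keys = [k for k in keys if last_seen_key is None or k > last_seen_key]
    if new_keys:
        trigger(new_keys)       # e.g. kick off the downstream flow
        return new_keys[-1]     # new watermark to persist as an artifact
    return last_seen_key        # nothing new; keep the old watermark
```

Inside the start step above, list_keys would wrap an S3 listing and trigger would wrap whatever programmatic launch mechanism is chosen; the returned watermark is what gets stored for the next tick.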

I would love to contribute this feature 🎧
