-
Notifications
You must be signed in to change notification settings - Fork 935
New flow CLI, SDK #612
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
New flow CLI, SDK #612
Conversation
8d3155e to
1c14fa3
Compare
|
I can't figure out what failed in "R tests on macos-latest"; it just says cd R/tests
Rscript run_tests.R |
00873e5 to
db0a601
Compare
|
Some updates here:
Celsius has been using this in production (mostly in Batch mode) since August 2021, and some roadblocks to upstreaming have been lifted since then (notably, it looks like Python2 support is on the way out, which I never attempted to support here. I also didn't support R initially, but do now). I've rebased this a few times since opening it, and generally the only issue has been adding the new "entrypoint" ( It would be nice to check in about interest in upstreaming any of this! One simple change I'm considering here is renaming the new package I introduce from |
|
Hey folks! Is there a path on when (whether?) this PR will be merged? I love the graph composition and the new decorator-enabled syntax. However, I'm not technically apt to contribute, though I could lend help in testing if needed. |
previously this was mimicked by passing an explicit `args` tuple, which would result in SystemExit being caught
This is a large change, factored into 4 sequential groups of commits, addressing pain points I hit as a new user/developer of Metaflow; work leading up to this has been discussed in Slack:
metaflow flow <file[:flow_name]>)self.nextcallspytesttests undermetaflow/tests(in addition to existing tests undertest/core)Everything should be backwards-compatible; flows can optionally be defined using a new API under
metaflow.api; existing Metaflow public members work as they always have.Examples
See
metaflow/api/README.md@dslfor a detailed write-up and examples, but here is a short overview:Support multiple flows per file
Using new (optional)
metaflow flowCLI entrypoint:The flow name (
[:MyFlow]above) can be omitted if there is just one flow in the file.Reduce flow-definition boilerplate
startandendsteps,self.nextcalls, and__main__handler are often superfluous:Define graph structure using decorators, not
self.nextcallsMixing graph structure logic (
self.nextcalls) into step implementations is clumsy; they are two different levels of abstraction and should be better delineated/separated:(this resolves #604, I believe)
Compose flows via inheritance
Flows are modeled as Python classes, but currently there's no way to compose them (via inheritance, or otherwise). This PR implements a simple inheritance-based composition scheme:
(resolves #245)
Other methods of composing flows may be desirable, but this solves many use cases. Celsius has been using this in production since August 2021.
PR Status
d0is a merge of all of them, and the 4 "checkpoints" described below start from it as a base.master@ 53ab9d4)PR Structure
Since there are many commits, I've broken out 4 "checkpoints"; each one basically stands alone, and should be able to be reviewed/merged in sequence:
d0...d1(minor): nits, mostly related to graph constructiond1...d2(major): new CLI (metaflow flow <file>[:<flow>] …), multiple flow definitions per file, multiple FlowSpecs can be run within one Python process, morepytesttestsd2...d3(minor): morepytesttestsd3...d4(major): new FlowSpec SDK: graph structure in decorators, hideself.nextcalls, support flow composition via inheritanceLet me know if a different decomposition would make more sense, or if this isn't clear.
Invariants
I'll preserve these as updates are made to this PR:
d0is a merge of open/unrelated PRs, which the subsequent branches are rebased on top of. Currently:[test]pip extra #665d4is the same as thedslbranch that defines this PRd1,d2,d3,d4)d0...d1: misc nits, mostly related to graph constructionIntroduces an
IS_STEPconstant as a start to formalizing the ad hoc API around certain function attrs (e.g.is_step) indicating they are "steps". I build more on this later.This chunk could be folded into #666, or the subsequent
d1...d2chunk.d1...d2: support multiple flows per {file, process} + new CLI (metaflow flow <file>[:<flow>] …)This chunk of commits makes flows behave more like the Python classes they are modeled as:
if __name__ == '__main__': …handler no longer requiredNew CLI:
metaflow flow <file>[:<flow>] …To support multiple flow definitions in one file, a new, optional CLI entrypoint is implemented:
as a drop-in replacement for / alternative to:
If there is just one flow in a file, the
:<flow>portion can be omitted:Users can still use the old form (
python <file> …and a__name__ == '__main__'handler; all relevant changes are backwards-compatible), but Metaflow internally uses the new form everywhere.Expanded
Parameter/ "main flow" global bookkeepingIn addition to supporting multiple flow definitions per file, this change allows multiple flows to be invoked in one Python process.
This requires tracking a (global, mutable) "main" flow (and corresponding
Parameters that should be fed toclickparsing). See changes toparameters.py.d2...d3: morepytesttests3 commits, each adding a new test file:
test_simple_foreach.py: test a flow (defined in the same file) that squares some integers and then sums them.test_foreach.py: verify graph structure and data outputs from the02-statistics/stats.pytutorial flow.test_joins.py: test several kinds of branching/joining flowsd3...d4: new FlowSpec SDK: graph structure in decorators, hideself.nextcalls, flow composition via inheritanceThis introduces the
metaflow.apipackage, containing aFlowSpecbase-class and@stepdecorator (which are drop-in replacements formetaflow.{FlowSpec,step}) as well as new@foreachand@joindecorators.This alternate
FlowSpecclass and decorators allow constructing flows in a more ergonomic fashion:self.nextcalls in steps' function-bodies)self.nextcalls are still synthesized+injected under the hood, but abstracted from usersself.nextcalls).start/endsteps are optionalstart/endsteps are still synthesized under the hood, if not explicitly defined, for minimal disruption to lower layersFlow composition via inheritance
Addresses #245: flows can be "mixed-in" to factor out common sets of steps; see
metaflow/tests/test_inheritance.py/metaflow/tests/flows/inherited_flows.py.There are some rough edges:
start/endwould collide; otherwise this should Just Work)However, it allowed us (Celsius) to combine 5 production flows that were previously being shelled out to separately in sequence (also allowing
resumeto work across them, which previously didn't work), so hopefully it is a good start. We've been using this in production since August 2021.Longer term, I'd like to come up with syntax for directly "injecting" another flow into the body of a FlowSpec class (supporting direct composition instead of relying on the class-inheritance mechanism), and/or supporting
@stepdecorators on top-level Python functions (not just methods ofFlowSpecclasses).