+[celsiustx/metaflow fork, `dsl` branch](https://github.com/celsiustx/metaflow/tree/dsl/metaflow/api): see [`metaflow/api`](metaflow/api)
+
+-------
 # Metaflow

 Metaflow is a human-friendly Python/R library that helps scientists and engineers build and manage real-life data science projects. Metaflow was originally developed at Netflix to boost productivity of data scientists who work on a wide variety of projects from classical statistics to state-of-the-art deep learning.
metaflow/api/README.md: 16 additions & 16 deletions
@@ -43,6 +43,7 @@ Metaflow uses a [test harness](https://docs.metaflow.org/internals-of-metaflow/t
 Much of what the test harness provides can be achieved using `pytest` directly (especially [`parametrize`](https://docs.pytest.org/en/6.2.x/parametrize.html)), without writing flows against a separate API. This branch demonstrates a few ways to use `pytest` to run end-to-end flow tests:

 ```python
+# Example old-style FlowSpec
 from metaflow import FlowSpec, step

 class OldFlow(FlowSpec):
@@ -55,9 +56,10 @@ class OldFlow(FlowSpec):
         self.b = 2


-from metaflow.api import Flow, step
+# Example new-style FlowSpec
+from metaflow.api import FlowSpec, step

-class NewFlow(FlowSpec, metaclass=Flow):
+class NewFlow(FlowSpec):
     @step
     def start(self):
         self.a = 1
@@ -69,13 +71,13 @@ class NewFlow(FlowSpec, metaclass=Flow):
 # Test old- and new-style flows
 from metaflow.tests.utils import parametrize, run

-@parametrize('flow', [ OldFlow, NewFlow ])
+@parametrize('flow', [ OldFlow, NewFlow, ])
 def test_simple_foreach(flow):
-    data = run(flow)
-    assert (data.a, data.b) == (1, 2)
+    data = run(flow)
+    assert (data.a, data.b) == (1, 2)
 ```

-Check out the tests under [`metaflow/tests`](../tests); some highlights:
+There are many such (py)tests under [`metaflow/tests`](../tests); some highlights:
Definition of a simple `foreach`/`join` flow, in old and new styles, and a parameterized test case that runs each and verifies their data artifact outputs.
@@ -122,30 +124,29 @@ The `metaflow.api` package contains a prototype, alternate API for writing flows
 - [x] Optionally receive `self.input` as an argument to `@foreach` step (rather than having to reference `self.input` at start of step; see [`NewJoinFlow1`](../tests/flows/joins.py))
 - [x] Make `metaflow flow <file> …` default to single Flow in file
-- [ ] Investigate using a `@flow` class-decorator instead of [`metaclass=Flow`](flow.py)
-- [ ] Get Pylint to accept self.input references in Flows w/o FlowSpec explicitly specified
+- [x] ~~Investigate using a `@flow` class-decorator instead of [`metaclass=Flow`](flowspec.py)~~ Turns out attaching a `metaclass` to a base `FlowSpec` class and inheriting from that seems to work well for both old- and new-style APIs.
+- [x] ~~Get Pylint to accept self.input references in Flows w/o FlowSpec explicitly specified~~ This was essentially solved by using a `FlowSpec` inheritance structure that comes with the necessary metaclass.
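
The two resolved items above rest on ordinary Python behaviour: a metaclass attached to a base class is inherited by every subclass, so flow authors never need to write `metaclass=...` themselves. A minimal sketch of that pattern follows. The names `DemoMeta`, `DemoFlowSpec`, and this `step` decorator are hypothetical stand-ins for illustration only, not the actual `metaflow.api` implementation.

```python
# Hypothetical stand-ins for illustration; not the real metaflow.api classes.
def step(fn):
    """Mark a method as a step (illustrative decorator)."""
    fn.is_step = True
    return fn


class DemoMeta(type):
    """Metaclass that records the names of @step methods defined on each class."""

    def __new__(mcs, name, bases, namespace):
        cls = super().__new__(mcs, name, bases, namespace)
        # The class namespace preserves definition order (Python 3.6+).
        cls.step_names = [
            attr for attr, value in namespace.items()
            if getattr(value, "is_step", False)
        ]
        return cls


class DemoFlowSpec(metaclass=DemoMeta):
    """Base class that carries the metaclass; subclasses inherit it."""


class LinearFlow(DemoFlowSpec):  # no explicit metaclass needed here
    @step
    def start(self):
        self.a = 1

    @step
    def end(self):
        self.b = 2
```

Because `LinearFlow` only names a base class, tools that understand plain inheritance (linters included) see a normal subclass, while the metaclass still runs when the class is created.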

 ## TODOs <a id="todo"></a>
 In addition to the tasks listed above, some general correctness/completeness TODOs:
 - [x] Test `split-and`+`join` combo (see [`test_joins.py`](../tests/test_joins.py))
 - [x] Integrate new `pytest` tests in CI ([example GHA run](https://github.com/celsiustx/metaflow/runs/2616959407))
-- [ ] Investigate better ways to infer `cls.__file__` on old- and new-style flows
+- [x] ~~Investigate better ways to infer `cls.__file__` on old- and new-style flows~~ (done via `FlowSpecMeta` metaclass, for both old- and new-style `FlowSpec`s)
 - [ ] Testing: use fresh metaflow db in tempdirs for each case/suite
 - [ ] Investigate restoring Python 2 in CI (it was removed to get CI passing; failure was a `SyntaxError` related to default kwargs in some new code in this package)
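
As a rough illustration of how a metaclass could recover the file a flow class was defined in, here is one possible sketch. It is an assumption, not necessarily how `FlowSpecMeta` does it: the names `FileTrackingMeta`, `TrackedBase`, and `defining_file` are hypothetical. The idea is to look up the class's module in `sys.modules` and read that module's `__file__`.

```python
import sys


class FileTrackingMeta(type):
    """Hypothetical metaclass: remember which file each class was defined in."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        # cls.__module__ names the defining module; that module object
        # (if loaded from a file) carries the path in __file__.
        module = sys.modules.get(cls.__module__)
        cls.defining_file = getattr(module, "__file__", None)


class TrackedBase(metaclass=FileTrackingMeta):
    """Base class; subclasses inherit the metaclass and get `defining_file`."""


class MyFlow(TrackedBase):
    pass
```

The attribute falls back to `None` when the module has no `__file__` (for example, in an interactive session), so callers should be prepared for that case.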

 ## Examples <a id="examples"></a>

 ### Basic Flow <a id="basic-flow"></a>
-Here's an example diff, taken from [`test_api.py`](../tests/test_api.py) of an `OldFlow` (written against the existing `FlowSpec` API) and a `NewFlow` (using the new `metaflow.api`):
+Here's an example diff, modeled after the ([`new_`](../tests/flows/new_linear_flow.py))[`linear_flow.py`](../tests/flows/linear_flow.py) test flows, of a simple linear flow written against the existing and new `FlowSpec` APIs:

 ```diff
 -from metaflow import FlowSpec, step
-+from metaflow.api import Flow, step
++from metaflow.api import FlowSpec, step

-+class NewFlow(metaclass=Flow):
--class OldFlow(FlowSpec):
+ class LinearFlow(FlowSpec):
 -    def start(self):
 -        self.next(self.one)
      @step
@@ -180,11 +181,10 @@ These flows are checked for correctness and concordance in [`test_foreach.py`](.