
Commit 72d868d

add top-level README note
1 parent 20bb055 commit 72d868d

2 files changed: +19 -16 lines changed

README.md

Lines changed: 3 additions & 0 deletions
@@ -1,5 +1,8 @@
 ![Metaflow_Logo_Horizontal_FullColor_Ribbon_Dark_RGB](https://user-images.githubusercontent.com/763451/89453116-96a57e00-d713-11ea-9fa6-82b29d4d6eff.png)

+[celsiustx/metaflow fork, `dsl` branch](https://github.com/celsiustx/metaflow/tree/dsl/metaflow/api): see [`metaflow/api`](metaflow/api)
+
+-------
 # Metaflow

 Metaflow is a human-friendly Python/R library that helps scientists and engineers build and manage real-life data science projects. Metaflow was originally developed at Netflix to boost productivity of data scientists who work on a wide variety of projects from classical statistics to state-of-the-art deep learning.

metaflow/api/README.md

Lines changed: 16 additions & 16 deletions
@@ -43,6 +43,7 @@ Metaflow uses a [test harness](https://docs.metaflow.org/internals-of-metaflow/t
 Much of what the test harness provides can be achieved using `pytest` directly (especially [`parametrize`](https://docs.pytest.org/en/6.2.x/parametrize.html), without writing flows against a separate API. This branch demonstrates a few ways to use `pytest` to run end-to-end flow tests:

 ```python
+# Example old-style FlowSpec
 from metaflow import FlowSpec, step

 class OldFlow(FlowSpec):
@@ -55,9 +56,10 @@ class OldFlow(FlowSpec):
         self.b = 2


-from metaflow.api import Flow, step
+# Example new-style FlowSpec
+from metaflow.api import FlowSpec, step

-class NewFlow(FlowSpec, metaclass=Flow):
+class NewFlow(FlowSpec):
     @step
     def start(self):
         self.a = 1
@@ -69,13 +71,13 @@ class NewFlow(FlowSpec, metaclass=Flow):
 # Test old- and new-style flows
 from metaflow.tests.utils import parametrize, run

-@parametrize('flow', [ OldFlow, NewFlow ])
+@parametrize('flow', [ OldFlow, NewFlow, ])
 def test_simple_foreach(flow):
-    data = run(flow)
-    assert (data.a, data.b) == (1, 2)
+    data = run(flow)
+    assert (data.a, data.b) == (1, 2)
 ```

-Check out the tests under [`metaflow/tests`](../tests); some highlights:
+There are many such (py)tests under [`metaflow/tests`](../tests); some highlights:

 #### [`test_simple_foreach.py`](../tests/test_simple_foreach.py)
 Definition of a simple `foreach`/`join` flow, in old and new styles, and a parameterized test case that runs each and verifies their data artifact outputs.
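
Note on the test pattern in the hunk above: the idea is to run one shared assertion against both an old-style and a new-style flow. The sketch below imitates that pattern with the standard library only. `OldStyle`, `NewStyle`, `run`, and `check_artifacts` are hypothetical stand-ins invented for illustration; they are not the fork's `metaflow.tests.utils` helpers, whose implementation this diff does not show.

```python
from types import SimpleNamespace


class OldStyle:
    """Hypothetical stand-in for a flow written against the existing API."""

    def execute(self):
        self.a = 1
        self.b = 2


class NewStyle:
    """Hypothetical stand-in for the same flow written against the new API."""

    def execute(self):
        self.a, self.b = 1, 2


def run(flow_cls):
    """Toy runner: execute the stand-in and return its attributes as 'artifacts'."""
    flow = flow_cls()
    flow.execute()
    return SimpleNamespace(**vars(flow))


def check_artifacts(flow_cls):
    """The shared assertion, applied unchanged to every flow style."""
    data = run(flow_cls)
    assert (data.a, data.b) == (1, 2)


# Under pytest, this loop would be replaced by decorating a test function with
# @pytest.mark.parametrize("flow_cls", [OldStyle, NewStyle]).
checked = []
for flow_cls in (OldStyle, NewStyle):
    check_artifacts(flow_cls)
    checked.append(flow_cls.__name__)
```

Keeping the assertion in one function and varying only the class under test is what lets a single test body confirm that both API styles produce the same artifacts.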
@@ -122,30 +124,29 @@ The `metaflow.api` package contains a prototype, alternate API for writing flows
 - [x] Optionally receive `self.input` as an argument to `@foreach` step (rather than having to reference `self.input` at start of step; see [`NewJoinFlow1`](../tests/flows/joins.py))
 - [x] Make `metaflow flow <file> …` default to single Flow in file
 - [x] Unittest `metaflow flow <file>:<flow> …` invocation style
-- [ ] Investigate using a `@flow` class-decorator instead of [`metaclass=Flow`](flow.py)
-- [ ] Get Pylint to accept self.input references in Flows w/o FlowSpec explicitly specified
+- [x] ~~Investigate using a `@flow` class-decorator instead of [`metaclass=Flow`](flowspec.py)~~ Turns out attaching a `metaclass` to a base `FlowSpec` class and inheriting from that seems to work well for both old- and new-style APIs.
+- [x] ~~Get Pylint to accept self.input references in Flows w/o FlowSpec explicitly specified~~ This was essentially solved by using a `FlowSpec` inheritance structure that comes with the necessary metaclass.

 ## TODOs <a id="todo"></a>
 In addition to the tasks listed above, some general correctness/completeness TODOs:
 - [x] Test `split-and`+`join` combo (see [`test_joins.py`](../tests/test_joins.py))
 - [x] Integrate new `pytest` tests in CI ([example GHA run](https://github.com/celsiustx/metaflow/runs/2616959407))
 - [ ] Implement "conditional" decorators (`@iff`, `@ifn`)
 - [ ] Check overloaded Flow basenames / using FQNs
-- [ ] Investigate better ways to infer `cls.__file__` on old- and new-style flows
+- [x] ~~Investigate better ways to infer `cls.__file__` on old- and new-style flows~~ (done via `FlowSpecMeta` metaclass, for both old- and new-style `FlowSpec`s)
 - [ ] Testing: use fresh metaflow db in tempdirs for each case/suite
 - [ ] Investigate restoring Python 2 in CI (it was removed to get CI passing; failure was a `SyntaxError` related to default kwargs in some new code in this package)
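
Note on the checklist items above that mention metaclasses: in plain Python, a metaclass attached to a base class is inherited by every subclass, so subclasses can drop the explicit `metaclass=` argument, and the metaclass can also record where each class was defined. The sketch below illustrates that mechanism only. `FlowMeta`, `BaseFlow`, `step_names`, `defined_in`, and this toy `step` decorator are hypothetical names chosen for the example; the fork's actual `FlowSpecMeta` is not shown in this diff and may work differently.

```python
import sys


def step(fn):
    """Hypothetical marker decorator: tags a method as a flow step."""
    fn.is_step = True
    return fn


class FlowMeta(type):
    """Hypothetical metaclass; runs once for every class that uses or inherits it."""

    def __new__(mcs, name, bases, namespace):
        cls = super().__new__(mcs, name, bases, namespace)
        # Steps defined directly in this class body, in definition order.
        cls.step_names = [n for n, v in namespace.items()
                          if getattr(v, "is_step", False)]
        # Path of the defining module, or None when it has no __file__
        # (for example in an interactive session).
        module = sys.modules.get(namespace.get("__module__"))
        cls.defined_in = getattr(module, "__file__", None)
        return cls


class BaseFlow(metaclass=FlowMeta):
    """Base class that carries the metaclass so subclasses need not name it."""


class LinearFlow(BaseFlow):  # no metaclass= argument here
    @step
    def start(self):
        self.a = 1

    @step
    def end(self):
        self.b = 2

    def helper(self):
        return self.a + self.b
```

Here `type(LinearFlow)` is `FlowMeta` even though `LinearFlow` never mentions it, which is the property the checklist relies on. Because attributes such as `self.a` are used inside an ordinary class hierarchy, linters can also resolve them through the base class.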

 ## Examples <a id="examples"></a>

 ### Basic Flow <a id="basic-flow"></a>
-Here's an example diff, taken from [`test_api.py`](../tests/test_api.py) of an `OldFlow` (written against the existing `FlowSpec` API) and a `NewFlow` (using the new `metaflow.api`):
+Here's an example diff, modeled after the ([`new_`](../tests/flows/new_linear_flow.py))[`linear_flow.py`](../tests/flows/linear_flow.py) test flows, of a simple linear flow written against the existing and new `FlowSpec` APIs:

 ```diff
 -from metaflow import FlowSpec, step
-+from metaflow.api import Flow, step
++from metaflow.api import FlowSpec, step

-+class NewFlow(metaclass=Flow):
--class OldFlow(FlowSpec):
+ class LinearFlow(FlowSpec):
 -    def start(self):
 -        self.next(self.one)
      @step
@@ -180,11 +181,10 @@ These flows are checked for correctness and concordance in [`test_foreach.py`](.
 ```diff
 -from metaflow import FlowSpec, step, IncludeFile
 +from metaflow import IncludeFile
-+from metaflow.api import Flow, step, foreach, join
++from metaflow.api import FlowSpec, step, foreach, join


--class MovieStatsFlow(FlowSpec):
-+class MovieStatsFlow(metaclass=Flow):
+ class MovieStatsFlow(FlowSpec):
      movie_data = IncludeFile("movie_data",
                               help="The path to a movie metadata file.",
                               default=script_path('movies.csv'))
