[pull] master from Netflix:master #106


Open: wants to merge 663 commits into master from Netflix:master
Conversation

pull[bot] commented May 4, 2023

See Commits and Changes for more details.


Created by pull[bot]

Can you help keep this open source service alive? 💖 Please sponsor : )

okteto-cloud bot commented May 4, 2023

Your preview environment savingoyal-metaflow-pr-106 failed to deploy.

pull[bot] added the ⤵️ pull label May 4, 2023

Code scanning (CodeQL) flagged two Medium-severity "DOM text reinterpreted as HTML" warnings: DOM text is reinterpreted as HTML without escaping meta-characters.

```js
/*
 * Setup a new card and start the watcher
 */
function handleLoadCard(name, value) {
  // flagged: unescaped DOM text assigned to innerHTML
  document.getElementsByTagName("title").innerHTML = name;
  const iframe = document.getElementById("card-frame");
  iframe.src = `/card/${value}?embed=true`;
}
```
valayDave and others added 27 commits October 25, 2024 09:13
* Emit graph info to event logger, add runtime start metric

* Remove graph_info logging from resume

* Simplify run and resume metrics

* Remove logger update context

* Refactor monitor and logger to not use update_context

* Address comments
* use the correct path

* suggested changes
* better error message with dump

* only print on final failure
* Move DeployedFlow and TriggeredRun to a more inheritance-based model.

The _enrich_* methods were making it very hard to generate proper stubs.
This will hopefully improve that.

* Fix circular dependency and address comments

Some doc cleanup as well.

* Removed extraneous code

* Forgot file

* Update NBDeployer to forward to underlying deployer

* Fix circular deps and other crap

* Added stub-gen for deployer (try 2)

* Fix stub test

* Fix dataclients

* fix docstrings for deployer (#2119)

* Renamed metadata directory to metadata_provider (and related changes)

There was a clash between the `metadata` function and the `metadata` module,
which caused issues with stubs (at least with the new way of generating them).

* Forgot one

* Typo

* Make nbdocs happy

* add docstrings for injected methods

---------

Co-authored-by: Romain Cledat <[email protected]>
* trigger_on_finish sorta works

* trigger deco works for event

* trigger events changes

* run black

* ok this one ran black

* black ran for real

* deleting some things i missed

* add format_deploytime_value() to both decos

* fixes error with types

* ran black

* fixes failing cases

* remove json.loads and add tuple to param type

* ran black

* function within parameter

* Delete local config file

* fixing borked cases

* refactor code

* shouldn't be changes to flowspec.py

* fixing bugs

* add deploy time trigger inits to argo parameter handling (#2146)

* run black

* undo modifications to to_pod

* reset util file

* pr comment

* remove print

---------

Co-authored-by: kayla seeley <[email protected]>
Co-authored-by: Sakari Ikonen <[email protected]>
* Add an async sigint handler

* Improve async runner interface

* Fix error handling in *handle_timeout

* Fix async_read_from_file_when_ready

* Simplify async_kill_processes_and_descendants

* Group kill_process_and_descendants into single command

* Call sigint manager only for live processes

* Guard against empty pid list in sigint handler

* Cleanup comments

* Reimplement CommandManager.kill in terms of kill_processes_and_descendants

* Use fifo for attribute file

* Fix `temporary_fifo` type annotation

* Fix `temporary_fifo` type annotation on py37 and py38

* Fix rewriting the current run metadata

* Wrap execs in `async_kill_processes_and_descendants` in try/except
* remove bad append

* Fix issue with to_pod and deploy time fields

---------

Co-authored-by: kayla seeley <[email protected]>
Co-authored-by: KaylaSeeley <[email protected]>
Co-authored-by: KaylaSeeley <[email protected]>
Co-authored-by: Sakari Ikonen <[email protected]>
Co-authored-by: Romain Cledat <[email protected]>
madhur-ob and others added 30 commits June 10, 2025 10:55
S3-based tests on MinIO will require the label `ok-to-test` or
`approved`.
1. Added tests
2. Fixed invalid stub generation:

```
    def __init__(self, func: typing.Callable[[<class 'metaflow_extensions.nflx.plugins.datatools.dataframe.MetaflowDataFrame'>[T_MDF_co], typing.Optional[<class 'metaflow_extensions.nflx.plugins.functions.core.function_parameters.FunctionParameters'>[T_FP_co]]], <class 'metaflow_extensions.nflx.plugins.datatools.dataframe.MetaflowDataFrame'>[T_MDF_co]], **kwargs: typing.Dict[str, typing.Any]):
```

to:

```
    def __init__(self, func: typing.Callable[[metaflow.plugins.datatools.dataframe.MetaflowDataFrame[T_MDF_co], typing.Optional[metaflow.mf_extensions.nflx.plugins.functions.core.function_parameters.FunctionParameters[T_FP_co]]], metaflow.plugins.datatools.dataframe.MetaflowDataFrame[T_MDF_co]], **kwargs: typing.Dict[str, typing.Any]):

```

---------

Co-authored-by: Nissan Pow <[email protected]>
- In some cases (if you called two commands, like create and trigger,
with a Runner/Deployer on the same flow), the flow/step mutators would be
applied *twice*, which could lead to weird errors (duplicate decorators).
- Using a None default for configs failed (#2449).
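The double-application issue above can be illustrated with a minimal sketch, assuming an idempotency guard (the names `apply_mutator_once` and `add_retry` are illustrative, not Metaflow's actual API): remembering which (flow, mutator) pairs have already run makes a second command a no-op.

```python
# Hypothetical sketch: making mutator application idempotent so that
# invoking two commands (e.g. create then trigger) on the same flow
# does not apply the same mutator twice.
_applied = set()

def apply_mutator_once(flow_name, mutator):
    """Apply `mutator` to the flow at most once per (flow, mutator) pair."""
    key = (flow_name, mutator.__name__)
    if key in _applied:
        return False  # already applied; skip to avoid duplicate decorators
    _applied.add(key)
    mutator(flow_name)
    return True

def add_retry(flow_name):
    pass  # stand-in for a real flow mutator

# The first command applies the mutator; the second is a no-op.
assert apply_mutator_once("MyFlow", add_retry) is True
assert apply_mutator_once("MyFlow", add_retry) is False
```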
This commit adds support for specifying Kubernetes container image pull
secrets in the `kubernetes` step decorator.

As an example:

```python
@kubernetes(
    image='docker.io/some-private-repo/image',
    image_pull_secrets=['regcred']
)
@step
def some_step(self):
    ...
```

Example output from `kubectl describe pod <some-pod>` on the
`metaflow-dev` stack where `metaflow` was installed via `pip install -e
</path/to/repo/checkout>`:

```console
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  1m   default-scheduler  Successfully assigned default/t-f03abf3d-2rpgp-j72q8 to minikube
  Normal  Pulling    1m   kubelet            Pulling image "docker.io/some-private-repo/image"
  Normal  Pulled     1m   kubelet            Successfully pulled image "docker.io/some-private-repo/image" in 669ms (670ms including waiting)
  Normal  Created    1m   kubelet            Created container some-step
  Normal  Started    1m   kubelet            Started container some-step
```

Resolves #1261
Earlier, the `save_card` function ended up saving cards at two locations,
doubling the number of cards. (This was a temporary fix that we never
ended up rolling back.)

> Why did we do this?

When cards were introduced, an assumption was made that task-ids are
unique. This assumption was incorrect: only the pathspec is unique, and
there are no such guarantees about task-ids. When task-ids are
non-unique, a card read could find incorrect cards (because cards were
stored keyed on task-ids). If we had immediately switched from storing
cards by `task-id` to `step-name/task-id`, card reading would have
crashed for many users, especially users accessing cards created by a
newer Metaflow client from an older client version. It would also easily
break the metaflow-ui (which may be using a client from an older
version). But since it has now been over 3 years and multiple newer
client versions have been rolled out, we can safely remove this, since
most cards will have been created by clients from the last 3 years. This
change keeps the listing/reading path intact and only removes the
dual-write part.
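The uniqueness problem described above can be sketched as follows (the storage keys are hypothetical, not the actual card datastore layout): keying cards by task-id alone collides across steps, while keying by step-name/task-id stays unique.

```python
# Hypothetical sketch of why task-id alone is not a safe storage key:
# two tasks in different steps can share the same task-id, while the
# step-name/task-id combination (part of the pathspec) stays unique.
cards_by_task_id = {}
cards_by_step_and_task = {}

def save_card(step_name, task_id, card):
    cards_by_task_id[task_id] = card                     # old scheme: collides
    cards_by_step_and_task[(step_name, task_id)] = card  # new scheme: unique

save_card("start", "42", "card-for-start")
save_card("end", "42", "card-for-end")  # same task-id, different step

# The old scheme silently overwrote the first card:
assert cards_by_task_id["42"] == "card-for-end"
# The step-qualified scheme keeps both:
assert cards_by_step_and_task[("start", "42")] == "card-for-start"
assert cards_by_step_and_task[("end", "42")] == "card-for-end"
```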
Adds a top-level option to toggle force-rebuilding the environment:

```
python HelloConda.py --environment conda --force-rebuild-environment run
```

The introduction of the new option is not mandatory; we could simply
proceed with an internal envvar to enable the functionality for now.
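The envvar-gated alternative mentioned above could look like this minimal sketch (the envvar name `METAFLOW_FORCE_REBUILD_ENVIRONMENT` is an assumption for illustration, not necessarily what the code uses):

```python
import os

# Hypothetical toggle: either a CLI flag or an internal envvar can force
# a rebuild. The envvar name below is illustrative, not confirmed.
def force_rebuild_enabled(cli_flag=False):
    return cli_flag or os.environ.get("METAFLOW_FORCE_REBUILD_ENVIRONMENT", "0") == "1"

assert force_rebuild_enabled(cli_flag=True) is True
os.environ["METAFLOW_FORCE_REBUILD_ENVIRONMENT"] = "1"
assert force_rebuild_enabled() is True
os.environ["METAFLOW_FORCE_REBUILD_ENVIRONMENT"] = "0"
assert force_rebuild_enabled() is False
```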
When injecting commands into the runner, the generated stubs were not
accurate or complete. This commit addresses that.
- Cards can now carry metadata about important information created for
them, like ids, types, and other info from decorators.
- This also allows us to bookkeep the attempts as cards are created.
- We write the metadata in the CLI, so we put something stateful on top
to ensure we don't end up writing the metadata multiple times.
- Overall, since cards are written asynchronously, we can only ensure
best-effort registration on the Metaflow side, since we don't
necessarily block when we save cards.
- This commit saves metadata on a per-card basis instead of updating all
card metadata at once; otherwise we would end up with multiple copies of
the metadata in the database.
- The metadata writing is optional and can be disabled via a config
variable.
- This addition also allows us to update the metaflow-service code for
filtering cards on a per-attempt basis.

---------

Co-authored-by: Sakari Ikonen <[email protected]>
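The "something stateful on top" mentioned above can be sketched as a write-once guard, assuming a per-(pathspec, attempt, card) registration key (the helper name and key shape are hypothetical, not the actual implementation):

```python
# Hypothetical write-once guard for card metadata registration. Because
# the CLI may run more than once per card, we track which
# (pathspec, attempt, card_id) triples were already registered.
_registered = set()

def register_card_metadata(pathspec, attempt, card_id, write):
    key = (pathspec, attempt, card_id)
    if key in _registered:
        return False  # metadata already written for this card/attempt
    _registered.add(key)
    write(key)
    return True

written = []
assert register_card_metadata("Flow/run/step/1", 0, "card0", written.append) is True
assert register_card_metadata("Flow/run/step/1", 0, "card0", written.append) is False
assert written == [("Flow/run/step/1", 0, "card0")]
```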
There is an error when trying to retrieve the IMDSv2 token because the
header value is an `int`. This causes all the requests to fall back to
IMDSv1.
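The failure mode is that HTTP libraries reject non-string header values, so the IMDSv2 token request raises and the code falls back to IMDSv1. A minimal sketch of the fix (the header name is the real IMDS TTL header; the helper function is illustrative):

```python
# IMDSv2 token requests send a TTL header; passing the TTL as an int
# makes HTTP libraries reject the header value. Coercing to str fixes it.
def imds_token_headers(ttl_seconds=21600):
    # Header values must be strings (or bytes), never ints.
    return {"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)}

headers = imds_token_headers(21600)
assert headers["X-aws-ec2-metadata-token-ttl-seconds"] == "21600"
assert all(isinstance(v, str) for v in headers.values())
```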
Reverts #2479

---------

Co-authored-by: Valay Dave <[email protected]>
Implements support for running custom functions as exit hooks on the
Argo Workflows, local, Kubernetes, and Batch runtimes.

Also refactors the current lifecycle hook setup to use the same primitives.

Suggested UX for defining custom exit hooks in the flow file:

```python
from time import sleep
from typing import Optional
from metaflow import step, FlowSpec, Parameter, exit_hook, Run


def success_print():
    print("Flow completed successfully. Performing some success related notifications.")


def failure_print(run: Optional[Run]):
    print("Flow failed.")

    if run is None:
        print("No run was registered.")
        return

    print("failed run id:", run.id)
    print(
        "Failed tasks:\n",
        "\n".join(
            [task.pathspec for step in run for task in step if not task.successful]
        ),
    )


@exit_hook(on_success=[success_print])
@exit_hook(on_error=[failure_print])
class NotifyFlow(FlowSpec):
    should_fail = Parameter(name="should_fail", default=False)

    @step
    def start(self):
        print("Starting 👋")
        print("Should fail?", self.should_fail)
        if self.should_fail:
            sleep(60)
            raise Exception("failing as expected")
        self.next(self.end)

    @step
    def end(self):
        print("Done! 🏁")


if __name__ == "__main__":
    NotifyFlow()
```

---------

Co-authored-by: Madhur Tandon <[email protected]>
Co-authored-by: madhur-ob <[email protected]>
Introduces a `get_secrets` function that exposes the same capabilities
as the `@secrets` decorator, but returns the secrets as a dictionary
instead of setting environment variables.

The use case is to make accessing secrets possible in user code as well.

example:

```python
@step
def start(self):
    secrets_dict = get_secrets(sources=["secret_source"], role="IAM_role_for_accessing")
    
    for secret_spec, contents_dict in secrets_dict.items():
         pass # do something with the fetched secrets
```
Changes the secret fetching function to a more comfortable UX:

```python
from metaflow.plugins.secrets import get_secret

secret_dict = get_secret(source="some-secret-source")
```
Increases the resource request for Argo exit hook pods.
Reapplies the new packaging changes for releasing.

Co-authored-by: Romain <[email protected]>