
Dynamically modifying the .electron_object.executor from an imported electron does not always work as expected #1889

Open
Andrew-S-Rosen opened this issue Dec 14, 2023 · 5 comments
Labels
bug Something isn't working

Comments

@Andrew-S-Rosen
Contributor

Andrew-S-Rosen commented Dec 14, 2023

Environment

  • Covalent version: 0.233.1rc0
  • Python version: 3.10
  • Operating system: Linux

What is happening?

Modifying the .electron_object.executor field of an imported electron does not always carry over to the dispatched workflow.

How can we reproduce the issue?

Alright, bear with me for a moment! This is a pretty niche issue report.

Take the following example code:

import covalent as ct

@ct.electron
def increment(x, y = 1):
    return x + y

@ct.electron
@ct.lattice
def subflow(x, op):
    vals = [x] * 3
    results = []
    for val in vals:
        result = op(val)
        results.append(result)
    return results

@ct.lattice
def workflow(x, op):
    return subflow(x, op)


increment.electron_object.executor = "local"

ct.dispatch(workflow)(1, increment)

This works perfectly. The increment job runs with the local executor rather than the Dask executor.

However, if you move the electron and sublattice into a Python package and import them, the executor is not updated appropriately:

import covalent as ct
from mypackage import increment, subflow

@ct.lattice
def workflow(x, op):
    return subflow(x, op)

increment.electron_object.executor = "local"

ct.dispatch(workflow)(1, increment)

Here, the increment job will run via the Dask executor instead of the local executor. This is the most minimal example I could construct that reproduces the issue (see below).
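For reference, mypackage.py is assumed here to contain the same definitions as the first example, simply moved into an importable module:

# mypackage.py (assumed contents: identical to the definitions above)
import covalent as ct

@ct.electron
def increment(x, y=1):
    return x + y

@ct.electron
@ct.lattice
def subflow(x, op):
    vals = [x] * 3
    results = []
    for val in vals:
        result = op(val)
        results.append(result)
    return results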

What should happen?

The modified .executor field should be retained.

Any suggestions?

The following alternative approach works fine:

import covalent as ct
from mypackage import increment, subflow

@ct.electron(executor="local")
def local_increment(*args, **kwargs):
    return increment(*args, **kwargs)

@ct.lattice
def workflow(x, op):
    return subflow(x, op)

ct.dispatch(workflow)(1, local_increment)

Note that simpler examples do work. For instance, the following example works fine:

import covalent as ct
from mypackage import increment

@ct.lattice
def workflow(x, op):
    return op(x)

increment.electron_object.executor = "local"

ct.dispatch(workflow)(1, increment)

Maybe this has something to do with the mutability of increment... could very well be me.

@Andrew-S-Rosen Andrew-S-Rosen added the bug Something isn't working label Dec 14, 2023
@Andrew-S-Rosen Andrew-S-Rosen changed the title Dynamically modifying the .electron_object.executor from an imported electron does not always work with sublattices Dynamically modifying the .electron_object.executor from an imported electron does not always work as expected Dec 14, 2023
Andrew-S-Rosen added a commit to Quantum-Accelerators/quacc that referenced this issue Dec 18, 2023
…g job attributes (e.g. executor) (#1359)

Continuation of #1322.

Challenges:
- Mainly Dask, which does not play nicely when `functools.partial()` is
applied to a `Delayed` object. See
dask/dask#10707. There is now a workaround.
- Also there is a minor issue with dynamic switching of Covalent
executors, but there is a workaround. See
AgnostiqHQ/covalent#1889

Remaining issue:

Need to be able to parallelize the following code on Dask.

```python
from dask.distributed import Client
from ase.build import bulk
from quacc.recipes.emt.slabs import bulk_to_slabs_flow
from functools import partial

client = Client()

atoms = bulk("Cu")
delayed = bulk_to_slabs_flow(atoms)
result = client.gather(client.compute(delayed))
```

Hint: To monitor, set `"logfile": ""`.

---------

Co-authored-by: deepsource-autofix[bot] <62050782+deepsource-autofix[bot]@users.noreply.github.com>
@santoshkumarradha
Member

Thanks for this, @Andrew-S-Rosen. Ideally, all kwargs of electrons should also be properties that can be set after creation. I will update the issue as a bug/feature enhancement as well.

Cc: @kessler-frost / @jimmylism

@santoshkumarradha
Member

CC: @cjao, any quick idea on why this would be the case for packages?

@wjcunningham7
Member

@Andrew-S-Rosen I think this may be due to how the objects are imported: https://stackoverflow.com/a/3536638

@cjao
Contributor

cjao commented Feb 7, 2024

@Andrew-S-Rosen This is a fun question and boils down to how cloudpickle (Covalent's serialization mechanism) works. The key difference between your examples

(1) non-working

import covalent as ct
from mypackage import increment, subflow

@ct.lattice
def workflow(x, op):
    return subflow(x, op)

increment.electron_object.executor = "local"

ct.dispatch(workflow)(1, increment)

(2) working

import covalent as ct
from mypackage import increment

@ct.lattice
def workflow(x, op):
    return op(x)

increment.electron_object.executor = "local"

ct.dispatch(workflow)(1, increment)

is that (1) pickles the wrapped increment function -- as a parameter of the subflow electron -- whereas in (2) only the underlying function is serialized, not the electron_object attribute. Because increment lives in an external module instead of __main__, cloudpickle will serialize it by reference, not by value. Thus, when the increment electron is unpickled by the executor that builds out the subflow sublattice, its attributes -- such as electron_object -- are whatever they are defined to be in mypackage.py.

Example:

mypackage.py

class Electron:
    def __init__(self):
        self.executor = "dask"

def my_func():
    pass

my_func.electron_object = Electron()

main.py

import cloudpickle
import mypackage

mypackage.my_func.electron_object.executor = "local"

with open("my_func.pkl", "wb") as f:
    cloudpickle.dump(mypackage.my_func, f)

exec.py

import pickle

with open("my_func.pkl", "rb") as f:
    my_func = pickle.load(f)
    print(my_func.electron_object.executor)
casey@AgnostiqHQ tests$ python main.py  && python exec.py
dask

But now if we instruct cloudpickle to serialize the mypackage module by value:

main.py

import cloudpickle
import mypackage

mypackage.my_func.electron_object.executor = "local"

cloudpickle.register_pickle_by_value(mypackage)

with open("my_func.pkl", "wb") as f:
    cloudpickle.dump(mypackage.my_func, f)

we get

casey@AgnostiqHQ tests$ python main.py  && python exec.py
local
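(For the original Covalent example, one might try the same trick before dispatching -- registering mypackage for by-value pickling so that the mutated executor travels with the serialized electron. This is only a sketch under the assumption that Covalent's client-side serialization respects the registration; it has not been verified here.)

import cloudpickle
import covalent as ct
import mypackage
from mypackage import increment, subflow

# Assumption (untested): registering the module for by-value pickling before
# dispatch lets the mutated executor attribute survive serialization.
cloudpickle.register_pickle_by_value(mypackage)

@ct.lattice
def workflow(x, op):
    return subflow(x, op)

increment.electron_object.executor = "local"

ct.dispatch(workflow)(1, increment)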

@Andrew-S-Rosen
Contributor Author

@wjcunningham7, @cjao: Thank you both for the very insightful replies!

Because increment lives in an external module instead of __main__, cloudpickle will serialize it by reference, not by value.

That's the main takeaway for sure. I didn't realize that --- subtle! In any case, this isn't a major issue, but I did want to flag it as a potential "gotcha." With this response, I feel pretty comfortable knowing what's going on now. Feel free to close if you wish!
