Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Centralise Cylc CLI Functionality #69

Closed
oliver-sanders opened this issue Dec 9, 2019 · 20 comments
Closed

Centralise Cylc CLI Functionality #69

oliver-sanders opened this issue Dec 9, 2019 · 20 comments

Comments

@oliver-sanders
Copy link
Member

At present the cylc command lives in Cylc Flow.

As of cylc/cylc-flow#3413 it is possible for other Cylc components to add their own Cylc commands.

It would probably make more sense for the cylc command to live in a common module which the others could used in turn. This common module should also contain the cylc version command which should now provide the version of the Cylc "super package", and, on-request a list of installed components and their versions. Note the list of installed components could be obtained from "entry points" in some way, this would work for both pipy and conda installations.

This "common module" could be:

  • (yet) another repository / pypi project.
  • A git submodule.
  • Something else?
@hjoliver
Copy link
Member

hjoliver commented Dec 9, 2019

With entry points, another repository would be reasonable.

Git submodules (for "shared components"?) are interesting but I haven't worked with them enough (or at all) to have a good feel for whether or not we should use them for this or anything else. (Maybe others have?).

@kinow
Copy link
Member

kinow commented Dec 9, 2019

Interesting, I think I haven't seen this done in another Python project. Maybe @sgaist will have an opinion on this too, or maybe he's seen entry points used this way?

Note that when we started to move to setuptools/setup.py, namespaces, cylc version - if I recall correctly - was "too smart". It had an environment variable I think, and also tried to give extra information. This didn't work with Conda, virtual envs, etc.

So given the number of modules, and the combinations that people can have for their environments (pip, conda, maybe some will want RPM or DEB packages, or even pyinstaller/spack/pipx/poetry/etc), I would only check if this change wouldn't become an issue later.

Another thing that needs to be checked is circular dependency. I think it won't happen as your suggestion would move only cylc to that module. But it would be complicated if this new project had a dependency on cylc-flow, and if cylc-flow also had to have a dependency on this project.

Other than that, +1

@oliver-sanders
Copy link
Member Author

oliver-sanders commented Dec 9, 2019

I think the new cylc version should be pretty straight forward:

#!psudocode

print('cylc', __version__)
for package in entry_points:
    version = package.__version__
    if is_git_repo(package.__file__):
        version += '-' + '$(git describe --abbrev=4 --always)'
    print(package.__name__, version)

@oliver-sanders
Copy link
Member Author

oliver-sanders commented Dec 9, 2019

I think I haven't seen this done in another Python project

No me neither. I think most of the time you have one "core" package and lots of "plugins". With Cylc it's a bit different as we are installing different bits of a system on different machines so we can't rely on the "core" Cylc Flow package always being installed.

Note: we may well chop Cylc Flow down further to avoid having to install the entire package on remote hosts when all job hosts really need is cylc message etc.

@kinow
Copy link
Member

kinow commented Dec 10, 2019

Note: we may well chop Cylc Flow down further to avoid having to install the entire package on remote hosts when all job hosts really need is cylc message etc.

Ah, that sounds even more interesting. Maybe if you have time later to try a proof of concept of how it would work, then we can have a look and discuss more. I'd be happy to have some fun trying it out, and testing locally too 👍

@kinow
Copy link
Member

kinow commented Dec 10, 2019

(Another thing to note, add a post-requirement task to update our Conda recipes, and add this one to the Conda Forge too)

@oliver-sanders
Copy link
Member Author

We have some static resources which could do with being common as well e.g. the Cylc logo.

@sgaist
Copy link

sgaist commented Jan 12, 2020

@kinow Do you mean you are looking for a project that extends a "core" Click command group with further commands ?

@kinow
Copy link
Member

kinow commented Jan 13, 2020

@kinow Do you mean you are looking for a project that extends a "core" Click command group with further commands ?

Hi @sgaist , more a case where the core command is in a separate repository. The cases I know have the core command as part of the main utility/repository (e.g. pytest, `.

But it could be that I am simply being too dense for being more used/familiar to conda list or pip list and preferring to avoid extra layers. Also as I am experimenting more using Cylc with Docker, Singularity, and thinking in how it could be used in Cloud environments, Kubernetes, etc, I see less need to have cylc version with list of dependencies.

But if having cylc version is something that will be helpful to users/site admins, then I won't object, and think the central Cylc CLI would be necessary to avoid the dependency on cylc-flow...

@oliver-sanders
Copy link
Member Author

oliver-sanders commented Jan 13, 2020

OK some quick context:

  • Cylc users are mostly scientists not developers.
  • Users don't necessarily know what PIP, Conda or Docker are, our more technical users often move in different circles.
  • Cylc is often used at institutions who install it for their users so users don't actually know how Cylc was installed for them.
  • Institutions often don't provide users with tools like PIP, Conda or Docker (prefering to pre-build any environments users need).

Basically pip list just isn't an option for most users. Gotta remember we are not building a tool for developers, some of our users have very good technical skills and we try to provide for them, but many aren't familiar with the command line or even Linux.

Besides I think we need to centralise the CLI anyway:

  • To allow partial installation on remote hosts (don't need the whole of Cylc Flow installed on the supercomputer)
  • cylc-uiserver should be cylc uiserver
  • jupyterhub should be cylc hub

Listing registered packages via entry points should be easy.

@kinow
Copy link
Member

kinow commented Jan 13, 2020

I am aware that most our users now are scientists. Many of the changes we have made in the past months make things more complicated for scientists (multiple repositories, no more batteries included, need to use conda or pip to install, etc).

But I think we did these changes for a good reason. And the old cylc version was problematic to migrate due to it being too elaborate (we also had some environment variable that gave some trouble during refactoring).

So there is a counter argument for each of your arguments. I also don't think having a command in front of jupyterhub, or that cylc uiserver is necessary. But we make no progress discussing like this.

If you say that will be best for users, I will have no objection and won't try to block any of these changes 👍

@oliver-sanders
Copy link
Member Author

oliver-sanders commented Jan 13, 2020

Many of the changes we have made in the past months make things more complicated for scientists

Scientists won't see these changes, they will just use the installation they are given.

But even crossing off those arguments we are still left with:

  • The issue of running Cylc commands in the absence of a full Cylc installation. (e.g. cylc message is pretty much all we need installed on a supercomputer*)
  • The issue of users not having access or permissions for commands like pip.

@hjoliver
Copy link
Member

@oliver-sanders - another point @kinow made offline (he can correct me if I misrepresent him) is, thinking ahead to looser cloud-based deployments where J-hub might be separately deployed to one host, UI Server(s) to another(s), cylc-flow to others, or whatever, ... where the main CLI command resides is clear enough if it stays with cylc-flow, but otherwise (if it needs to know about all components) not so clear. E.g. if we don't end up modifying J-hub at all (and hopefully we won't) it may not even get installed via Cylc. So how is our cylc --version command going to know about it? so maybe (attrib. @kinow ) this isn't a problem we can really solve centrally.

The only sticking point for me is, it does seem unnecessary to have to install the full cylc-flow package just to get the main cylc command for cylc message on job hosts...

@oliver-sanders
Copy link
Member Author

otherwise (if it needs to know about all components) not so clear

Not really.

If we have installations in different containers that's no problem, we wouldn't expect the installation in one container to know about the installation in another.

Whenever you run a cylc command you are running a Cylc executable installed in some location. cylc --version could just dump out all cylc plugins installed in that location.

if we don't end up modifying J-hub at all (and hopefully we won't) ... So how is our cylc --version command going to know about it?

Easy, like this:

cylc-uiserver/setup.cfg:

[options.entry_points]
console_scripts=
    cylc-hub = jupyterhub.app:main
    cylc-uiserver = cylc.uiserver.main:main
cylc.module=
    uiserver = 1.0.0

@oliver-sanders
Copy link
Member Author

oliver-sanders commented Jan 21, 2020

This ticket seems to have generated a lot of discussion which confuses me.

So to quickly explain what I've proposed on the cylc --version front is very simple and would use the entry points mechanism which we already rely on. It should be as simple as this:

cylc-flow/setup.cfg

[options.entry_points]
console scripts=
     cylc-run = cylc.flow.scripts.run.main:main
     ...
module=
     flow = 8.0.0

cylc-uiserver/setup.cfg

[options.entry_points]
console_scripts=
    cylc-hub = jupyterhub.app:main
    cylc-uiserver = cylc.uiserver.main:main
cylc.module=
    uiserver = 1.0.0

cylc/common/__init__.py

def version():
    print(f'Cylc {__version__}'
    for module, version in get_entry_point('cylc.module).items():
        print(f'    {module}: {version}')

We don't actually need the cylc.module entry point (could use package name or other entry point[s]), I've used cylc.module for illustrative purposes.

This is fully independent of PIP, Conda, Docker, whatever, it simply lists the names and versions of and Cylc packages installed in the local Python environment.

Managing stacks, environments and containers is your own lookout, we can't help you there!

@kinow
Copy link
Member

kinow commented Jan 21, 2020

This is fully independent of PIP, Conda, Docker, whatever, it simply lists the names and versions of and Cylc packages installed in the local Python environment.

Thanks for the code example, that clarifies a bit (I thought you wanted to list the version of jupyterhub, colorama, etc).

For me, if a user reported an issue that I thought could involve a problem in the dependency list, I'd prefer to get a list of modules (example support in jupyterhub this kind of info too (though I expect users will understand that if using Conda/rpm/etc, they may need to translate that command to something else - and I think their users are sometimes researchers too)

We don't actually need the cylc.module entry point (could use package name or other entry point[s]), I've used cylc.module for illustrative purposes.

The current example duplicates the version info, but I can see your point, as otherwise we need to hardcode which modules we want to display in the version output.

Users can still install any dependency they want, either via pip/conda/rpm/etc or just something like cp -r cylc-flow ~/.python/.../site-packages/cylc-flow or changing PYTHONPATH. Meaning that in their local node, they may have certain versions used for tests, or installed by accident.

Also, any user installing cylc-uiserver, automatically has cylc-flow (as well as jupyterhub, but that doesn't mean cylc-hub should work correctly I think).

cylc --version on a node with cylc-uiserver may report 8.0a1. But I think the user should be able to run a 8.0a2 or later version too, on the same node, with a different virtual environment. Perhaps even on a remote box/container/cloud/etc.

In this case, his suite could be running 8.0a2. Going to the node with cylc-uiserver, on the initial virtual environment or OS installation, cylc --version would show something like:

    cylc-uiserver:0.2
    cylc-flow:8.0a1

Using the other virtual environment, or going to the box running the suite, he would see:

    cylc-flow:8.0a2

Sorry if this example is too contrived. I am just trying to explain why I dont' think cylc --version listing modules installed locally could be used by all users.

The other issue I have with this proposal is the moveof cylc command to a separate repository. Which I don't think is necessary. IMHO, cylc is a command line utility that I may have installed in some node, to run and orchestrate workflows/suites.

I don't expect to have anything in that command that could run JupyterHub, or any other utility. If I am writing a suite, I don't expect to be able to run cylc --version to get the version of UI Server and decide to do something (that'd be a bad idea I think, but users would have that information now). I would prefer to have cylc with one only job/responsibility.

This is my opinion, but I can see others would disagree and find convenience in having more subcommands. So I don't think this should be a blocker for this issue.

If you really think this will be useful for support, and won't become a maintenance issue in the long term, then all good for me.

FYI a couple of projects which iterate over entry points to list plugins:

👍 this was mentioned in cylc/cylc-flow#2989 and cylc/cylc-flow#2959. I'm +1 for that.

@oliver-sanders
Copy link
Member Author

oliver-sanders commented Jan 21, 2020

Thanks for the code example, that clarifies a bit (I thought you wanted to list the version of jupyterhub, colorama, etc).

I think there has been a lot of confusion here. Sorry for not explaining better.

All I want to do is to print a list of Cylc entry points in one environment. If the user has multiple environments the command would not report this information, the user would not expect it to.

Users can still install any dependency they want, either via pip/conda/rpm/etc or just something like cp -r cylc-flow ~/.python/.../site-packages/cylc-flow or changing PYTHONPATH.

Exactly!

The only way to tell what the user has actually installed (which works for everything) is to iterate over modules or entry points. Running pip list won't work in the contrived cases you've described.

Sorry if this example is too contrived

It's not contrived, this is probably how our site installations will look, however, I can't see why it would be a problem:

$ . /path/to/uiserver-env/bin/activate
$ cylc version --long
Cylc 8.0a2
    Flow  -  8.0a1
    UI Server  -  1.0.0
$ . /path/to/flow-env/bin/activate
$ cylc version --long
Cylc 8.0a1
    Flow  -  8.0a2
    Xtriggers  -  1.0.1

We would expect different results!

@oliver-sanders
Copy link
Member Author

oliver-sanders commented Jan 21, 2020

The other issue I have with this proposal is the move of cylc command to a separate repository.

This is the main point of this issue. Up for debate, but we do need a solution to the problem:

We don't require a full Cylc Flow installation on job hosts, we only require certain functionality namely some cylc subcommands.

How do we make it so that we can have commands like:

  • cylc hub - from UI Server
  • cylc run - from Flow
  • cylc message - from Flow (remote only package)

If we don't centralise CLI functionality then what should we do instead?

  • We could write a different cylc implementation for the remote part of Flow?
  • We could copy the existing command twice?
  • We could do something clever with Twine and upload two packages from the same repo?

@oliver-sanders
Copy link
Member Author

This issue has become too convoluted for progress, the cylc version conversation is only a small part of a larger issue.

Superseeding this issue with #76 which poses questions but doesn't propose solutions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants