[Proposal] Introduce the notion of Runtime Processor Type #2258
Comments
One nice thing about the bit-based value approach is, indeed, that …
After discussing in the scrum we decided to move forward with the following changes to what is documented above:
Sorry I missed the discussion on scrum today. Could we revisit this at the community dev meeting tomorrow?
@lresende - would you mind posting some concerns or questions so we have an idea of the issue prior to the meeting? (I may be late if my prior meeting runs late.) Thank you.
I view the following as the drivers to runtime association:
I believe that most, if not all, deployments will focus on one runtime. In case there are multiple runtime processors that support a given runtime, I would focus on solving the problem via #2136 and only installing the runtime processor that is desired. Note that this would also enable users to continue to use the existing catalogs, etc., as they shouldn't be impacted by a different implementation of a "kfp" runtime processor.
Agree on the Component Catalog connector parts; they are associated with a given runtime and are not necessarily different if there are multiple runtime processor implementations available for a given runtime. Having said that, I don't think we should ever support having more than one live implementation on a deployment.
I also don't like that people would need to change the code to introduce new runtimes (e.g., the proposed list already misses Tekton, Flyte, etc.). This would be an issue if people have proprietary ones as well. I also think that our UI is probably not ready to support multiple runtimes of the same type.
After discussion at the community meeting, @kevin-bates explained in more detail the scenario that would require this functionality and we agreed on it. Thank you @kevin-bates
Based on the discussions and the decision that we only include enum entries relative to runtime processors that have been implemented or that we know are under development, here's the current incantation of the enum (along with a proposed help-string):

```python
@unique
class RuntimeProcessorType(Enum):
    """RuntimeProcessorType enumerates the set of platforms targeted by runtime processors.

    Each runtime processor implementation (subclass of PipelineProcessor) will reflect one
    of these values.  Users implementing their own runtime processor that corresponds to a
    type not listed in this enumeration are responsible for appropriately extending this
    enumeration and reflecting that entry in the corresponding runtime schema in order to
    fully integrate their processor with Elyra.
    """
    LOCAL = 'Local'
    KUBEFLOW_PIPELINES = 'Kubeflow Pipelines'
    APACHE_AIRFLOW = 'Apache Airflow'
    ARGO = 'Argo'
```
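For illustration, a minimal sketch of how the enum above might be paired with the `get_instance()`-style lookup helper mentioned elsewhere in this proposal. The helper name `get_instance_by_name` and its case-insensitive behavior are assumptions, not part of the agreed design:

```python
from enum import Enum, unique


@unique
class RuntimeProcessorType(Enum):
    """Enumerates the set of platforms targeted by runtime processors."""
    LOCAL = 'Local'
    KUBEFLOW_PIPELINES = 'Kubeflow Pipelines'
    APACHE_AIRFLOW = 'Apache Airflow'
    ARGO = 'Argo'

    @staticmethod
    def get_instance_by_name(name: str) -> 'RuntimeProcessorType':
        """Look up an entry by its member name, ignoring case (hypothetical helper)."""
        try:
            return RuntimeProcessorType[name.upper()]
        except KeyError:
            raise KeyError(f"Invalid runtime processor type name: {name}")
```

The member name (e.g. `KUBEFLOW_PIPELINES`) acts as the persisted identifier, while the value (`'Kubeflow Pipelines'`) remains the displayable string.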
Today we discussed persisting runtime type vs. runtime processor information in the pipeline. We decided that eventually it would be best to only persist one or the other but, due to time constraints, we will persist everything for now. We were a bit unsure about which value to persist, but are leaning towards persisting only one of the two.
This issue presents a proposal for introducing the notion of a Runtime Processor Type so that implementations of the same runtime can be more easily distinguished.
Definition:
The following uses the term runtime processor. This term refers to the platform or orchestration tool that drives the execution of a given pipeline. Support for two common runtime processors is embedded in Elyra: Apache Airflow and Kubeflow. Although others exist outside of Elyra, they could be implemented using our BYO model.
Problem:
With the ability to bring your own runtimes and, as of #2241, bring your own component catalog connectors, it is important that we have the ability to specify that a given entity supports a type of runtime processor. Today, Elyra only defines runtime processor names. Although each instance of a `PipelineProcessor` has a `type` property, that property value is actually the name of a runtime configuration schema and not actually a type of runtime processor. For example, Elyra ships with two runtime schema definitions - `airflow` and `kfp`. Runtime configuration instances of these schemas can be created, but each schema equates to a specific implementation of a `RuntimePipelineProcessor` (or RPP) instance (which is a subclass of the aforementioned `PipelineProcessor` class). However, if someone wanted to bring their own implementation of `RuntimePipelineProcessor` that also used Kubeflow to drive the execution of the pipeline, there really isn't a way for that implementation to indicate that it, too, is a Kubeflow-based processor similar to the processor named `kfp`. Likewise, Component Catalog Connectors (or CCCs) want the ability to state that the components served from their implementation support a particular processor type, like Kubeflow or Apache Airflow, irrespective of how many RPP implementations are registered.
As a result, we need to formally introduce the notion of runtime processor type.
Proposal:
The first issue with introducing a runtime processor type is its general conflict with the bring-your-own paradigm. How can we possibly enumerate types - which must be known at development time - when we don't know what runtime processor implementations we will have until run time? This can be solved by first defining the set of known processor types, irrespective of whether an implementation of that type exists. This must be clearly conveyed to users: just because a specific processor type is defined does NOT imply that such an implementation is available or that one even exists.
A quick Google search for runtime processor types yields this result. Here we find six task/orchestration tools, including the three we have some knowledge about (Apache Airflow, Kubeflow, and Argo), with the others being Luigi, Prefect, and MLFlow.
We can add some of these as development-time processor types, with the idea that anyone wishing to introduce a runtime processor implementation (i.e., BYO RPP) of an undefined (unlisted) type must first introduce a pull request adding that type to the enumeration.
Implementation (high-level):
We'd like a central location that houses the various processor types. The type names should have string representations that can be referenced in schemas and the UI. Ideally, simple comparisons could be made without regard to case. It should be easy for users to introduce a new type prior to implementing their Pipeline Processor or Catalog Connector. Python's `Enum` construct seems like a nice fit. In particular, we should use the `@unique` decorator to ensure values are unique. Using the types referenced in the Google search result, we'd have an enum like the following:
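The enum code block itself was lost here. Based on the member names and values cited later in this section (`RuntimeProcessorType.Kubeflow.value` yields `2`, and `Local`, at `1`, is the only odd value), a reconstruction might look roughly like this - the exact member names and integer assignments are assumptions:

```python
from enum import Enum, unique


@unique
class RuntimeProcessorType(Enum):
    # Bit-based values; Local (1) is the only odd entry.
    Local = 1
    Kubeflow = 2
    ApacheAirflow = 4
    Argo = 8
    Luigi = 16
    Prefect = 32
    MLFlow = 64


# __members__ maps the stringized name to the enum instance:
assert RuntimeProcessorType.__members__.get('Kubeflow') is RuntimeProcessorType.Kubeflow
# The built-in name/value properties round-trip the entry:
assert RuntimeProcessorType.Kubeflow.name == 'Kubeflow'
assert RuntimeProcessorType.Kubeflow.value == 2
```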
There are a couple of items worth noting:
- We can include `Local` as a type - which I think we want to do - even though it's not an official processor type in Elyra.
- The nice thing about using bits is that `Local`, having a value of `1`, would be the only odd value - which is somewhat fitting 😄, but also unnecessary. We could discuss this further; I don't really have an affinity for the integer values.
`Enum` classes also have a built-in dictionary that uses the stringized value as a key and returns the enum instance. This dictionary is `__members__` so, using the definition above, `RuntimeProcessorType.__members__.get('Kubeflow')` will return `RuntimeProcessorType.Kubeflow`. We will certainly wrap `__members__` access into a `get_instance()` kind of method. The string value is accessible via a built-in `name` property, so `RuntimeProcessorType.Kubeflow.name` yields `'Kubeflow'`. Likewise, there's a built-in `value` property where `RuntimeProcessorType.Kubeflow.value` yields `2`. As a result, I think using an `Enum` subclass would give us the flexibility and central location we (and our users) would need.
Schemas of the `Runtimes` schemaspace will require a "type" property. This property will be a constant because each runtime schema is associated with exactly one processor type. This will require a migration, but that can be easily performed by introducing `metadata_class_name` values for each of our runtime schemas. When a given instance is located, the class implementation will check if there's a `processor_type` field and, if not, inject that field, persist the update, then return from the load. (We should introduce a `version` field at this time as well.) This same migration approach is used in the Catalog Connector PR (#2241).
Users implementing their own Catalog Connectors should explicitly list the processor types they support. We could introduce the notion of a wildcard value (e.g., `'*'`) to indicate any processor type is supported but, given the potential plethora of task/orchestration engines, I think it would be best to be explicit. When those providers add support for another engine, they simply expose an updated schema whose enum-valued property contains the new reference. Likewise, they may choose to drop support for a given processor type.
There are locations within the server and UI where the processor name is used today. These will need to be updated and replaced with formal type-based names. In addition, we'll want a new endpoint that can be used to retrieve the types for any registered runtime processors. For example, if there are two Kubeflow processor implementations registered (and nothing more), this endpoint would return `Kubeflow` (corresponding to the `name` property of `RuntimeProcessorType.Kubeflow`) and not the names of the two registered processors (i.e., schemas).
Alternative approach:
Rather than introduce a `RuntimeProcessorType` enumeration, we could introduce type-based base classes that derive from `RuntimePipelineProcessor`. For example, an `ApacheAirflowBase` could be inserted between `RuntimePipelineProcessor` and `AirflowPipelineProcessor`. This would allow Airflow-specific functionality that is agnostic to the actual implementations to be located in one place. These base classes would then expose a `processor_type` property that reflects their type. In addition, code could also use `isinstance(implementation_instance, ApacheAirflowBase)` to determine "type".
The problem with this is that we'd still want to introduce "empty" implementations for future-supported types, even though one may never exist. This seems a little heavyweight. Another caveat is that the "types" are scattered about and not in a central location like that of an `Enum` class. As a result, references to multiple types would require imports for the various implementations - which is way too heavyweight.
Front-end/pipeline changes related to this proposal:
1. The displayed icon should be determined from the `runtime_type` field of the schema within the `Runtimes` schemaspace. Currently, this determination is based on the name of the schema (`kfp`, `airflow`). If the `runtime_type` specifies `KUBEFLOW_PIPELINES`, the Kubeflow icon is displayed, etc. The schema `title` should be used as the icon 'name' or hover information. The `name` may also serve that purpose.
2. The pipeline should persist a `runtime:` property (which equates to the schema name, as is the case today) and a `runtime_type:` (sibling) property which reflects the schema's `runtime_type` value.
3. It is confusing what the second (and embedded) `runtime` value is used for or why it is located in a sub-object. We should probably reformat this, barring reasons for the sub-object. Do other items get placed in the `"properties":` sub-object? Also, note the use of the enumerated type name rather than the displayable value. Wherever items are persisted (like in the schema `runtime_type` field as well) we want to use the name so that a level of indirection is introduced for obtaining the values. This enables the values to change whenever necessary.
4. Migration will need to infer the `runtime_type` value from the schema name - which should be one-to-one.
5. I think we will want a uihint that can convert the type name to its value. For example, consider this portion of the catalog connector schema... it would be nice if the editor could look up the up-cased values in a value-map kind of thing to use in the dropdown and, similarly, set the up-cased value into the field once selected. I.e., only use the "displayable" value for display. We can get by with just the up-cased values, but it's a better UX if we can display the displayable values.
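For illustration, a sketch of how the two sibling properties might be persisted in the pipeline file. The surrounding structure is an assumption; the point is that the enumerated member name (`KUBEFLOW_PIPELINES`), not the displayable value, is what gets stored, alongside the schema name:

```json
{
  "runtime": "kfp",
  "runtime_type": "KUBEFLOW_PIPELINES"
}
```

Because only the name is persisted, the displayable value can change later without migrating existing pipeline files.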