Temporal Nexus: Deploy to Production - Temporal Cloud & Nexus #3078

Merged
merged 34 commits into from
Oct 2, 2024
Changes from 10 commits
1dbab3d
first add
jsundai Sep 11, 2024
f769fba
remove bidirectional
jsundai Sep 11, 2024
a43b149
adding audit logging
jsundai Sep 11, 2024
26ee811
updating intro to nexus
jsundai Sep 12, 2024
18d8b88
making small edits
jsundai Sep 12, 2024
3381a67
additional edits
jsundai Sep 12, 2024
4b38718
updating audit logging
jsundai Sep 12, 2024
54eae7e
adding metric references
jsundai Sep 12, 2024
84e204a
removing for build
jsundai Sep 12, 2024
4f8e7f7
changing subnav titles to match page titles
jsundai Sep 12, 2024
5fd9634
edits based on feedback and structural
jsundai Sep 15, 2024
a78c60f
toning down
jsundai Sep 15, 2024
6f18046
removing non-existing links and adding pricing link
jsundai Sep 15, 2024
0f46d76
audit logging link
jsundai Sep 15, 2024
f0e1b3c
additional edits stylistic
jsundai Sep 25, 2024
822503b
adding periods and removing command ticks
jsundai Sep 25, 2024
034ab21
edits and changing latin terms
jsundai Sep 25, 2024
73a05ce
adding in glossary reference for rpc and taking away more ticks
jsundai Sep 25, 2024
d8380cb
Merge branch 'main' into temporal-cloud-deploy-production
jsundai Sep 25, 2024
e165a05
edits
jsundai Sep 25, 2024
c5e1284
Merge branch 'temporal-cloud-deploy-production' of github.com:tempora…
jsundai Sep 25, 2024
c5f9ad8
edits
jsundai Sep 25, 2024
ecbd5f0
adding in encyclopedia links, api registry image, and small edits
jsundai Sep 26, 2024
1450674
created diagram and added to doc
jsundai Sep 26, 2024
35b33e9
polish diagram
jsundai Sep 27, 2024
083a6ed
small edits on getting started
jsundai Sep 30, 2024
a0a8b05
editing page
jsundai Oct 1, 2024
4176cc2
adding in more links
jsundai Oct 1, 2024
35dc74e
setting line
jsundai Oct 1, 2024
99f9b09
edits based on manus feedback
jsundai Oct 2, 2024
c433a01
quick grammar/spelling check
jsundai Oct 2, 2024
73632e6
Merge branch 'main' into temporal-cloud-deploy-production
jsundai Oct 2, 2024
6354233
address small nit
jsundai Oct 2, 2024
5046019
Merge branch 'temporal-cloud-deploy-production' of github.com:tempora…
jsundai Oct 2, 2024
6 changes: 5 additions & 1 deletion docs/production-deployment/cloud/audit-logging.mdx
@@ -23,7 +23,7 @@ tags:
- troubleshooting
---

Audit Logging is a feature of [Temporal Cloud](/cloud/overview) that provides forensic access information at the account level, the user level, and the [Namespace](/namespaces) level.
Audit Logging is a feature of [Temporal Cloud](/cloud/overview) that provides forensic access information at the account level, the user level, the [Namespace](/namespaces) level, and the Nexus Endpoint level.

Audit Logging answers "who, when, and what" questions about Temporal Cloud resources.
These answers can help you evaluate the security of your organization, and they can provide information that you need to satisfy audit and compliance requirements.
@@ -65,6 +65,10 @@ The following list specifies both the supported events and the Temporal APIs tha
  - Request increase in Retention Period: `UpdateNamespace`
- Multi-region Namespace
  - Failover Namespace: `FailoverNamespace`
- Nexus Endpoint
  - Create Nexus Endpoint: `CreateNexusEndpoint`
  - Update Nexus Endpoint: `UpdateNexusEndpoint`
  - Delete Nexus Endpoint: `DeleteNexusEndpoint`

### API Key Operation events

5 changes: 4 additions & 1 deletion docs/production-deployment/cloud/metrics/reference.mdx
@@ -222,7 +222,7 @@ Use the following labels to filter metrics:
| `le` | Less than or equal to (`le`) is used in histograms to categorize observations into buckets based on their value being less than or equal to a predefined upper limit. |
| `operation` | This includes operations such as:<ul><li>SignalWorkflowExecution</li><li>StartBatchOperation</li><li>StartWorkflowExecution</li><li>TaskQueueMgr</li><li>TerminateWorkflowExecution</li><li>UpdateNamespace</li><li>UpdateSchedule</li></ul> See: [Metric Operations](#metrics-operations) |
| `resource_exhausted_cause` | Cause for resource exhaustion. |
| `task_type` | Activity or Workflow. |
| `task_type` | Activity, Workflow, or Nexus. |
| `temporal_account` | Temporal Account. |
| `temporal_namespace` | Temporal Namespace. |
| `temporal_service_type` | Frontend or Matching or History or Worker. |
@@ -265,6 +265,7 @@ Temporal Cloud includes the following operations labels:
- OperatorDeleteNamespace
- PatchSchedule
- PollActivityTaskQueue
- PollNexusTaskQueue
- PollWorkflowExecutionHistory
- PollWorkflowExecutionUpdate
- PollWorkflowTaskQueue
@@ -280,6 +281,8 @@ Temporal Cloud includes the following operations labels:
- RespondActivityTaskCompletedById
- RespondActivityTaskFailed
- RespondActivityTaskFailedById
- RespondNexusTaskCompleted
- RespondNexusTaskFailed
- RespondQueryTaskCompleted
- RespondWorkflowTaskCompleted
- RespondWorkflowTaskFailed
59 changes: 59 additions & 0 deletions docs/production-deployment/cloud/nexus/getting-started.mdx
@@ -0,0 +1,59 @@
---
id: getting-started
slug: /cloud/nexus/getting-started
title: Getting Started with Temporal Nexus
description: Learn how to get started with Temporal Nexus, including setting up Nexus Endpoints and integrating Nexus into your Temporal workflows.
sidebar_label: Getting Started
tags:
- temporal cloud
- nexus setup
keywords:
- Temporal Nexus onboarding
- setting up Nexus Endpoints
- temporal cloud
- nexus setup

---

Temporal Nexus Public Preview is available within your Temporal Cloud account, with support in the Temporal Go SDK.

Calls across existing Namespaces can be enabled by creating a Nexus Endpoint in the Nexus API Registry, creating a Nexus Service in a Worker in the handler Namespace, and then using the Nexus Service from a caller Workflow in a different Namespace.

Monolithic Namespaces can be decomposed into multiple Namespaces, by hiding service implementations behind a Nexus Endpoint in the monolithic Namespace, pointing all consumers at the new Nexus Endpoint, and then changing the Endpoint’s target Namespace to a different Namespace. Multiple Nexus Endpoints can target a single monolithic Namespace.
@fairlydurable (Contributor) commented on Sep 25, 2024:

This is really confusing and I don't understand the why behind this. Is the goal that there's one entry point that can be broken down to simple components and treated as a single endpoint, so you don't have to use a monolithic implementation? I think this is sort of what you're aiming for.

Suggested change
Monolithic Namespaces can be decomposed into multiple Namespaces, by hiding service implementations behind a Nexus Endpoint in the monolithic Namespace, pointing all consumers at the new Nexus Endpoint, and then changing the Endpoint’s target Namespace to a different Namespace. Multiple Nexus Endpoints can target a single monolithic Namespace.
Monolithic Namespaces can be decomposed into multiple Namespaces by hiding service implementations behind Nexus Endpoints.
This design points consumers to a single Nexus Endpoint and then redirects different work to specialized Namespaces.
This allows each subordinate Namespace to specialize its functionality without building monoliths.
Further, Nexus Endpoint redirects are not exclusive.
More than one Nexus Endpoint can redirect to the same subordinate Namespace.
This means Namespace functionality can be shared between Nexus solutions.


## Cross-Namespace

Calls across existing Namespaces can be enabled by creating a Nexus Endpoint in the Nexus API Registry, creating a Nexus Service in a Worker in the handler Namespace, and then using the Nexus Service from a caller Workflow in a different Namespace.

Steps to enable calls across existing Namespaces (see the [Develop examples]):

1. Add Nexus `Services` to the same `Workers` as the Temporal primitives being abstracted.
2. Add a Nexus `Endpoint` that:
    1. Targets the handler Namespace.
    2. Allows the caller Namespace.
3. Make Nexus calls from a caller Workflow in a different Namespace.
    1. Use `workflow.NewNexusClient(endpointName, serviceName)`.
    2. Execute a Nexus Operation: `nexusClient.ExecuteOperation(...)`.

## Decompose a Monolithic Namespace

Multiple Nexus Endpoints can target a single monolithic Namespace, and then each Endpoint can be updated, one at a time, to target separate Namespaces, for an incremental migration.

Once Nexus Endpoints are in place, targeting a new Namespace can be done with config changes and zero downtime, as new Nexus requests will be routed to the new target Namespace, and existing Nexus requests will be completed in the old Namespace.

Steps to decompose a large monolithic Namespace:

1. Hide service implementations behind a Nexus `Endpoint`.
    1. Add Nexus `Services` to the same `Workers` as the Temporal primitives being abstracted.
    2. Add Nexus `Endpoints` to the Nexus API Registry, with the monolithic Namespace as the target.
2. Have service consumers use the Nexus Endpoint instead of the underlying implementation.
    1. This can be done incrementally, until there are no direct caller dependencies on the underlying service implementations (the underlying Temporal primitives).
3. Move service implementations to a different Namespace.
    1. Create a new `Namespace`.
    2. Add a `Worker` deployment with the Nexus `Service`.
    3. Update the Nexus `Endpoint` target to the new Namespace.
    4. Configure the Endpoint allowlist to allow calls from the original monolithic Namespace.
    5. New Nexus `Operations` will be routed from callers in the monolithic Namespace to the new Namespace.
4. Quiesce Nexus Operations on the old target Namespace.
    1. Leave the old `Worker` deployment running until all existing Nexus Operations in the old Namespace (and their underlying `Workflows`, if any) have completed.
    2. Old Workers in the monolithic Namespace can then have the service implementation removed, since it is now served from the new `Endpoint` target Namespace.
36 changes: 36 additions & 0 deletions docs/production-deployment/cloud/nexus/index.mdx
@@ -0,0 +1,36 @@
---
id: index
slug: /cloud/nexus
title: Temporal Nexus
description: Discover Temporal Nexus, a powerful feature for cross-namespace collaboration, modular workflow design, and enhanced security in Temporal Cloud.
sidebar_label: Temporal Nexus
tags:
- temporal cloud
- temporal nexus
- modular workflows
- cross-namespace workflows
- cloud architecture
keywords:
- Temporal Nexus
- cross-namespace workflows
- modular workflow design
- workflow orchestration
- Temporal Cloud security
- Temporal Cloud services
- distributed workflows
- workflow API management
---

:::info SUPPORT, STABILITY, and DEPENDENCY INFO

Temporal Nexus is in [Public Preview](https://docs.temporal.io/evaluate/development-production-features/release-stages#public-preview) for Temporal Cloud.

:::

This Temporal Nexus guide covers the following topics:

- [Overview](/cloud/nexus/overview)
- [Using Nexus](/cloud/nexus/using-nexus)
- [Pricing](/cloud/nexus/pricing)
- [Getting Started](/cloud/nexus/getting-started)
- [Operations](/cloud/nexus/operations)
181 changes: 181 additions & 0 deletions docs/production-deployment/cloud/nexus/operations.mdx
@@ -0,0 +1,181 @@
---
id: operations
slug: /cloud/nexus/operations
title: Temporal Nexus Operations
description: Explore how to manage and debug Nexus Operations.
sidebar_label: Temporal Nexus Operations
tags:
- temporal cloud
- nexus operations
keywords:
- Temporal Nexus operations
- temporal cloud

---

## Execution Debugging

Execution debugging with Nexus includes end-to-end executions that span:

* Caller Workflow.
* One or more Nexus Operations that are routed within and across Namespaces.
* Underlying Temporal primitives created by a Nexus Operation handler like a Workflow.

Multi-level Nexus calls are supported:

* Workflow A -> Nexus Op 1 -> Workflow B -> Nexus Op 2 -> Workflow C

### Underlying Workflow ID is returned as the Nexus Operation ID

When a caller Workflow starts a Nexus Operation that is handled by a Temporal SDK `NewWorkflowRunOperation` handler, the underlying Workflow ID is returned as the Nexus Operation ID, which appears in the Nexus Operation Started event in the caller’s Workflow history.

![Workflow history](/img/nexus/nexus-operations-ui.png)

This can be used to search the handler’s Namespace for that Workflow ID:

![Search handler's Namespace](/img/nexus/nexus-operations-ui-search-handler.png)

This may also be done using: `temporal workflow show --detailed`

```

--------------- [5] NexusOperationScheduled ---------------
endpoint: myendpoint
endpointId: 80a4fb3e7ab145eabc6a3b15e327548f
eventTime: 2024-08-28T03:44:34.985230930Z
input.Language: es
input.Name: Nexus
operation: say-hello
requestId: 1307660f-7f2e-4626-8629-851a0e468482
scheduleToCloseTimeout: 0s
service: my-hello-service
taskId: 158300487
version: 1265
workflowTaskCompletedEventId: 4

--------------- [6] NexusOperationStarted ---------------
eventTime: 2024-08-28T03:44:35.198292012Z
operationId: 1307660f-7f2e-4626-8629-851a0e468482
requestId: 1307660f-7f2e-4626-8629-851a0e468482
scheduledEventId: 5
taskId: 158300491
version: 1265


```

The Operation ID can then be searched for using: `temporal workflow list --query`.

However, this requires knowing the Endpoint’s target Namespace and involves manual steps, which is why we’ve created [bi-directional linking for Nexus Operations] to navigate forwards and backwards across Workflow event histories, through the Nexus Operations and the underlying Temporal primitives they may create.

### Pending Operations

Similar to pending Activities, pending Nexus Operations are displayed on the Workflow details page and in the output of `temporal workflow describe`.

For example, from the Temporal UI:
![Pending Operations](/img/nexus/pending-nexus-operations.png)

For example, from the `temporal` CLI:

```
temporal workflow describe

Pending Nexus Operations: 1

Endpoint myendpoint
Service my-hello-service
Operation echo
OperationID
State BackingOff
Attempt 6
ScheduleToCloseTimeout 0s
NextAttemptScheduleTime 20 seconds from now
LastAttemptCompleteTime 11 seconds ago
LastAttemptFailure {"message":"unexpected response status: \"500 Internal Server Error\": internal error","applicationFailureInfo":{}}
```

### Pending Callbacks

Nexus callbacks are sent from the handler’s Namespace to the caller’s Namespace to complete an asynchronous Nexus Operation.
They appear in the UI and in the output of `temporal workflow describe`.

For example, from the Temporal UI:
![Pending Callbacks](/img/nexus/nexus-callback.png)

For example, from the `temporal` CLI:

```
temporal workflow describe


Callbacks: 1

URL https://nexus.phil-caller-namespace.a2dd6.cluster.tmprl.cloud:7243/namespaces/phil-caller-namespace.a2dd6/nexus/callback
Trigger WorkflowClosed
State Succeeded
Attempt 1
RegistrationTime 32 minutes ago
```

## Metrics

Scheduling and processing a Nexus Operation are reported through existing cloud metrics, using the following operation metric labels:

* Caller Namespace
  * RespondWorkflowTaskCompleted, which is used to schedule the Nexus Operation.
* Handler Namespace
  * PollNexusTaskQueue
  * RespondNexusTaskCompleted
  * RespondNexusTaskFailed

There is a [preliminary set of metrics within Temporal](https://github.com/temporalio/saas-control-plane/blob/19bb8e3cc8fa9b276c83780ae069739ebb248743/cmn/auditlog/types.go#L66) for the internal Nexus RPC calls it makes, but they are still being finalized before being exposed as [cloud metrics](https://docs.temporal.io/production-deployment/cloud/metrics/reference).

See [Cloud Metrics](#metrics).

## Audit Logging

The following Nexus control plane actions are sent to the Audit Logging integration:

* Create Nexus Endpoint: `CreateNexusEndpoint`
* Update Nexus Endpoint: `UpdateNexusEndpoint`
* Delete Nexus Endpoint: `DeleteNexusEndpoint`

See [Audit Logging](#audit-logging) for details.

## Rate Limiting

Nexus requests (commands, polling) are counted as part of the overall Namespace RPS limit in both the caller and handler Namespaces. The default Namespace RPS limit is 1600 and automatically adjusts based on recent usage (over the prior 7 days).

See [Nexus Rate Limits] for additional details.

## SLOs & SLAs

Nexus requests (commands, polling) have the same [latency SLOs] and [error rate SLAs] as other Worker requests in both the caller and handler Namespaces.

See [Availability] and [SLA].

## Limits

**Max Nexus Endpoints** \- By default, each account is provisioned with a maximum of 10 Nexus `Endpoints`. You can request an increase beyond the initial limit of 10 `Endpoints` by opening a support ticket.

**Workflow Max Nexus Operations** \- A single Workflow Execution can have a maximum of 30 in-flight Nexus Operations and 30 total Nexus Operations (the Public Preview does not yet remove completed Nexus Operations from mutable state). After that limit is reached, no more Nexus Operations will be processed for that Workflow Execution.

**Nexus Request Handler Timeout** \- Nexus Operation handlers have less than 10 seconds to process a single Nexus start or cancel request. Handlers should observe the context deadline and ensure they do not exceed it. This includes fully processing a synchronous Nexus Operation and starting an asynchronous Nexus Operation, for example one that starts a Workflow. If a handler doesn’t respond within the context deadline, a context deadline exceeded error is tracked in the caller Workflow’s pending Nexus Operations, and the Nexus Machinery retries the Nexus request with an exponential backoff policy.

**Nexus Operation Maximum Duration** \- Each Nexus `Operation` has a maximum `ScheduleToClose` duration of 60 days. This limit is most relevant to asynchronous Nexus `Operations`, which are completed with an asynchronous callback sent as a separate Nexus request from the handler back to the caller Namespace. For enhanced security, completion callbacks may be signed with a single-use token in the future, and the 60-day maximum allows us to rotate the asymmetric encryption keys used for completion callback request signing. The caller of a Nexus `Operation` can configure a `ScheduleToClose` duration shorter than 60 days, but the duration cannot be extended beyond 60 days; the server caps longer values at 60 days.

See [Nexus Limits] for additional details.

## Secure Routing

Nexus `Endpoints` are only privately accessible from within Temporal Cloud, and mTLS is used for all Nexus communication, including across cloud cells and regions. Workers authenticate to their Namespaces through mTLS or an API key, as allowed by their Namespace configuration.

See [Nexus Secure Routing](#secure-routing) for details.

## Payload Encryption

For payload encryption, the `DataConverter` works the same for a Nexus `Operation` as it does for other payloads sent between a Worker and Temporal Cloud. Currently there is support for a single `DataConverter` to be used for inbound inputs and outbound outputs, which means the encryption key often used in a custom `DataConverter` must be the same for both the caller and the handler. We may support per-Service payload encryption in the future, so please reach out if you need this.
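
The shared-key constraint can be illustrated with standard-library AES-GCM. This sketch does not implement the Temporal SDK `DataConverter` or any codec interface; the `seal` and `open` helpers and the hard-coded demo keys are hypothetical. It only shows that a payload sealed with one key cannot be opened with a different one, which is why the caller and handler must agree on the key.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// newGCM builds an AES-GCM cipher from a 16, 24, or 32 byte key.
func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// seal encrypts plaintext and prepends the random nonce to the result.
func seal(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// open reverses seal. It fails if the key differs from the sealing key.
func open(key, sealed []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, errors.New("sealed payload is too short")
	}
	nonce, ciphertext := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ciphertext, nil)
}

func main() {
	// Demo keys only. Real keys belong in a key management system.
	sharedKey := []byte("0123456789abcdef0123456789abcdef")
	otherKey := []byte("fedcba9876543210fedcba9876543210")

	sealed, err := seal(sharedKey, []byte(`{"Name":"Nexus"}`))
	if err != nil {
		panic(err)
	}

	plain, err := open(sharedKey, sealed)
	fmt.Println(string(plain), err) // same key: the payload is readable

	_, err = open(otherKey, sealed)
	fmt.Println(err != nil) // different key: decryption fails
}
```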

See [Nexus Payload Encryption & Data Converter] for additional details.

