System.InvalidOperationException: Workflow graph with handle DefinitionVersionId: not found. #6362

Open
kfrajtak opened this issue Feb 2, 2025 · 11 comments

@kfrajtak

kfrajtak commented Feb 2, 2025

Description

A workflow containing a Delay activity does not finish because of the error in the title. The project uses Hangfire for scheduling and a SQLite database for persistence. In the database, the value of DefinitionVersionId in the corresponding row of the WorkflowInstance table is an empty string.

Steps to Reproduce

To help us identify the issue more quickly, please follow these guidelines:

  1. Detailed Steps: The workflow is defined in code and added to Elsa runtime in setup.
public class SampleWorkflow : WorkflowBase
{
    protected override void Build(IWorkflowBuilder builder)
    {
        builder.WithDefinitionId("X");
        builder.Description = "ABC";

        builder.Root = new Sequence
        {
            Version = 1,
            RunAsynchronously = true,
            Activities =
            [
                new WriteLine("Start workflow"),
                new Delay(TimeSpan.FromSeconds(3)),
                new WriteLine("Resumed after delay!"),
            ],
        };
    }
}

  2. Attachments: The minimal project is attached - workflows.zip.

  3. Reproduction Rate: every time.

  4. Additional Configuration: no additional configuration

Expected Behavior

I expect the workflow to finish without any issues.

Actual Behavior

The workflow does not continue when resumed after delay.

Environment

  • Elsa Package Version: 3.3.1
  • Operating System: Pop!_OS 22.04 LTS, also tried on Windows 10 with the same result, also tried without Hangfire with the same result.

Log Output

Failed to process the job 'f79257cf-f329-4587-a5ab-68738ba9cb09': an exception occurred. Retry attempt 1 of 10 will be performed in 00:00:40.
      System.InvalidOperationException: Workflow graph with handle DefinitionVersionId:  not found.
         at Elsa.Workflows.Runtime.LocalWorkflowClient.GetWorkflowGraphAsync(WorkflowDefinitionHandle definitionHandle, CancellationToken cancellationToken)
         at Elsa.Workflows.Runtime.LocalWorkflowClient.GetWorkflowGraphAsync(WorkflowInstance workflowInstance, CancellationToken cancellationToken)
         at Elsa.Workflows.Runtime.LocalWorkflowClient.RunInstanceAsync(WorkflowInstance workflowInstance, RunWorkflowInstanceRequest request, CancellationToken cancellationToken)
         at Elsa.Workflows.Runtime.LocalWorkflowClient.RunInstanceAsync(RunWorkflowInstanceRequest request, CancellationToken cancellationToken)
         at Elsa.Hangfire.Jobs.ResumeWorkflowJob.ExecuteAsync(String name, ScheduleExistingWorkflowInstanceRequest request, CancellationToken cancellationToken)
         at System.RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean isConstructor)
         at System.Reflection.MethodBaseInvoker.InvokeWithNoArgs(Object obj, BindingFlags invokeAttr)
@sfmskywalker
Member

Fixed via da6eda2

@kfrajtak
Author

kfrajtak commented Feb 6, 2025

Tried the latest prerelease, 3.3.2-rc2; it still does not work:

warn: Hangfire.AutomaticRetryAttribute[0]
      Failed to process the job '7d083616-b68d-4fad-adf0-2748f82ed61a': an exception occurred. Retry attempt 1 of 10 will be performed in 00:00:24.
      System.InvalidOperationException: Workflow graph with handle DefinitionVersionId:  not found.
         at Elsa.Workflows.Runtime.LocalWorkflowClient.GetWorkflowGraphAsync(WorkflowDefinitionHandle definitionHandle, CancellationToken cancellationToken)
         at Elsa.Workflows.Runtime.LocalWorkflowClient.GetWorkflowGraphAsync(WorkflowInstance workflowInstance, CancellationToken cancellationToken)
         at Elsa.Workflows.Runtime.LocalWorkflowClient.RunInstanceAsync(WorkflowInstance workflowInstance, RunWorkflowInstanceRequest request, CancellationToken cancellationToken)
         at Elsa.Workflows.Runtime.LocalWorkflowClient.RunInstanceAsync(RunWorkflowInstanceRequest request, CancellationToken cancellationToken)
         at Elsa.Hangfire.Jobs.ResumeWorkflowJob.ExecuteAsync(String name, ScheduleExistingWorkflowInstanceRequest request, CancellationToken cancellationToken)
         at InvokeStub_TaskAwaiter.GetResult(Object, Object, IntPtr*)
         at System.Reflection.MethodBaseInvoker.InvokeWithNoArgs(Object obj, BindingFlags invokeAttr)

@sfmskywalker
Member

Is that a new job execution, or is it trying to run already scheduled jobs?

@sfmskywalker
Member

I updated your sample project to the 3.3.2-rc2 packages, and the error has disappeared (after making sure there were no existing jobs scheduled).

@kfrajtak
Author

kfrajtak commented Feb 6, 2025

I deleted the whole bin\debug folder where the database is created. I also changed the connection string.
It is a new job execution. Hangfire is using sqlite too, but I don't see any Hangfire tables there ...
There's only one instance of the workflow:

Image

@sfmskywalker
Member

Perhaps try the same zip file you shared with me, upgrade it to 3.3.2 (just released), and make sure to clear the Hangfire tables (or configure Hangfire to connect to a different database like you attempted).

For me, your zip file works flawlessly. I was able to reproduce the symptoms that you described, and they were gone after updating to 3.3.2-rc2 (after cleaning the Hangfire tables).

@kfrajtak
Author

kfrajtak commented Feb 7, 2025

Upgraded to 3.3.2. Even with Hangfire dependencies removed and Hangfire not used at all in the process, I still get the error:

fail: Elsa.Scheduling.ScheduledTasks.ScheduledSpecificInstantTask[0]
      Error scheduled task
      System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation.
       ---> System.AggregateException: One or more errors occurred. (Workflow graph with handle DefinitionVersionId:  not found.)
       ---> System.InvalidOperationException: Workflow graph with handle DefinitionVersionId:  not found.
         at Elsa.Workflows.Runtime.LocalWorkflowClient.GetWorkflowGraphAsync(WorkflowDefinitionHandle definitionHandle, CancellationToken cancellationToken)
         at Elsa.Workflows.Runtime.LocalWorkflowClient.GetWorkflowGraphAsync(WorkflowInstance workflowInstance, CancellationToken cancellationToken)
         at Elsa.Workflows.Runtime.LocalWorkflowClient.RunInstanceAsync(WorkflowInstance workflowInstance, RunWorkflowInstanceRequest request, CancellationToken cancellationToken)

I have attached the Hangfire-free project and the output log.

@sfmskywalker
Member

Ah, I see. This is not related to Hangfire then. Thank you for the reproduction, I will look into it.

@sfmskywalker sfmskywalker reopened this Feb 7, 2025
@sfmskywalker sfmskywalker self-assigned this Feb 7, 2025
@sfmskywalker sfmskywalker added this to the Elsa 3.3.3 milestone Feb 7, 2025
@sfmskywalker sfmskywalker moved this to In Progress in ELSA 3 Feb 7, 2025
@sfmskywalker
Member

I see what the issue is. There's a small design flaw with the way programmatic workflows are used with the WorkflowRunner in combination with long-running workflows.

For a long-running workflow to be resumed (e.g. when using the Delay activity), the workflow definition must exist in the store. That is the case here, because you register the workflow with the system. When the workflow is added to the store, the CLR workflow provider generates its own version ID for the workflow definition. Also, when you use a custom identifier such as "X", that value is associated with the workflow definition.

So far, so good.

Here is where things don't line up:

When you call WorkflowRunner.RunAsync<SampleWorkflow>(), it internally attempts to execute the workflow, which succeeds. The issue, however, is that the identity of the built workflow differs from the identity produced by the CLR workflow provider. As a result, the workflow instance references a definition ID that does not exist in the store, which then causes resumption of the workflow instance to fail.

To work around this issue, you can update your code like this:

var workflowDefinitionService = serviceProvider.GetRequiredService<IWorkflowDefinitionService>();
var workflowGraph = await workflowDefinitionService.FindWorkflowGraphAsync("X", VersionOptions.Published);
await workflowRunner.RunAsync(workflowGraph, new RunWorkflowOptions { Input = input });

Basically, we take care of selecting the workflow definition to execute ourselves and pass it to the workflow runner. Passing only the workflow type doesn't give the runner enough information, because that path doesn't go through the workflow provider API.
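
To make the mismatch and the workaround concrete, here is a minimal toy model in plain C#. It does not use Elsa at all: ToyStore, ToyRunner, ToyDefinition and ToyInstance are hypothetical names invented for illustration, and the empty version ID on the directly built workflow is an assumption based on the empty DefinitionVersionId reported at the top of this thread. It only sketches why an instance that was not created from a stored definition cannot be resumed.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for illustration only; none of these types are Elsa APIs.
public sealed record ToyDefinition(string DefinitionId, string VersionId);
public sealed record ToyInstance(string DefinitionId, string DefinitionVersionId);

public sealed class ToyStore
{
    private readonly Dictionary<string, ToyDefinition> _byVersionId = new();

    // Models the provider registering a workflow: the version ID is generated here.
    public ToyDefinition Register(string definitionId)
    {
        var definition = new ToyDefinition(definitionId, Guid.NewGuid().ToString("N"));
        _byVersionId[definition.VersionId] = definition;
        return definition;
    }

    // Models looking up a registered definition by its custom identifier.
    public ToyDefinition? FindPublished(string definitionId)
    {
        foreach (var definition in _byVersionId.Values)
            if (definition.DefinitionId == definitionId)
                return definition;
        return null;
    }

    // Models resumption: the instance's version ID must match a stored definition.
    public ToyDefinition Resume(ToyInstance instance) =>
        _byVersionId.TryGetValue(instance.DefinitionVersionId, out var definition)
            ? definition
            : throw new InvalidOperationException(
                $"Workflow graph with handle DefinitionVersionId: {instance.DefinitionVersionId} not found.");
}

public static class ToyRunner
{
    // Models running from the type alone: the workflow is built directly, so the
    // instance never receives the version ID generated at registration (assumption).
    public static ToyInstance RunFromType(string definitionId) => new(definitionId, "");

    // Models the workaround: the definition comes from the store, so the IDs line up.
    public static ToyInstance RunFromStore(ToyDefinition definition) =>
        new(definition.DefinitionId, definition.VersionId);
}
```

In this model, store.Resume(ToyRunner.RunFromType("X")) throws even after store.Register("X"), while resuming an instance created with ToyRunner.RunFromStore(store.FindPublished("X")) succeeds, because only the second instance carries a version ID that the store knows about.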

I'll keep this issue open to fix that for 3.4.

@sfmskywalker sfmskywalker modified the milestones: Elsa 3.3.3, Elsa 3.4 Feb 7, 2025
@sfmskywalker sfmskywalker moved this from In Progress to Todo in ELSA 3 Feb 7, 2025
@kfrajtak
Author

kfrajtak commented Feb 7, 2025

@sfmskywalker it works now! Thanks for the workaround.

FYI, when starting with a clean database, this

var workflowDefinitionService = serviceProvider.GetRequiredService<IWorkflowDefinitionService>();
var workflowGraph = await workflowDefinitionService.FindWorkflowGraphAsync("X", VersionOptions.Published);

results in an error

fail: Microsoft.EntityFrameworkCore.Query[10100]
      An exception occurred while iterating over the results of a query for context type 'Elsa.EntityFrameworkCore.Modules.Management.ManagementElsaDbContext'.
      Microsoft.Data.Sqlite.SqliteException (0x80004005): SQLite Error 1: 'no such table: WorkflowDefinitions'.
         at Microsoft.Data.Sqlite.SqliteException.ThrowExceptionForRC(Int32 rc, sqlite3 db)

but

await workflowRunner.RunAsync<SampleWorkflow>(new RunWorkflowOptions { Input = input });

will migrate the schema and create the tables.

@kfrajtak
Author

Also, when you change the version of the workflow to 2, you get an incident; the status is Finished and the substatus is Faulted.

Activity type 'Elsa.Sequence' could not be found.
   at Elsa.Workflows.WorkflowExecutionContext.CreateActivityExecutionContextAsync(IActivity activity, ActivityInvocationOptions options)
   at Elsa.Workflows.ActivityInvoker.InvokeAsync(WorkflowExecutionContext workflowExecutionContext, IActivity activity, ActivityInvocationOptions options)
   at Elsa.Workflows.Middleware.Workflows.DefaultActivitySchedulerMiddleware.ExecuteWorkItemAsync(WorkflowExecutionContext context, ActivityWorkItem workItem)
   at Elsa.Workflows.Middleware.Workflows.DefaultActivitySchedulerMiddleware.InvokeAsync(WorkflowExecutionContext context)
   at Elsa.Workflows.Middleware.Workflows.ExceptionHandlingMiddleware.InvokeAsync(WorkflowExecutionContext context)
