Releases: uber/cadence
v1.2.10
What's Changed
- Update duplicate request error to include request type by @Shaddoll in #5910
- Update mutable state to generate workflow requests by @Shaddoll in #5821
- Add AsDuplicateRequestError function by @Shaddoll in #5914
- Bugfix for enumer in go 1.22 by @Groxx in #5915
- Add tests for common/persistence/retryer.go by @natemort in #5911
- Add tests for common/persistence/shardManager.go by @natemort in #5916
- Add tests for persistence/workflow_execution_info.go by @natemort in #5918
- Add more unit test to history handler by @timl3136 in #5897
- Get rid of mutex in matching/liveness and reduce test duration by @taylanisikdemir in #5917
- Add memo in pinot by @bowenxia in #5902
- Added Executor Interface and TimerTaskExecutorBase with stop() Method and improve context management in TimerQueueProcessor by @timl3136 in #5920
- [code-coverage] Add more tests for service/history/decision package by @ketsiambaku in #5909
- Add document explaining the schema of Cassandra executions table by @Shaddoll in #5921
- Add tests for ReadHistoryBranch by @jakobht in #5899
- Fix failover error causing child workflows to get stuck by @davidporter-id-au in #5919
- Adding tests for nosqlQueueStore by @dkrotx in #5924
- Changed the error to DomainNotActive for Deprecated domains by @abhishekj720 in #5929
- [code-coverage] clean up tests in history/decision/handler by @ketsiambaku in #5932
- [code-coverage] add tests for HandleDecisionTaskCompleted() by @ketsiambaku in #5934
- Fix bug when passing close status as an integer string by @neil-xie in #5935
- Workaround for query-consistency-strong which is presently partially broken by @davidporter-id-au in #5928
- Fix GetListWorkflowExecutionsByStatusQuery to set status as int by @neil-xie in #5936
- Upgrade apache thrift to v0.17.0 by @3vilhamster in #5814
- [cassandra] Expose timeout and consistency level configuration by @mantas-sidlauskas in #5675
- Fix slice reuse in cassandra/domain.go by @natemort in #5937
- Add double read for latency comparison for Pinot Migration by @bowenxia in #5927
- Add missing metric tag for GetTaskListSizeRequest by @Shaddoll in #5939
- Add tests for ForkHistoryBranch by @jakobht in #5922
- Migrate Buildkite CI from AWS to GKE agent queues by @mstifflin in #5912
- Fix checksum validation for SQL by @Shaddoll in #5940
- Global ratelimiter, part 2: Any-typed RPCs, mappers, and stub handler by @Groxx in #5817
- Integration test for workflow ID based rate limiting task processing by @sankari165 in #5933
- [code-coverage] Add more tests for HandleDecisionTaskCompleted by @ketsiambaku in #5945
- Update internal types to adopt new IDL changes by @Shaddoll in #5946
- [Pinot] fix bug when querying a string field in attr with an empty value by @bowenxia in #5941
- Add tests for DeleteHistoryBranch by @jakobht in #5943
- We now wait 10 seconds before we start returning shard closed errors, also stop retrying on shard closed errors by @jakobht in #5938
- Revert lowering the new line check by @jakobht in #5954
- Increase timeouts to prevent flakiness by @sankari165 in #5953
- Added tests for GetAllHistoryTreeBranches by @jakobht in #5944
- Bugfix: we address hosts using string(rune(shardID)), not by itoa(shardID) by @dkrotx in #5952
- Add staleness check to RecordChildExecutionCompleted by @Shaddoll in #5955
- [code-coverage] Add more test cases for HandleDecisionTaskCompleted by @ketsiambaku in #5950
- Adding unit tests for client/matching/client.go by @sankari165 in #5959
- [code-coverage] Introduced first set of tests for taskHandler in service/history/decision by @ketsiambaku in #5960
- Fix a bug when set memo in pinot visibility store by @neil-xie in #5961
- unit test for cassandra/visibility.go by @d-vignesh in #5948
- [code-coverage] Tests for Decision taskHandler by @ketsiambaku in #5951
- Publish multiple platform docker image when release server by @neil-xie in #5962
- Updated the changelog for release 1.2.9 by @jakobht in #5963
- Update task executor to handle WorkflowAlreadyCompletedError for signal and cancel workflow by @Shaddoll in #5956
- Fix wrong comment on enableAsyncWorkflowConsumption dynamic config by @taylanisikdemir in #5964
- Add metric for async request payload size by @Shaddoll in #5965
- Async wf consumer manager should watch its enabled/disabled state instead of relying on restart by @taylanisikdemir in #5966
- chore: fix function names in comment by @verytrap in #5894
- Replace wurstmeister kafka/zookeeper images with bitnami kafka image by @taylanisikdemir in #5975
- Split historyEngine.go into small files by @taylanisikdemir in #5972
- Added unit tests for service/history/handler by @timl3136 in #5970
- Add unit tests for mutable state task refresher by @Shaddoll in #5971
- Revert codecov patch threshold to 85% by @taylanisikdemir in #5982
- Api handler test respond activity task failed alternate by @ibarrajo in #5980
- Move shardscanner workflow tests to the shardscanner package by @natemort in #5981
- Add tests for service/frontend/config/config.go by @natemort in #5968
- Added tests for the history_events.go by @agautam478 in #5978
- Added additional unit tests for service/history/handler.go by @timl3136 in #5984
- Reduce flakiness on workflow-ID-specific ratelimit test by @Groxx in #5986
- Enforcing go vet -copylocks and fixing current violations by @Groxx in #5967
- Added new tests to config_Store_client_test.go by @agautam478 in #5983
- Add tests for history/execution/history_builder.go by @natemort in #5977
- History engine start/stop unit tests by @taylanisikdemir in #5985
- Added tests to history_events.go. by @agautam478 in #5988
- Added unit tests for history handler by @timl3136 in #5987
- Add unit test for open search client bulk requests by @neil-xie in #5974
- Add tests for history/engine/engineimpl/describe_workflow_execution.go by @natemort in #5992
- Add test for NewHistoryReplicator in history_replicator.go by @bowenxia in #5994
- Added additional unit tests for methods history/handler.go by @timl3136 in #5993
- lowering threshold for PRs for a one-time refactor/split by @davidporter-id-au in #5997
- Add unit test for frontend/admin/handler - part 1 by @neil-xie in #5991
- Minor splitting of mutable state builder file by @davidporter-id-au in #5990
- Write tests for history engine's RefreshWorkflowTasks by @taylanisikdemir in #5995
- Update coverage exclusions by @taylanisikdemir in #5999
- Replication task processor shutdown improvements and start/stop unit tests by @taylanisikdemir in #5996
- Added additional unit tests testing history handler by @timl3136 in #6001
- Add test coverage for service/history/engine/engineimpl/reset_workflow_execution.go by @natemort in #6002
- mutable-state: copy to persistence round-trip test by @davidporter-id-au in #5998
- Added tests for GetResurrected timers in integrity for hi...
v1.2.9
What's Changed
- Addition of tests for ArchivalConfigStateMachine in common/domain by @abhishekj720 in #5698
- Introduce new dynamic config for enabling wfID based ratelimiting by @jakobht in #5703
- Add unit tests for sql plugin registration by @Shaddoll in #5705
- Add unit tests for sql helper functions by @Shaddoll in #5706
- Add unit test for helper function of sql execution store by @Shaddoll in #5707
- Generate a metadata file artifact in unit test buildkite job by @taylanisikdemir in #5708
- Write tests for cdb.UpdateWorkflowExecutionWithTasks by @taylanisikdemir in #5709
- Add unit tests for helper functions in sql execution store util by @Shaddoll in #5710
- Add unit tests for CreateWorkflowExecution by @Shaddoll in #5715
- Test: Addition of tests for replicationQueue publish and publish to dlq by @abhishekj720 in #5700
- Implemented ratelimiting for external calls per wfid (guarded by feature flag) by @jakobht in #5704
- remove old metrics wrappers and use new generated metered wrappers by @3vilhamster in #5717
- Proper shutdown of kafka consumer impl and fix test by @taylanisikdemir in #5712
- Add additional unit tests for functions in constants.go by @timl3136 in #5713
- Initial codecov integration by @taylanisikdemir in #5711
- Add tests for UpdateWorkflowExecution by @Shaddoll in #5718
- Tests for UpdateWorkflowExecution in nosql store-Part1 by @agautam478 in #5719
- Add unit tests for ConflictResolveWorkflowExecution by @Shaddoll in #5721
- Add tests for elasticsearch v6 client by @neil-xie in #5716
- Add unit tests for persistence task types in DataManagerInterfaces by @timl3136 in #5720
- Add unit tests for CreateFailoverMarkerTasks by @Shaddoll in #5724
- Change noisy frontend poll timeout log to debug level by @taylanisikdemir in #5725
- Added unit tests for nosql_execution_Store_util.go - Part1 by @agautam478 in #5723
- Straightforwardly fixes a few minor copy bugs and adds a small fuzz util by @davidporter-id-au in #5572
- Add test for ES v6 client Search method by @neil-xie in #5727
- Tests for Common/Domain: Adding tests for replication queue message handling and ack update by @abhishekj720 in #5730
- Add more unit tests for persistence task types in DataManagerInterfaces by @timl3136 in #5726
- Added two more test cases for the updateworkflowexecution by @agautam478 in #5722
- [history] refactor history client with timeout wrapper by @shijiesheng in #5728
- Add unit tests for PinotVisibilityStore by @bowenxia in #5714
- Removed errors file from test coverage by @abhishekj720 in #5735
- Test for Common/domain/replication_queue: GetMessagesfromDLQ & AckLevel by @abhishekj720 in #5734
- Added unit tests for Delete current and workflow execution, list all … by @agautam478 in #5733
- Added unit tests for PrepareResetWorkflowExecutionRequestWithMapsAndE… by @agautam478 in #5731
- Adding more unit tests for ES v6 client by @neil-xie in #5739
- Tests for GetDLQAckLevel and UpdateDLQAckLevel by @abhishekj720 in #5740
- Add unit tests for TaskInfo types and utility functions by @timl3136 in #5732
- Tests for common/domain: tests TestGetDLQSize, TestRangeDeleteMessagesFromDLQ and TestDeleteMessageFromDLQ by @abhishekj720 in #5741
- Add error case tests for pinot_visibility_store by @bowenxia in #5746
- Add unit test for util methods in es v6 client bulk processor by @neil-xie in #5748
- Add unit tests for GetWorkflowExecution by @Shaddoll in #5736
- Adds test for execution/mutable_state_builder.go by @davidporter-id-au in #5744
- Add unit tests for the util functions in data_manager_interface by @timl3136 in #5742
- Very minor nil-or-empty cleanup by @Groxx in #5745
- Added more tests for nosql_execution_store.go by @agautam478 in #5738
- Write more tests for cassandra/workflows.go by @taylanisikdemir in #5750
- Added more tests for nosql_execution_store_util.go by @agautam478 in #5752
- Enforce leading space on comments by @Groxx in #5747
- Add unit tests for common/persistence/sql/factory.go by @Shaddoll in #5751
- [history] fix generated timeout wrapper by @shijiesheng in #5737
- Add unit tests for functions in gocql/batch.go by @timl3136 in #5759
- Add test for es v6 bulk processor by @neil-xie in #5758
- Added test for replicationTaskExecutor: execute by @abhishekj720 in #5754
- Add unit test for ES v7 client by @neil-xie in #5760
- Added test cases for more util methods by @agautam478 in #5755
- More unit tests for nosql_execution_store_test.go by @agautam478 in #5753
- Add unit test for pinot folder with coverage to 93.4% by @bowenxia in #5761
- [code-coverage] update admin and frontend client to use generated code by @ketsiambaku in #5702
- Tests for PurgeAckedMessages and replicationMessage in common/domain/replication_queue by @abhishekj720 in #5749
- Code cleanup for sql package by @Shaddoll in #5756
- Add unit test for es v7 bulk processor by @neil-xie in #5764
- Added test for pinot_visibility_metric_clients.go by @bowenxia in #5767
- adding mutable state builder tests - adding continue-as-new events by @davidporter-id-au in #5768
- Refactor/adding mutable state builder tests iv by @davidporter-id-au in #5769
- Add unit test for open search client part 1 by @neil-xie in #5774
- minor mutable-state log fix by @davidporter-id-au in #5776
- refactor common/persistence/pinot tests by @bowenxia in #5777
- Addition of tests for archivalConfigStateMachine in common/domain by @abhishekj720 in #5778
- Re-enable sql unit test by @Shaddoll in #5779
- Test: Validate domain config test for attrValidator by @abhishekj720 in #5699
- refactor pinot_visibility_store_test by @bowenxia in #5780
- [code-coverage] Generate code for matching client timeout wrapper by @ketsiambaku in #5771
- Fix data race in matching test suite by @taylanisikdemir in #5781
- hot fix for unit test cases that might cause a failure by @bowenxia in #5787
- Adding unit tests for TestPrepareTransferTasksForWorkflowTxn by @agautam478 in #5763
- Ignore requests sent from pinot response comparator by @bowenxia in #5788
- Coverage for dataStoreInterfaces by @Groxx in #5743
- Retryable error for workflow rate limits in task processing by @sankari165 in #5782
- Re-enable kafka consumer test by @taylanisikdemir in #5791
- Global ratelimiter, part 1: core algorithm for computing weights by @Groxx in #5689
- Write tests for cassandra SelectWorkflowExecution by @taylanisikdemir in #5792
- Fix workflow deletion by @Shaddoll in #5793
- Fix checksum validation for SQL implementation by @Shaddoll in #5790
- added unit test for function in mapper-thrift-configstore file by @d-vignesh in #5789
- Error mapper tests by @jakobht in #5795
- Add a benchmark test for crc checksum by @Shaddoll in #5798
- Add metric and retry backoff for checksum failure by @Shaddoll in #5797
- Added new er...
v1.2.8
What's Changed
Added
- Adding unit-test for matching:newTaskListID by @dkrotx in #5513
- Get/Update DomainAsyncWorkflowConfiguration methods in admin API and CLI by @taylanisikdemir in #5616
- Workflow ID cache size metric by @jakobht in #5619
- Add a helper script to run cassandra and execute tests by @taylanisikdemir in #5620
- Scaffold StartWorkflowExecutionAsync API by @Shaddoll in #5621
- Scaffold async workflow queue provider component by @Shaddoll in #5627
- Update run_cass_and_test.sh script to setup cassandra schemas by @taylanisikdemir in #5628
- Add debug logs in PinotTripleVisibilityManager for response comparator testing by @bowenxia in #5631
- Adding a sample call to TaskValidator in update workflow cycle by @agautam478 in #5634
- Add a middleware for comparator to use by @bowenxia in #5637
- Generate rate limit frontend api handler by @Shaddoll in #5636
- Add generic OAuth support by @mantas-sidlauskas in #5638
- Added metrics for when we rate limit by @jakobht in #5640
- Implement StartWorkflowExecutionAsync API by @Shaddoll in #5642
- Added 2 more tags in log for comparator to use. by @bowenxia in #5646
- Async workflow request consumer manager in worker by @taylanisikdemir in #5655
- Add async workflow request consumer for Start/SignalWithStart support by @taylanisikdemir in #5658
- Set rate limit on Async APIs by @Shaddoll in #5659
- Implement SignalWithStartWorkflowExecutionAsync API by @Shaddoll in #5657
- Docker compose setup for async workflows with kafka queue by @taylanisikdemir in #5663
- Add a `make pr` target for an easy "do automated checks for PR" command by @Groxx in #5670
- Added debug information for decision timeout handling by @3vilhamster in #5674
- Async workflows integration test with kafka by @taylanisikdemir in #5678
- Add missing IsolationGroups field in domain cache entry by @taylanisikdemir in #5679
- Add close status parse method in pinot query validator by @neil-xie in #5680
- Add async workflow integration test step to CI by @taylanisikdemir in #5681
- Add metrics for external calls for the workflow ID specific rate limits by @jakobht in #5684
- Write tests for cdb (Cassandra DB wrapper) basic functions by @taylanisikdemir in #5686
- Added a unit test for nosql execution store - createworkflowexecution by @agautam478 in #5687
- Write tests for cdb.InsertWorkflowExecutionWithTasks by @taylanisikdemir in #5688
- Added more scenarios to createworkflowexecution test- Part1 by @agautam478 in #5690
- Added a test for the GetworkflowExecution in the nosql_execution_store.go file. by @agautam478 in #5692
- Write tests for cdb.SelectCurrentWorkflow by @taylanisikdemir in #5693
- Support AsyncWorkflowConfiguration decoding in admin CLI by @taylanisikdemir in #5694
Changed
- Replace JWT validation library by @mantas-sidlauskas in #5592
- feat: pprof support config host by @zedongh in #5601
- Refactor persistence serializer tests and add more cases by @taylanisikdemir in #5625
- Upgrade domain_config type in cassandra schema to add async wf config by @taylanisikdemir in #5630
- Refactor frontend API handler and use generated code to emit metrics by @Shaddoll in #5639
- Enable the workflow ID cache in shadow mode for start workflow by @jakobht in #5641
- Filtering the prefix in custom query log for pinot response comparator by @bowenxia in #5643
- The ratelimiter needs to be created with the domain name not the ID by @jakobht in #5644
- Update async workflow queue idl change by @Shaddoll in #5645
- Rewrite async workflow queue provider component by @Shaddoll in #5648
- Store mutable state checksum in SQL storage by @Shaddoll in #5649
- Splitting wfCacheEnabled config for internal and external requests by @sankari165 in #5647
- Convert pinot query to use unix milliseconds instead of nano by @neil-xie in #5650
- Emit metrics when transfer tasks could be ratelimited by @sankari165 in #5652
- Update change log for v1.2.7 release by @neil-xie in #5653
- Update pinot query validator to handle raw time string by @neil-xie in #5656
- Emit metrics when transfer tasks for decisions could be ratelimited by @sankari165 in #5665
- Upgrade pinot client version by @neil-xie in #5666
- Update the build-changed message failure by @Groxx in #5667
- Improve error message for membership resolver by @Shaddoll in #5669
- Emits a counter value for every unique view of the hashring by @davidporter-id-au in #5672
- Refactor history packages by @jakobht in #5673
- Improve test coverage for sql_execution_store_util by @Shaddoll in #5676
- Improve test coverage for sql_execution_store by @Shaddoll in #5677
- Improve test coverage for constants.go by @timl3136 in #5685
- Enable retry on mutable state checksum verification failure by @Shaddoll in #5691
Fixed
- Set proper max reset points by @neil-xie in #5623
- Put a timeout for timer task deletion loop during shutdown by @taylanisikdemir in #5626
- Catch unit test failures in make test by @Groxx in #5635
- fix: get messages between query over message_id typo by @zedongh in #5607
- Fix context leak in tests by @munahaf in #5377
- Make sure task processing rate limiter is only done in the active side by @sankari165 in #5654
- Fix Pinot query validator bug when user pass in not equal query with value missing by @neil-xie in #5662
- Update Pinot query validator failed log, minor refactor pinot visibility store to remove panics by @neil-xie in #5664
- Fix context leak in pinot integration test by @neil-xie in #5682
- Fix SignalWithStartWorkflow API by @Shaddoll in #5671
- Fix wrong migration paths in example by @kotcrab in #5668
- Fix comment in workflow id cache config by @sankari165 in #5661
- Fix the local integration test docker-compose file by @jakobht in #5695
- Do not get workflow execution from database when shard is closed by @Shaddoll in #5697
Removed
- Removed useless metrics tag from the workflowIDcache by @jakobht in #5651
- Removed the shadower service for cadence-server by @agautam478 in #5660
New Contributors
- @zedongh made their first contribution in #5607
- @munahaf made their first contribution in #5377
- @kotcrab made their first contribution in #5668
Full Changelog: v1.2.7...v1.2.8
v1.2.7
What's Changed
Added
- Add metrics to monitor task validation. by @agautam478 in #5466
- Add an "all results" query to scanner/fixer workflows by @Groxx in #5470
- Add retries into Scanner BlobWriter by @agautam478 in #5471
- Added a unit test for the BlobStoreWriter. by @agautam478 in #5472
- Add Debugf and some minor updates to timer queue processor base by @taylanisikdemir in #5475
- Add unit tests for cassandra workflow utils part-1 by @taylanisikdemir in #5476
- Add `workflow query-types` command to CLI by @arzonus in #5456
- Add unit test for cassandra workflow utils part-2 by @taylanisikdemir in #5480
- Unit tests for admin cli decode_thrift command by @taylanisikdemir in #5485
- Add unit test for sqlConfigStore by @Shaddoll in #5491
- Add unit test for mysql configstore by @Shaddoll in #5502
- Add persistence serialization unit tests by @3vilhamster in #5507
- Adding unit tests to workflowHandler_test.go by @sankari165 in #5500
- Add unit tests for AwaitWaitGroup by @arzonus in #5512
- Add unit test for sql domain store by @Shaddoll in #5508
- Add unit test for cassandra workflow utils part-3 by @taylanisikdemir in #5506
- Adding unit tests for RecordActivityTaskHeartbeat by @sankari165 in #5511
- add unit tests for ValidIDLength by @arzonus in #5520
- Test for rate limited wrappers around persistence clients by @3vilhamster in #5518
- Test for error injection clients by @3vilhamster in #5515
- Add unit test for sql history store by @Shaddoll in #5524
- Adding unit tests to RespondActivityTaskCompleted and RecordActivityT… by @sankari165 in #5521
- Add unit tests for IsEntityNotExistsError by @arzonus in #5528
- Add unit tests for CreateXXXRetryPolicy by @arzonus in #5527
- Add unit tests for ValidateRetryPolicy by @arzonus in #5529
- Add unit tests for ConvertGetTaskFailedCauseToErr by @arzonus in #5531
- Add unit tests for WorkflowIDToHistoryShard and DomainIDToHistoryShard by @arzonus in #5533
- Added a unit test for the timer.go file in reconciliation folder. by @agautam478 in #5505
- Adding logging to scanner.go by @agautam478 in #5535
- Adding a metric for hosts not being found in resolver by @davidporter-id-au in #5414
- Added logs to concrete_execution.go by @agautam478 in #5536
- Add unit tests for sql queue store by @Shaddoll in #5541
- Unit tests for timer/transfer queue processor pump loops by @taylanisikdemir in #5540
- Add unit tests for sql shard store by @Shaddoll in #5543
- Add unit test for kafka partition ack manager by @neil-xie in #5545
- Add unit tests for GenerateRandomString by @arzonus in #5532
- Add unit tests for IsValidContext by @arzonus in #5546
- Add unit tests for CreateChildContext by @arzonus in #5547
- Add unit tests for DeserializeSearchAttributeValue by @arzonus in #5548
- Add unit tests for GetSizeOfHistoryEvent by @arzonus in #5550
- Add unit tests for thrift mappers by @taylanisikdemir in #5542
- Add unit tests for sql task store by @Shaddoll in #5558
- Added logs into the current execution.go and a unit test by @agautam478 in #5555
- Add unit test for kafka producer impl by @neil-xie in #5559
- Add shard id to queue processor related metrics by @taylanisikdemir in #5557
- Add unit tests for sql execution store by @Shaddoll in #5565
- Add unit test for new Kafka client by @neil-xie in #5570
- Add unit tests for helper functions in sql execution store util by @Shaddoll in #5571
- Added tests for visibility sampling wrapper by @3vilhamster in #5564
- Add unit test for consumer impl by @neil-xie in #5573
- Add unit tests for workflow state non maps by @Shaddoll in #5578
- Add logs to debug timer tasks by @Shaddoll in #5581
- Added deprecated domain check to the taskvalidator by @agautam478 in #5580
- Add unit tests for IsServiceTransientError by @arzonus in #5551
- Add unit tests for IsAdvancedVisibilityWritingEnabled by @arzonus in #5552
- Add unit tests for ValidateLongPollXXX by @arzonus in #5553
- Add grafana dashboard to visualize persistence metrics for default docker-compose setup by @taylanisikdemir in #5582
- Add missing exclude-query support to list-workflows on the CLI by @Groxx in #5583
- Add unit tests for DurationToXXX and XXXToDuration by @arzonus in #5530
- Add more debug logs for user timer task execution by @taylanisikdemir in #5595
- Add cache for workflow specific in memory data by @jakobht in #5594
- Added three dynamic config properties by @jakobht in #5602
- add ContextKey Struct by @bowenxia in #5606
- Adding a stale workflow check to the taskvalidator and code cleanup. by @agautam478 in #5604
- Added more error handling in workflow cache by @jakobht in #5611
Fixed
- Improves metric and error handling for history by @davidporter-id-au in #5469
- Address map access data race in matching engine by @taylanisikdemir in #5477
- fix docker compose tests by @3vilhamster in #5479
- Fix copying suite.Suite in integration tests by @3vilhamster in #5481
- fix scavenger test suite by @3vilhamster in #5490
- fix scavenger suite by @3vilhamster in #5498
- Fixing matching:TestCheckIdleTaskList test flakiness by @dkrotx in #5494
- fix leaky goroutines in matching by @3vilhamster in #5499
- Unit test for the fetcher/current.go. by @agautam478 in #5504
- More fixes for golint.sh by @Groxx in #5519
- Fix race between startup and shutdown in task reader by @Groxx in #5522
- Ensure scanner scavenger stops in tests by @3vilhamster in #5510
- Bugfix/debugging stuck tasklist by @davidporter-id-au in #5436
- Fix multiple lock acquire on membership update by @3vilhamster in #5576
- Properly catch errors in ldflag-gathering and fail the build by @Groxx in #5539
- Addressed sync issue in workflow cache by @jakobht in #5605
- fix a comment by @bowenxia in #5610
- Fixed lint errors introduced in previous PR by @jakobht in #5613
Changed
- Update kafka config to have isSecure option by @neil-xie in #5473
- Minor change to include domainTag and pass domainName. by @agautam478 in #5468
- Wrap isSecure config in config map for kafka topic by @neil-xie in #5474
- Update changelog for v1.2.6 release by @neil-xie in #5478
- Unify cassandra setup in docker-compose by @3vilhamster in #5482
- Unify logging in tests by @3vilhamster in #5487
- Updated the unit test for BlobstoreIterator into a table format by @agautam478 in #5488
- update cassandra dev setup by @3vilhamster in #5501
- Converted the existing test for concrete.go execution into a table test by @agautam478 in #5503
- Improve logs/metrics of HandleDecisionTaskCompleted by @taylanisikdemir in #5497
- Revert gofuzz us...
v1.2.6
What's Changed
Added
- Added range query support for Pinot json index by @bowenxia (#5426)
- Implemented GetTaskListSize method at persistence layer by @Shaddoll (#5442, #5447)
- Added a framework for the Task validator service by @agautam478 (#5446)
- Added nit comments describing the Update workflow cycle by @agautam478 (#5432)
- Added log user query param by @bowenxia (#5437)
- Added CODEOWNERS file by @taylanisikdemir (#5453)
- Added a function to evict all elements older than the cache TTL by @jakobht (#5464)
Fixed
- Fixed workflow replication for reset workflow by @Shaddoll (#5412)
- Fixed visibility mode for admin when using Pinot visibility by @neil-xie (#5441)
- Fixed workflow started metric by @ketsiambaku (#5443)
- Fixed timer-fixer, unfortunately broken in 1.2.5 by @Groxx (#5433)
- Fixed confusing comment in matching handler by @jakobht (#5450)
Changed
- Cassandra version is changed from 3.11 to 4.1.3 by @taylanisikdemir (#5461)
  - If your machine already has the ubercadence/server:master-auto-setup image, you need to re-pull it so that it works with the latest docker-compose*.yml files.
- Move dynamic ratelimiter to its own file by @jakobht (#5451)
- Create and use a limiter struct instead of just passing a function by @jakobht (#5454)
- Dynamic ratelimiter factories by @jakobht (#5455)
- Update github action for image publishing to released by @3vilhamster (#5460)
- Update matching to emit metric for tasklist backlog size by @Shaddoll (#5448)
- Change variable name from SecondsSinceEpoch into EventTimeMs by @bowenxia (#5463)
Removed
- Get rid of noisy task adding failure log in matching service by @taylanisikdemir (#5445)
New Contributors
Full Changelog: v1.2.5...v1.2.6
v1.2.5
What's Changed
Added
- Scanner / Fixer changes by @Groxx in #5361
  - Stale-workflow detection and cleanup added to shardscanner, disabled by default.
  - New dynamic config to better control scanner and fixer, particularly for concrete executions.
  - Documentation about how scanner/fixer work and how to control them, see the scanner readme.md
  - This also includes example config to enable the new fixer.
- MigrationChecker interface to expose migration CLI by @abhishekj720 in #5424
- Added Pinot as new visibility store option by @neil-xie in #5201
  - Added pinot visibility triple manager to provide options to write to both ES and Pinot.
  - Added pinotVisibilityStore and pinotClient to support CRUD operations for Pinot.
  - Added pinot integration test to set up Pinot test cluster and test Pinot functionality.
Fixed
- Fix CreateWorkflowModeContinueAsNew for SQL by @Shaddoll in #5413
- Fix CLI count&list workflows error message by @ketsiambaku in #5417
- Hotfix for async matching for isolation-group redirection by @davidporter-id-au in #5423
- Fix closeStatus for --format flag by @ketsiambaku in #5422
Full Changelog: v1.2.4...v1.2.5-prerelease3
v1.2.4
What's Changed
- Remove database check for config store tests by @Shaddoll in #5401
- Fix persistence tests setup by @Shaddoll in #5402
- Implement config store for MySQL by @Shaddoll in #5403
- Retract v1.2.3 by @sankari165 in #5406
- Implement config store for PostgresSQL by @Shaddoll in #5405
- Release v1.2.4 by @Shaddoll in #5407
Full Changelog: v1.2.3...v1.2.4
v1.2.3 (Retracted, please use v1.2.4)
Added
- Expose workflow history size and count to client by @timl3136 (#5392)
Fixed
- [cadence-cli] fix typo in input flag for parallelism by @sankari165 (#5397)
Changed
- Update config store client to support SQL database by @Shaddoll (#5395)
- Scaffold config store for sql plugins by @Shaddoll (#5396)
- Improve poller detection for isolation by @Shaddoll (#5399)
v1.2.2
What's Changed
- add a update workflow execution count metric for RI by @allenchen2244 in #5386
- Pass partition config and isolation group to history/matching even if isolation is disabled by @Shaddoll in #5385
- [CLI] fix nil pointer issue in domain migration command rendering by @shijiesheng in #5378
- Release v1.2.2 by @shijiesheng in #5388
Full Changelog: v1.2.1...v1.2.2
v1.2.1
Project release: Zonal isolation
This version introduces a few resiliency concepts into customers' worker task processing such that they can detect deployment or configuration failures earlier. These features are opt-in.
The high-level concept is to provide a means to subdivide work (called 'isolation-groups') for workers along whatever partitioning mechanism your service requires.
By default, the provided partitioning mechanism will attempt to keep workflows running in the location where they are started, so that customers can identify broken changes earlier rather than waiting for the deployment of an entire region. However, if there are no pollers available in that subdivision, it'll route the work elsewhere.
Nomenclature
Partitioning: A means to subdivide the tasks given to workflows, of which there are many possible schemes and one default one provided. When a workflow is started, a set of partition keys is provided via request headers. The partition keys are used to determine which isolation group of workers should process these workflows.
Workflow pinning: A partitioning scheme which emphasizes keeping workflows running in the location they were started
Isolation-groups: A division of work within a customer region in which they can subdivide their workers and pin the workflows. This was originally intended as a synonym for 'zone' in the site-reliability sense, as a subdivision of a region. However, the important point is that this is a failure domain for customer workflows, so it may be an arbitrary subdivision of your cluster's traffic.
Isolation-group drain: A means of excluding work from an isolation-group. If an isolation group is drained, workers in that isolation group won't receive any tasks, and customers cannot start workflows from that isolation group.
Default concepts and approaches
The partitioning and isolation concepts are intended as flexible, general-purpose orchestration concepts, with some basic defaults provided. By default, the following behaviour is provided:
- Partition data is persisted with workflow execution records by the provided middleware if the relevant header is passed when workflows are created.
- The Cadence Go client and worker libraries pass these as headers if they are provided in the client options.
Pinning behaviour
The workflow's original zone is captured on workflow start and is used during workflow processing.
The default partitioner behaves as follows: it attempts to dispatch work to the zone where the workflow was started. However, workers may not be available in that zone, or may no longer be available for some reason. So the partitioner uses a lookback of poller information to ensure that the workflow can still be processed: if the starting isolation-group is not available, it picks another healthy isolation-group at random.
'Health', here, is determined as the presence of pollers and the absence of drains.
The 'unpinning' is important for two main reasons. Firstly, it's quite possible to start a workflow from an isolation-group unrelated to the one in which the pollers run, and suddenly blackholing that work would likely not be the desired behaviour. Secondly, and probably more importantly, it prevents a head-of-line blocking problem inside Cadence. At the database level (in this release, anyway), tasks need to be dispatched in order, so if an isolation-group's tasks were not processed, they would block processing of the tasks queued behind them.
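The head-of-line argument can be illustrated with a small simulation. This is a hypothetical model, not Cadence's task-list code: `task`, `dispatchInOrder` and the single `fallback` group are assumptions made to show why an in-order backlog stalls when a pinned task has no pollers, and why unpinning avoids the stall.

```go
package main

import "fmt"

// task is a hypothetical queued task pinned to an isolation-group.
type task struct {
	id    int
	group string
}

// dispatchInOrder walks a FIFO backlog that must be dispatched in order.
// hasPollers reports which groups currently have pollers. Without unpinning,
// a task whose group has no pollers stays at the head and blocks everything
// behind it; with unpinning it is routed to the fallback group if that group
// has pollers. It returns the dispatched IDs and the index of the blocking
// task, or -1 if the whole backlog was dispatched.
func dispatchInOrder(backlog []task, hasPollers map[string]bool, unpin bool, fallback string) ([]int, int) {
	var dispatched []int
	for i, t := range backlog {
		switch {
		case hasPollers[t.group]:
			dispatched = append(dispatched, t.id)
		case unpin && hasPollers[fallback]:
			// Unpinned: hand the task to another healthy group.
			dispatched = append(dispatched, t.id)
		default:
			// Head-of-line blocking: nothing after this task can be dispatched.
			return dispatched, i
		}
	}
	return dispatched, -1
}

func main() {
	backlog := []task{{1, "zone-1"}, {2, "zone-2"}, {3, "zone-1"}}
	pollers := map[string]bool{"zone-1": true} // zone-2 has no pollers

	d, b := dispatchInOrder(backlog, pollers, false, "zone-1")
	fmt.Println(d, b) // [1] 1 (task 3 is stuck behind task 2)

	d, b = dispatchInOrder(backlog, pollers, true, "zone-1")
	fmt.Println(d, b) // [1 2 3] -1
}
```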
Drains
This release also introduces a simple notion of drains, which allows isolation-groups to be excluded from traffic processing when required. Drains can be issued via the Admin API or the CLI, e.g.:
cadence admin isolation-groups update-global --set-drains zone-1
cadence admin isolation-groups get-global
This information is stored in the config-store and is not part of dynamic configuration.
Configuration
To use this feature, the following configuration is required:
- system.allIsolationGroups: a list of all possible isolation-groups
- system.enableTasklistIsolation: a boolean flag that enables tasklist isolation for a domain
Implementation
The changes for this feature are largely in Matching and can be (reductively) described as making sync-match and async-match in Cadence aware of a new dimension: the task's associated isolation-group. Tasks piped through the Matching service are matched via the channel for the appropriate isolation-group.
What's Changed
- Set config for shardscanner fixer by @mantas-sidlauskas in #3844
- Fix get raw history for transient decision by @yycptt in #3847
- Fix error handling when processing parent close policy by @yycptt in #3845
- Add logging/metrics for decision attempts by @yycptt in #3849
- Switch to gocql interface by @yycptt in #3837
- Fix NPE in DescribeMutableState by @yycptt in #3850
- Switch the remaining history component to internal types by @vytautas-karpavicius in #3843
- Switch Health status endpoints to internal types by @vytautas-karpavicius in #3842
- reset workflow with no decision task complete by @yux0 in #3687
- error check before return the ActivityLocalDispatchInfo by @mkolodezny in #3853
- Delete unused dynamic configs that have no reference anymore by @longquanzheng in #3859
- Merge sql updates: Blob size increase by @yux0 in #3858
- Handle matching task list conditional error by @yux0 in #3867
- Fix go-generate by @yycptt in #3864
- Support visibility query with close status represented in string by @yycptt in #3865
- Add timers shardscanner by @mantas-sidlauskas in #3846
- replace string based logging with tagged logs by @mantas-sidlauskas in #3871
- Downgrade golang tools version by @yycptt in #3876
- Add instructions to setup local MySQL and Postgres by @yux0 in #3868
- Make max activity schedule to start timeout for retry configurable by domain by @yycptt in #3878
- Task processing debug logs by @yycptt in #3877
- Transfer queue validator by @yycptt in #3875
- Pick sql index changes by @yux0 in #3866
- Remove strict sanity check to allow reset by @yux0 in #3879
- Improve shard context timeout handling by @yycptt in #3881
- Add domain name tag in failover metrics by @yux0 in #3882
- break out when response is nil by @mantas-sidlauskas in #3886
- Allow using Kafka TLS without cert ca and key by @longquanzheng in #3862
- Fix dynamic config collection logValue function by @yycptt in #3880
- Update read DLQ messages API to return raw task info by @yux0 in #3869
- break if adminClient returns error by @mantas-sidlauskas in #3887
- Latest idl by @yux0 in #3888
- Fix activity lost metrics by @yycptt in #3889
- Add replication error logging and metrics by @yux0 in #3891
- Simplify templateGetLastMessageIDQuery sql query by @andrewjdawson2016 in #3890
- Add task processing workflow busy metric by @yycptt in #3892
- CLI 0.18.0 release by @yycptt in #3896
- Handle data corruption error in replication by @yux0 in #3895
- Add a "help" target to the makefile by @Groxx in #3898
- Initial protobuf types and API by @vytautas-karpavicius in #3863
- Fix workflow reset command by @yycptt in #3904
- CLI 0.18.1 patch release by @yycptt in #3908
- Use GetDomainName instead of GetDomainByID for retrieving domain names by @yycptt in #3899
- Start enabled shardscanner fixers by @mantas-sidlauskas in #3906
- Switch to protoc-gen-go by @vytautas-karpavicius in #3905
- Fix scan unsupported workflow in SQL DB by @yux0 in #3909
- Makefile cleanup / thrift revamp / gobin removed by @Groxx in #3903
- Version goveralls, remove unused go bins from docker setup by @Groxx in #3913
- Remove duplicate doc...