
Add ZeroMQ-based message broker plugin #7284

Merged
agoscinski merged 10 commits into aiidateam:main from agoscinski:zmqbroker
Apr 29, 2026

@agoscinski

@agoscinski agoscinski commented Mar 12, 2026

Collaborator

Please read docs/source/internals/broker.rst in this PR for the design document.

@codecov

codecov Bot commented Mar 12, 2026


Codecov Report

❌ Patch coverage is 85.28505% with 191 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.08%. Comparing base (8acdefe) to head (77e6990).
⚠️ Report is 12 commits behind head on main.

File                                          Patch %   Missing lines
src/aiida/brokers/zmq/communicator.py          83.53%   71
src/aiida/brokers/zmq/server.py                85.81%   43
src/aiida/cmdline/commands/cmd_status.py       35.30%   22
src/aiida/cmdline/commands/cmd_daemon.py       37.50%   15
src/aiida/cmdline/commands/cmd_profile.py      65.39%    9
src/aiida/brokers/zmq/broker.py                95.08%    7
src/aiida/cmdline/commands/cmd_presto.py       70.00%    6
src/aiida/engine/daemon/client.py              45.46%    6
src/aiida/manage/manager.py                    42.86%    4
src/aiida/brokers/zmq/protocol.py              93.34%    3
... and 3 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7284      +/-   ##
==========================================
+ Coverage   79.92%   80.08%   +0.17%     
==========================================
  Files         568      576       +8     
  Lines       44016    45280    +1264     
==========================================
+ Hits        35175    36260    +1085     
- Misses       8841     9020     +179     


Comment thread src/aiida/engine/daemon/worker.py Outdated
Comment thread docs/source/reference/command_line.rst Outdated
@khsrali

khsrali commented Mar 12, 2026

Collaborator

Notes for me:
broker.py -> broker-client.py

Comment thread src/aiida/brokers/zmq/communicator.py
Comment thread src/aiida/brokers/zmq/protocol.py
Comment thread src/aiida/brokers/zmq/server.py Outdated
@agoscinski

agoscinski commented Mar 12, 2026

Collaborator Author

Notes for me:
broker.py -> broker-client.py

I merged client.py into broker.py for ZMQ, so ZmqBroker now also contains the management functionality. Doing the same for RMQ might be messier because we create multiple connections there, but it may also be possible. Perhaps we should distinguish between a UserClient, a WorkerClient and a BrokerServer. Right now the WorkerClient (Communicator) and the UserClient (Broker and RabbitmqManagementClient) are entangled. WorkerClients are also subscribers and publishers at the same time, because they publish AiiDA process state transitions. I am not sure why we cannot implement that logic in the process itself; I don't think other workers need to know about the state transitions, but I am not sure whether this is related to MonitorCalcJobs. Anyway, the gist is: the current naming is very confusing and does not make much sense, but fixing it properly would cost a lot of time without adding any functionality. I have gotten somewhat used to this mess.
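To make the proposed split concrete, here is a sketch of the three roles as typing.Protocol classes. The role names come from the comment above; assigning kiwipy-style method names (task_send, rpc_send, add_task_subscriber, broadcast_send) to specific roles, and the simplified signatures, are assumptions for illustration, not the current aiida-core or kiwipy interfaces.

```python
from __future__ import annotations

from typing import Any, Callable, Protocol, runtime_checkable


@runtime_checkable
class UserClient(Protocol):
    """Hypothetical role: submits tasks and sends RPCs (e.g. from verdi or a notebook)."""

    def task_send(self, task: Any) -> Any: ...

    def rpc_send(self, recipient_id: str, msg: Any) -> Any: ...


@runtime_checkable
class WorkerClient(Protocol):
    """Hypothetical role: consumes tasks and broadcasts process state transitions."""

    def add_task_subscriber(self, subscriber: Callable[..., Any]) -> Any: ...

    def broadcast_send(self, body: Any, subject: str | None = None) -> Any: ...


@runtime_checkable
class BrokerServer(Protocol):
    """Hypothetical role: routes and persists messages between the two client roles."""

    def start(self) -> None: ...

    def stop(self) -> None: ...


class InProcessBroker:
    """Toy server that satisfies BrokerServer structurally, without inheriting from it."""

    def __init__(self) -> None:
        self.running = False

    def start(self) -> None:
        self.running = True

    def stop(self) -> None:
        self.running = False


server = InProcessBroker()
server.start()
```

Because protocols are structural, an existing class could be checked against a role without changing its inheritance, which might allow untangling the naming incrementally.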

Comment thread src/aiida/brokers/zmq/defaults.py Outdated
# Timeout (in seconds) for waiting on RPC Future results in the poll thread.
# None means no timeout, matching kiwipy RMQ behavior where _on_rpc awaits
# without a deadline. The runner event loop will eventually produce a result.
RPC_TIMEOUT: float | None = None

Collaborator Author

Double-check that rmq.task_timeout is used here when starting the server.

Collaborator Author

There is now a zmq.task_timeout option.
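The timeout semantics discussed in this thread (None waits indefinitely, a number gives up after that many seconds) map directly onto concurrent.futures.Future.result(timeout=...). A minimal sketch, assuming the configured value (zmq.task_timeout here, later replaced by broker.task_timeout in this PR) is passed straight through; wait_for_rpc_result is a hypothetical helper, not the code in defaults.py or server.py.

```python
from __future__ import annotations

import concurrent.futures
import threading


def wait_for_rpc_result(future: concurrent.futures.Future, timeout: float | None):
    """Block until the RPC future resolves.

    timeout=None waits forever (like the RPC_TIMEOUT default above); a float raises
    concurrent.futures.TimeoutError once that many seconds have passed.
    """
    return future.result(timeout=timeout)


# A future resolved from another thread, as a runner event loop would do.
fut: concurrent.futures.Future = concurrent.futures.Future()
threading.Timer(0.05, fut.set_result, args=("pong",)).start()
result = wait_for_rpc_result(fut, timeout=None)

# A future that is never resolved times out once a deadline is configured.
never: concurrent.futures.Future = concurrent.futures.Future()
try:
    wait_for_rpc_result(never, timeout=0.05)
    timed_out = False
except concurrent.futures.TimeoutError:
    timed_out = True
```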

Comment thread src/aiida/brokers/utils.py Outdated
:return: The reconstructed UUID object
"""
mapping = loader.construct_mapping(node)
return uuid.UUID(int=mapping['int'])

Collaborator Author

So in RMQ we just silently convert this to an int somewhere in the code. I wanted to make serialization more explicit. It's a bit overkill for a single conversion; maybe there is a better solution.
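The same "make the UUID conversion explicit" idea can be expressed with the standard-library json module, which also fits the later plan to switch message encoding to JSON. A minimal sketch, assuming a hypothetical {"__uuid__": "<hex>"} tag and made-up message fields; this is not the YAML constructor from utils.py.

```python
import json
import uuid
from typing import Any


def _encode(obj: Any) -> Any:
    # Called by json.dumps only for objects it cannot serialize natively.
    if isinstance(obj, uuid.UUID):
        return {"__uuid__": obj.hex}
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


def _decode(mapping: dict) -> Any:
    # Called by json.loads for every decoded JSON object.
    if set(mapping) == {"__uuid__"}:
        return uuid.UUID(hex=mapping["__uuid__"])
    return mapping


def dumps(message: Any) -> str:
    return json.dumps(message, default=_encode)


def loads(payload: str) -> Any:
    return json.loads(payload, object_hook=_decode)


pid = uuid.UUID("12345678-1234-5678-1234-567812345678")
payload = dumps({"intent": "kill", "pid": pid})
restored = loads(payload)
```

The tag keeps the conversion visible at the serialization boundary instead of relying on an implicit int cast elsewhere in the code.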

@agoscinski agoscinski force-pushed the zmqbroker branch 2 times, most recently from ec6a524 to 3f98e3a March 31, 2026 11:55
@mbercx

mbercx commented Apr 6, 2026

Member

@agoscinski I'll report feedback here, let me know if you would prefer a different location (e.g. HackMD doc).

I wanted to run a calculation:

results = engine.run(builder)

But ran into a ConnectionError (see Traceback).

Traceback
---------------------------------------------------------------------------
ConnectionError                           Traceback (most recent call last)
Cell In[8], line 1
----> 1 results = engine.run(builder)

File ~/project/qe/git/aiida-core/src/aiida/engine/launch.py:46, in run(process, inputs, **kwargs)
     44     runner = process.runner
     45 else:
---> 46     runner = manager.get_manager().get_runner()
     48 return runner.run(process, inputs, **kwargs)

File ~/project/qe/git/aiida-core/src/aiida/manage/manager.py:437, in Manager.get_runner(self, **kwargs)
    431 """Return a runner that is based on the current profile settings and can be used globally by the code.
    432 
    433 :return: the global runner
    434 
    435 """
    436 if self._runner is None:
--> 437     self._runner = self.create_runner(**kwargs)
    439 return self._runner

File ~/project/qe/git/aiida-core/src/aiida/manage/manager.py:476, in Manager.create_runner(self, with_persistence, **kwargs)
    473 if 'communicator' not in settings:
    474     # Only call get_communicator if we have to as it will lazily create
    475     try:
--> 476         settings['communicator'] = self.get_communicator()
    477     except ConfigurationError:
    478         # The currently loaded profile does not define a broker and so there is no communicator
    479         pass

File ~/project/qe/git/aiida-core/src/aiida/manage/manager.py:394, in Manager.get_communicator(self)
    389     assert self._profile is not None
    390     raise ConfigurationError(
    391         f'profile `{self._profile.name}` does not provide a communicator because it does not define a broker'
    392     )
--> 394 return broker.get_communicator()

File ~/project/qe/git/aiida-core/src/aiida/brokers/zmq/broker.py:159, in ZmqBroker.get_communicator(self, wait_for_broker)
    157             break
    158     else:
--> 159         raise ConnectionError(f'Broker did not become ready within {wait_for_broker}s: {self}')
    161 self._communicator = ZmqCommunicator(
    162     router_endpoint=router_endpoint,
    163 )
    164 self._communicator.start()

ConnectionError: Broker did not become ready within 30.0s: ZMQ Broker @ /Users/mbercx/project/qe/.aiida/broker/8c6a9c1f5a414410b8961f8f315ff1ba <not running>

I checked verdi status:

❮ verdi status
 ✔ version:     AiiDA v2.8.0.post0
 ✔ config:      /Users/mbercx/project/qe/.aiida
 ✔ profile:     dev
 ✔ storage:     SqliteDosStorage[/Users/mbercx/project/qe/.aiida/repository/sqlite_dos_8db1c117ad11421eb3c9bcf59b2461ce]: open,
 ✔ daemon:      Daemon is running with PID 94020, Broker is NOT running

And saw that the Broker is NOT running. Restarting the daemon solved the issue. It's been a while since I used this environment. In fact, I now remember that a few days ago I deleted all my profiles in this environment to clean up. I had already been running the ZMQ broker here before that. Then I created a new dev profile, but I hadn't started the daemon yet.

A few notes here:

  1. Why does running a calculation require the broker at all?
  2. Deleting a profile should probably also kill the daemon/broker? This is probably a separate issue, but it seems strange that I had a running daemon right after creating the new profile.
  3. Can we somehow restart the broker automatically? I think we have something similar implemented for the daemon in case the process ID is lost?
  4. The feedback to the user should probably improve in case the daemon is somehow running without a ZMQ broker.

@agoscinski

agoscinski commented Apr 6, 2026

Collaborator Author

Thanks for the feedback!

And saw that the Broker is NOT running. Restarting the daemon solved the issue. It's been a while since I used the environment. In fact, I now remember that I had deleted all my profiles in this environment to clean up a few days ago. I was already running the ZMQ here before. Then I created a new dev profile, but I hadn't started the daemon yet.

For new profiles created from this branch, the broker should start together with the daemon. If you don't restart the daemon after changing your aiida-core version, the broker is never started. I think we can assume that people restart the daemon when installing a new aiida-core version: many things break if you change aiida-core without restarting the daemon, since the workers are still running the old code. We never claimed to support hot reloading for workers, so I think the problem you describe is acceptable.

Maybe we can communicate this problem better in a separate PR: we store the aiida-core version in the worker PID file or in a config file. Then, when people run verdi status and the currently installed aiida-core version does not match the one recorded by the worker, verdi status shows a warning to restart the daemon. This will not work well for trying out AiiDA on dev branches, since there the version stays the same until we make a release. Maybe we could just bump the dev version after each PR, i.e. 2.8.0dev0, 2.8.0dev1 and so on. We could implement this as a pre-commit hook so that it is essentially automated.
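A minimal sketch of the proposed drift check, assuming each worker records the aiida-core version it was started with in a small JSON file. The file name, layout and helper names are hypothetical; the follow-up work on this is tracked separately in #7318.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path


def record_worker_version(state_dir: Path, pid: int, version: str) -> Path:
    """Write the version a worker was started with (hypothetical file layout)."""
    path = state_dir / f"worker-{pid}.json"
    path.write_text(json.dumps({"pid": pid, "version": version}))
    return path


def version_drift_warnings(state_dir: Path, installed: str) -> list[str]:
    """Return one warning per worker whose recorded version differs from `installed`."""
    warnings = []
    for path in sorted(state_dir.glob("worker-*.json")):
        info = json.loads(path.read_text())
        if info["version"] != installed:
            warnings.append(
                f"worker {info['pid']} runs aiida-core {info['version']} but {installed} "
                "is installed: restart the daemon with `verdi daemon restart`"
            )
    return warnings


with tempfile.TemporaryDirectory() as tmp:
    state = Path(tmp)
    record_worker_version(state, 101, "2.8.0.dev0")
    record_worker_version(state, 102, "2.8.0.dev1")
    drift = version_drift_warnings(state, installed="2.8.0.dev1")
```

In practice the installed version could come from importlib.metadata.version("aiida-core"); the dev-version bump mentioned above is what would make the comparison meaningful on development branches.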

  1. Why does running a calculation require the broker at all?

Yes, we could choose a different design in which no broker is needed and the client sends messages directly to the workers. Since the numbers of workers and clients are small and do not need to scale, that might be the best design for people running AiiDA on their local machine and connecting to different HPCs: there is only one client, and the number of workers is typically the number of processes, which even on a big workstation is limited to 256 workers.

However, we already have this broker architecture in place because of RMQ, and changing it would be a drastic change affecting many places in the code base. Because I mimicked the RMQ broker communication pattern in the new broker, it was easy to implement. If you remove the broker completely, you need to consider, for example, that the broker no longer persists messages, so where does this responsibility move? There are multiple options, and there is definitely one that works for our use case, but they all require changes on the client side, which means we would probably need to touch plumpy and kiwipy and discuss this with the team.

Even if we did that, we would still need to keep the old logic, since we have to support the RMQ broker for the many-clients-to-many-workers use case (used by only a fraction of AiiDA users, but they exist). We would then have two different logics, one per broker. That is possible, but it definitely increases maintenance costs. Maybe this is the way we want to go in the future, but maybe this much simpler approach is good enough. People need to start the daemon anyway, and if we hide the broker inside the daemon setup, the complexity for the user may be roughly the same as having no broker.
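To illustrate the persistence responsibility that would have to move elsewhere without a broker, here is a minimal file-backed task queue sketch: each task is written to disk before it is handed out and deleted only when a worker acknowledges it, so unacknowledged tasks survive a broker restart. The class and method names are hypothetical; the actual queue.py in this PR may work differently.

```python
from __future__ import annotations

import itertools
import json
import os
import tempfile
import time
import uuid
from pathlib import Path


class FileTaskQueue:
    """Hypothetical persistent FIFO: one JSON file per task, deleted only on ack."""

    def __init__(self, directory: Path) -> None:
        self.directory = directory
        self.directory.mkdir(parents=True, exist_ok=True)
        self._counter = itertools.count()
        self._in_flight: set[str] = set()

    def push(self, task: dict) -> str:
        # Sortable id: wall-clock ns, then a per-instance counter, then a random suffix.
        task_id = f"{time.time_ns():020d}-{next(self._counter):06d}-{uuid.uuid4().hex}"
        tmp = self.directory / f".{task_id}.tmp"
        tmp.write_text(json.dumps(task))
        # os.replace is atomic on the same filesystem, so readers never see half a task.
        os.replace(tmp, self.directory / f"{task_id}.json")
        return task_id

    def pop(self) -> tuple[str, dict] | None:
        """Hand out the oldest task that is not already in flight; it stays on disk."""
        for path in sorted(self.directory.glob("*.json")):
            if path.stem not in self._in_flight:
                self._in_flight.add(path.stem)
                return path.stem, json.loads(path.read_text())
        return None

    def ack(self, task_id: str) -> None:
        """The worker finished the task: remove it permanently."""
        (self.directory / f"{task_id}.json").unlink()
        self._in_flight.discard(task_id)

    def nack(self, task_id: str) -> None:
        """The worker died or rejected the task: make it available again."""
        self._in_flight.discard(task_id)


with tempfile.TemporaryDirectory() as tmp:
    tasks_dir = Path(tmp) / "tasks"
    queue = FileTaskQueue(tasks_dir)
    first_id = queue.push({"pk": 1})
    queue.push({"pk": 2})
    task_id, task = queue.pop()
    # Simulate a broker restart: a new instance sees every unacknowledged task again.
    restarted = FileTaskQueue(tasks_dir)
    recovered = [restarted.pop()[1]["pk"], restarted.pop()[1]["pk"]]
    queue.ack(first_id)
    remaining = len(list(tasks_dir.glob("*.json")))
```

Without a broker, each client or worker would have to take over something like the push/ack bookkeeping shown here, which is the client-side change referred to above.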

  2. Deleting a profile should probably also kill the daemon/broker? This is probably a separate issue, but it seems strange that I had a running daemon right after creating the new profile.

Right, the new broker depends on the AiiDA profile, similar to the daemon workers (I think that could be changed, but it would require more extensive refactoring). I think I added a daemon start to profile creation, so you don't need to do anything. Do you think there is a benefit in not starting the daemon after the profile is created?

  3. Can we somehow restart the broker automatically? I think we have something similar implemented for the daemon in case the process ID is lost?

This is handled by circus, which should restart the broker in the current branch. It takes some time, around 5 seconds. I don't think I used a different timing than the one we use for the workers.

  4. The feedback to the user should probably improve in case the daemon is somehow running without a ZMQ broker.

But isn't this already shown in verdi daemon status? Maybe it's just not very visible:

~/c/a/improve-zmq-v5 (zmqbroker-v5)> verdi daemon status
Profile: presto-12
Daemon is running as PID 69337 since 2026-04-13 21:33:52
Broker is running as PID 69485 [0 pending, 0 processing]
Broker directory: /Users/alexgo/code/aiida-core/.pixi/envs/default/etc/.aiida/broker/bf1ec1ffdcae45f7a6fe95b2f87b42a4
[...]

and

~/c/a/improve-zmq-v5 (zmqbroker-v5)> verdi daemon status
Profile: presto-12
Daemon is running as PID 69337 since 2026-04-13 21:33:52
Broker is NOT running
[..]

We could improve the UI of verdi daemon status after this is merged.


  • Improve verdi status to show a mismatch with the currently installed version: addressed in PR ✨ Add daemon version drift warning #7318
  • Improve performance by switching to JSON encoding for the messages
  • @mbercx Should I remove the automatic daemon start on profile creation? It is different behavior than before, and starting a service in the background on profile creation can seem a bit sneaky.
  • It looks like we do not stop the daemon when a profile is deleted, so the broker is not shut down either. This is a general bug, I would say, and will be solved in a separate PR 🐛 Stop daemon before verdi profile delete #7319
  • At the moment, presto defaults to ZMQ and you need to pass --use-rabbitmq to get RabbitMQ. That is a breaking change. We could change it to try RabbitMQ first and fall back to ZMQ if it is not available. That would not be breaking and would also be fine for new users.

agoscinski added a commit to agoscinski/aiida-core that referenced this pull request Apr 24, 2026
Implement a ZMQ broker as an alternative to RabbitMQ for AiiDA's process
control. The broker uses ZeroMQ sockets with a persistent file-based
queue and requires no external services.

New modules in `src/aiida/brokers/zmq/`:
- `broker.py`: ZmqBroker class implementing the Broker interface
- `communicator.py`: ZmqCommunicator for process control RPCs
- `protocol.py`: Wire protocol for ZMQ messages
- `queue.py`: Persistent task queue with file-based storage
- `server.py`: ZMQ broker server handling task routing
- `service.py`: Service wrapper for running the broker process
- `defaults.py`: Default configuration

Register `core.zmq` entry point and add `zmq` alias in Manager.
agoscinski added a commit to agoscinski/aiida-core that referenced this pull request Apr 24, 2026
Introduce `requires_broker` marker for tests that need any message
broker (RabbitMQ or ZMQ), distinct from `requires_rmq` which needs
RabbitMQ specifically. Add `--broker-backend` pytest CLI option to
select broker backend (rmq, zmq, none) for test runs.

Add ZMQ broker start/stop helpers in conftest.py and update the
`aiida_profile` fixture to auto-start the ZMQ broker when selected.
Rename markers from `requires_rmq` to `requires_broker` across test
files where tests work with any broker backend.
agoscinski added a commit to agoscinski/aiida-core that referenced this pull request Apr 24, 2026
Wire the ZMQ broker into the daemon lifecycle:
- Add `verdi daemon broker` hidden command for circus to manage
- Add ZMQ broker as a circus watcher started before workers
- Show ZMQ broker status in `verdi daemon status` and `verdi status`

Fix SQLite stale-read issue by committing the session after RPC calls
in `control.py`.

Update CI to run test matrix with both `rmq` and `zmq` broker backends.
Remove RabbitMQ service dependency from minimum-requirements and presto
test jobs.
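A sketch of how the broker and workers could be declared as circus watchers so that the broker starts first and is respawned if it dies. According to the circus documentation, the arbiter sorts watchers by their priority option from the biggest number to the smallest, and respawn defaults to True. The watcher names, command lines and numbers below are assumptions, not the exact configuration built by aiida-core's DaemonClient.

```python
from __future__ import annotations


def build_watchers(profile_name: str, num_workers: int) -> list[dict]:
    """Hypothetical watcher definitions in the dict form accepted by circus.get_arbiter."""
    broker = {
        "name": "aiida-broker",
        "cmd": f"verdi -p {profile_name} daemon broker",  # hidden command added in this PR
        "numprocesses": 1,
        "priority": 10,  # bigger priority: handled before the workers
        "respawn": True,  # restart automatically if the broker process dies
    }
    workers = {
        "name": "aiida-workers",
        "cmd": f"verdi -p {profile_name} daemon worker",
        "numprocesses": num_workers,
        "priority": 0,
        "respawn": True,
    }
    return [workers, broker]


def start_order(watchers: list[dict]) -> list[str]:
    """Mimic circus' ordering: sort by priority, biggest first."""
    return [w["name"] for w in sorted(watchers, key=lambda w: w["priority"], reverse=True)]


order = start_order(build_watchers("dev", num_workers=2))
```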
agoscinski added a commit to agoscinski/aiida-core that referenced this pull request Apr 24, 2026
Change `verdi presto` to always configure a broker: it tries RabbitMQ
first and falls back to ZMQ if unavailable. Add `--use-zmq` flag to
skip RabbitMQ detection and use ZMQ directly.

Previously, profiles created without RabbitMQ had no broker at all,
limiting functionality. Now every `verdi presto` profile gets a working
broker out of the box.

Regenerate the command-line reference during the rebase so the tracked
autodocs stay in sync with the updated CLI.
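The try-RabbitMQ-then-fall-back logic can be sketched as a simple TCP probe of the default AMQP port (5672). This is an assumption for illustration: rabbitmq_reachable and choose_broker are hypothetical, and the actual verdi presto detection may check RabbitMQ differently, for example by opening a real AMQP connection with the configured credentials.

```python
from __future__ import annotations

import socket


def rabbitmq_reachable(host: str = "127.0.0.1", port: int = 5672, timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on the (default AMQP) port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def choose_broker(use_zmq: bool = False, **probe_kwargs) -> str:
    """Mirror the described presto behaviour: --use-zmq skips detection entirely."""
    if use_zmq:
        return "core.zmq"
    return "core.rabbitmq" if rabbitmq_reachable(**probe_kwargs) else "core.zmq"
```

A TCP probe only shows that a port is open, not that RabbitMQ is usable, so a real implementation would likely want a stronger check.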
agoscinski added a commit to agoscinski/aiida-core that referenced this pull request Apr 24, 2026
Add `--broker` option to `verdi profile setup` accepting 'rabbitmq',
'zmq', or 'none'. Deprecate the `--use-rabbitmq/--no-use-rabbitmq`
flag with a warning pointing to the new option.

Add daemon restart logic to `verdi profile configure-rabbitmq` so
broker reconfiguration takes effect immediately.
agoscinski added a commit to agoscinski/aiida-core that referenced this pull request Apr 24, 2026
`get_daemon_client(profile.name)` called `load_profile()` which switched
the global manager profile, corrupting the state for all subsequent
operations in the same process. Use `DaemonClient(profile)` directly
and skip the daemon check entirely when no broker is configured yet.
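The bug pattern described in this commit, a helper that silently switches process-wide state, can be illustrated with a small stand-alone sketch. The classes and functions below are hypothetical stand-ins for the manager, load_profile and DaemonClient, not aiida-core's real implementations.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Profile:
    name: str


class Manager:
    """Stand-in for the global manager that tracks the currently loaded profile."""

    def __init__(self) -> None:
        self.profile: Profile | None = None


MANAGER = Manager()
PROFILES = {"default": Profile("default"), "new": Profile("new")}


@dataclass
class DaemonClient:
    profile: Profile


def load_profile(name: str) -> Profile:
    MANAGER.profile = PROFILES[name]  # side effect: switches the global profile
    return MANAGER.profile


def get_daemon_client_buggy(name: str) -> DaemonClient:
    # Looks harmless, but changes the profile for everything else in the process.
    return DaemonClient(load_profile(name))


def get_daemon_client_fixed(profile: Profile) -> DaemonClient:
    # Build the client for an explicit profile; global state is untouched.
    return DaemonClient(profile)


load_profile("default")
get_daemon_client_fixed(PROFILES["new"])
after_fixed = MANAGER.profile.name
get_daemon_client_buggy("new")
after_buggy = MANAGER.profile.name
```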
agoscinski added a commit to agoscinski/aiida-core that referenced this pull request Apr 24, 2026
Allow creating profiles without any message broker configured. This is
useful for profiles used only for data exploration and querying, where
the daemon and process submission are not needed.
agoscinski added a commit to agoscinski/aiida-core that referenced this pull request Apr 27, 2026
agoscinski added a commit to agoscinski/aiida-core that referenced this pull request Apr 27, 2026
agoscinski added a commit to agoscinski/aiida-core that referenced this pull request Apr 27, 2026
…7284)

Lower default timeout from 30s to 10s and log a warning after 5s
so users get feedback when the broker is taking a long time to start.
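A minimal sketch of the wait loop described here: poll until the broker is ready, warn once after 5 s, and raise ConnectionError after 10 s (the same exception type seen in the traceback earlier in this thread). wait_until_ready and the is_ready callable are hypothetical; the real logic lives in ZmqBroker.get_communicator and may differ.

```python
from __future__ import annotations

import logging
import time
from typing import Callable

LOGGER = logging.getLogger("broker-wait-sketch")


def wait_until_ready(
    is_ready: Callable[[], bool],
    timeout: float = 10.0,
    warn_after: float = 5.0,
    poll_interval: float = 0.1,
) -> None:
    """Poll is_ready() until it returns True.

    Logs a single warning once warn_after seconds have passed, and raises
    ConnectionError once timeout seconds have passed without success.
    """
    start = time.monotonic()
    warned = False
    while True:
        if is_ready():
            return
        elapsed = time.monotonic() - start
        if not warned and elapsed >= warn_after:
            LOGGER.warning("Broker is taking longer than %.0fs to start, still waiting...", warn_after)
            warned = True
        if elapsed >= timeout:
            raise ConnectionError(f"Broker did not become ready within {timeout}s")
        time.sleep(poll_interval)
```

Checking the warning threshold before the timeout guarantees the user sees the warning before the final error, even if one poll interval overshoots.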
agoscinski added a commit to agoscinski/aiida-core that referenced this pull request Apr 27, 2026
)

When the broker is killed with SIGKILL, the temp socket directory
becomes orphaned. On next startup, clean up the old directory before
creating a new one.
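A minimal sketch of the startup cleanup, assuming the broker uses a fixed per-profile socket directory with ipc:// endpoints inside it. prepare_socket_dir, router_endpoint and the file name router.sock are hypothetical. The point is that a SIGKILLed process cannot run its own cleanup, so the next start has to do it; a real implementation would first confirm that no live broker still owns the directory.

```python
from __future__ import annotations

import shutil
import tempfile
from pathlib import Path


def prepare_socket_dir(path: Path) -> Path:
    """Remove a directory orphaned by a killed broker, then recreate it empty."""
    if path.exists():
        # A SIGKILLed broker never ran its shutdown hook, so stale socket files remain.
        shutil.rmtree(path)
    path.mkdir(parents=True, mode=0o700)
    return path


def router_endpoint(socket_dir: Path) -> str:
    """ZeroMQ ipc:// endpoint for a socket file inside the directory."""
    return f"ipc://{socket_dir / 'router.sock'}"
```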
agoscinski added a commit to agoscinski/aiida-core that referenced this pull request Apr 27, 2026
…am#7284)

Change `is_running()` method to a `@property` to match the API of
`ZmqBrokerServer.is_running`, making state queries more Pythonic.
Update all call sites to use the property syntax.
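A sketch of the property form, assuming (hypothetically) that liveness is derived from a PID file: on POSIX, os.kill(pid, 0) sends no signal but raises ProcessLookupError if the process does not exist. BrokerStatus and the PID-file path are illustrative names, not ZmqBroker's actual attributes.

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path


class BrokerStatus:
    """Hypothetical status helper exposing is_running as a read-only property."""

    def __init__(self, pid_file: Path) -> None:
        self.pid_file = pid_file

    @property
    def is_running(self) -> bool:
        try:
            pid = int(self.pid_file.read_text().strip())
        except (FileNotFoundError, ValueError):
            return False
        try:
            os.kill(pid, 0)  # signal 0: existence/permission check only (POSIX only)
        except ProcessLookupError:
            return False
        except PermissionError:
            return True  # the process exists but belongs to another user
        return True
```

Call sites then read status.is_running instead of calling status.is_running().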
agoscinski added a commit to agoscinski/aiida-core that referenced this pull request Apr 28, 2026
agoscinski added a commit to agoscinski/aiida-core that referenced this pull request Apr 28, 2026
Document the process/thread model, socket architecture, message flow,
dead peer detection, persistent queue, service files, message types,
the AMQP-to-ZMQ mapping, write format and all timeout constants with
diagrams.
agoscinski added a commit to agoscinski/aiida-core that referenced this pull request Apr 28, 2026
…eam#7284)

Replace `rmq.task_timeout` with a broker-agnostic `broker.task_timeout`
option used by both ZMQ and RabbitMQ backends. The old option is kept
with a `deprecated_by` marker on the Field, enabling generic deprecation
handling in `Manager.get_option` and `verdi config set`.
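A minimal sketch of how a deprecated_by marker can drive generic deprecation handling, assuming a simple dict-based option registry. The Option dataclass and get_option below are hypothetical; only the option names rmq.task_timeout and broker.task_timeout come from this PR, and the default value of 10 is a placeholder.

```python
from __future__ import annotations

import warnings
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Option:
    name: str
    default: Any
    deprecated_by: str | None = None  # name of the option that replaces this one


OPTIONS = {
    "broker.task_timeout": Option("broker.task_timeout", default=10),
    "rmq.task_timeout": Option("rmq.task_timeout", default=10, deprecated_by="broker.task_timeout"),
}


def get_option(name: str, config: dict[str, Any]) -> Any:
    """Resolve `name`, honouring values still stored under a deprecated alias."""
    option = OPTIONS[name]
    if option.deprecated_by is not None:
        warnings.warn(f"`{name}` is deprecated, use `{option.deprecated_by}` instead", DeprecationWarning, stacklevel=2)
        return get_option(option.deprecated_by, config)
    if name in config:
        return config[name]
    # Fall back to any deprecated option that points at this one.
    for old in OPTIONS.values():
        if old.deprecated_by == name and old.name in config:
            warnings.warn(f"`{old.name}` is deprecated, use `{name}` instead", DeprecationWarning, stacklevel=2)
            return config[old.name]
    return option.default
```

Because the fallback is driven by the marker rather than by option-specific code, any future rename only needs a new deprecated_by entry.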
@agoscinski

agoscinski commented Apr 29, 2026

Collaborator Author

The failing tests are flaky; their fixes are in these PRs or tracked in open issues:
py3.14:

It is nevertheless suspicious that all flaky test failures show up in the ZMQ CI. I will merge and see whether this behavior continues in subsequent PRs. Maybe this is because the ZMQ broker is spawned with the daemon and is thus more resource-hungry, making such edge cases easier to hit.


3 participants