
subctl: Enhance diagnose to troubleshoot state of submariner #961

@vthapar


What would you like to be added:
Enhance `subctl diagnose` to do more analysis than it currently does. Today it focuses on detecting that something has gone wrong, not on why it has gone wrong. Some possible enhancements:

  1. For OVN-IC, make sure legacy ports, etc., are not present.
  2. Make sure OVN flows, router policies, etc., use the correct IPs according to the Endpoints.
  3. Check that the programmed iptables rules use the correct IPs.
  4. For Globalnet, make sure exported Services use the same IPs as the GlobalIngressIPs allocated to them.
  5. Check how frequently messages are logged. Overly frequent logging can cause logs to overflow in long-running setups, losing crucial information. This should help catch overzealous log messages.
  6. Check whether pod logs are about to roll over, so users can back them up for future troubleshooting. Note: this should probably be an alert.
  7. Check that multicluster objects have matching contents on the source cluster, the broker, and the destination cluster.
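As a rough illustration of item 5, here is a minimal sketch of a log-frequency check. It counts identical messages in a log sample (for example, the output of `kubectl logs --since=1h` for a gateway pod) and flags any message logged more often than a threshold. It assumes klog-style line headers ending in `] `; the function names, the threshold, and the sample lines are hypothetical and not part of subctl.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"sort"
	"strings"
)

// noisyMessage is a log message together with how often it appeared in the sample.
type noisyMessage struct {
	Message string
	Count   int
}

// messageBody strips a klog-style header such as
// "I0101 12:00:00.000000       1 sync.go:55] " so that the same message
// logged at different times is grouped together. Lines without "] " are
// used unchanged (assumption: non-klog lines carry no header).
func messageBody(line string) string {
	if i := strings.Index(line, "] "); i >= 0 {
		return strings.TrimSpace(line[i+2:])
	}
	return strings.TrimSpace(line)
}

// findNoisyMessages counts identical messages in a log sample and returns
// those seen more than threshold times, most frequent first.
func findNoisyMessages(logs string, threshold int) ([]noisyMessage, error) {
	counts := map[string]int{}
	scanner := bufio.NewScanner(strings.NewReader(logs))
	for scanner.Scan() {
		if msg := messageBody(scanner.Text()); msg != "" {
			counts[msg]++
		}
	}
	// Scanner stops on lines longer than its 64 KiB default buffer.
	if err := scanner.Err(); err != nil {
		return nil, err
	}

	var noisy []noisyMessage
	for msg, n := range counts {
		if n > threshold {
			noisy = append(noisy, noisyMessage{Message: msg, Count: n})
		}
	}
	// Sort by count, then by message, so the output is deterministic.
	sort.Slice(noisy, func(i, j int) bool {
		if noisy[i].Count != noisy[j].Count {
			return noisy[i].Count > noisy[j].Count
		}
		return noisy[i].Message < noisy[j].Message
	})
	return noisy, nil
}

func main() {
	// Made-up sample lines; real Submariner log messages will differ.
	sample := `I0101 12:00:00.000001       1 cable.go:10] Connection established
I0101 12:00:01.000001       1 sync.go:55] Syncing endpoints
I0101 12:00:02.000001       1 sync.go:55] Syncing endpoints
I0101 12:00:03.000001       1 sync.go:55] Syncing endpoints`

	noisy, err := findNoisyMessages(sample, 2)
	if err != nil {
		fmt.Fprintln(os.Stderr, "reading logs:", err)
		os.Exit(1)
	}
	for _, m := range noisy {
		fmt.Printf("logged %d times: %q\n", m.Count, m.Message)
	}
}
```

Grouping on exact message text will miss repeated messages that embed changing values such as IPs or counters; a real check would probably need to normalize those as well, and to divide counts by the sampled time window to get a rate.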

Why is this needed:
Currently `subctl diagnose` only does basic diagnosis: it checks deployment and pod states, runs firewall tests, etc. But a lot of troubleshooting still requires the dev team to gather logs and analyze them manually. Some of this manual analysis can easily be automated. The aim is to minimize the effort and time the dev team has to spend on troubleshooting.

Metadata

Type: No type

Projects: Status: Backlog

Milestone: No milestone

Relationships: None yet

Development: No branches or pull requests