Description
What did you do?
In certain cases, it is possible for two (or more) different aggregation groups to share an nflog. The conditions for this to happen are the following:
- There must be at least two routes with the same matchers and group labels (
group_by
). - At least one of the routes must have
continue: true
. - Both routes must use the same receivers.
For example, take the following configuration:
receivers:
- name: test1
webhook_configs:
- url: http://127.0.0.1:8080/test1
- name: test2
webhook_configs:
- url: http://127.0.0.1:8080/test2
route:
receiver: test1
group_wait: 15s
group_interval: 30s
routes:
- continue: true
receiver: test1
matchers:
- foo=bar
- continue: true
receiver: test1
matchers:
- foo=bar
This configuration meets all of the requirements to share an nflog.
When an Alertmanager is run with this configuration, you can see from the flushing alerts
and Notify success
lines that two aggregation groups are created and then flushed:
ts=2024-04-15T08:33:35.749Z caller=dispatch.go:164 level=debug component=dispatcher msg="Received alert" alert=[3fff2c2][active]
ts=2024-04-15T08:33:50.750Z caller=dispatch.go:516 level=debug component=dispatcher aggrGroup="{}/{foo=\"bar\"}:{}" msg=flushing alerts=[[3fff2c2][active]]
ts=2024-04-15T08:33:50.749Z caller=dispatch.go:516 level=debug component=dispatcher aggrGroup="{}/{foo=\"bar\"}:{}" msg=flushing alerts=[[3fff2c2][active]]
ts=2024-04-15T08:33:50.753Z caller=notify.go:863 level=debug component=dispatcher receiver=test1 integration=webhook[0] aggrGroup="{}/{foo=\"bar\"}:{}" alerts=[[3fff2c2][active]] msg="Notify success" attempts=1 duration=2.743458ms
ts=2024-04-15T08:33:50.753Z caller=notify.go:863 level=debug component=dispatcher receiver=test1 integration=webhook[0] aggrGroup="{}/{foo=\"bar\"}:{}" alerts=[[3fff2c2][active]] msg="Notify success" attempts=1 duration=2.85275ms
However, reading the nflog file (after shutdown) shows just one nflog entry on disk:
N
>
{}/{foo="bar"}:{}
test1webhook*
����˗�2 ����ދ��
���˗�%
However, when each route uses a different receiver then there are two entries on disk:
N
>
{}/{foo="bar"}:{}
test1webhook*
������2 ����ދ��
�����N
>
{}/{foo="bar"}:{}
test2webhook*
���в�2 ����ދ��
��в�%
I expect this to create even more issues if the routes have different timers (i.e. group_wait
, group_interval
and repeat_interval
) or active and mute time intervals:
route:
receiver: test1
group_wait: 15s
group_interval: 30s
routes:
- continue: true
receiver: test1
matchers:
- foo=bar
- continue: true
receiver: test1
matchers:
- foo=bar
group_wait: 30s
group_interval: 5m
mute_time_intervals:
- evenings
What did you expect to see?
I expected to see a separate entry for each aggregation group. Here is the nflog from the first example, but instead running the code in this branch:
P
@
{}/{foo="bar"}/1:{}
test1webhook*
���ػ��2 ����ދ��
Ԯ�ػ��P
@
{}/{foo="bar"}/0:{}
test1webhook*
������2 ����ދ��
Ԯ����%