Flake TestRevision #17477

tjungblu · 2024-02-22T12:33:41Z

Which github workflows are flaking?

unit tests

Which tests are flaking?

go.etcd.io/etcd/server/v3/etcdserver/api/v3compactor: TestRevision

Github Action link

https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/etcd-io_etcd/17469/pull-etcd-unit-test/1760242318838861824

Reason for failure (if possible)

{Failed  === RUN   TestRevision
    logger.go:130: 2024-02-21T09:59:44.200Z	INFO	starting auto revision compaction	{"revision": 90, "revision-compaction-retention": 10}
    revision_test.go:48: len(actions) = 0, expected >= 1
--- FAIL: TestRevision (0.05s)
}

Anything else we need to know?

No response

ybaldus · 2024-02-22T20:19:01Z

Hi,

from my understanding this comes from the wait timeout here:

etcd/server/etcdserver/api/v3compactor/revision_test.go

Line 31 in 47e9a16

 rg := &fakeRevGetter{testutil.NewRecorderStreamWithWaitTimout(10 * time.Millisecond), 0} 

How do you normally handle such a case?

Fube · 2024-04-13T22:08:26Z

Following #17054 (comment), adding a time.Sleep(11 * time.Millisecond) at

etcd/server/etcdserver/api/v3compactor/revision.go

Line 79 in 7ded2ac

will allow you to reproduce the issue.

I could be wrong, but it seems every rg.Wait(1) in the test is effectively just a time.Sleep(10 * time.Millisecond) plus a flush of the recorder stream channel.

Unfortunately, the fix described in #17513 is not enough. With "do not timeout", the very first rg.Wait(1) waits forever, because the fc.Advance(revInterval) that precedes it advances time before the Revision.Run loop has started. You can check this by sleeping before advancing time.

There is also this rg.Wait(1), which seems to serve no purpose:

etcd/server/etcdserver/api/v3compactor/revision_test.go

Line 56 in 7ded2ac

rg.Wait(1)

I believe that to fix this, we would need to know when the Revision.Run loop is ready.
A hacky way to implement this could be:

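// fakerClock wraps the fake clock and signals when After has been called.
// Assumes clockwork.FakeClock is an interface, as in clockwork v0.4.
type fakerClock struct {
	clockwork.FakeClock
	afterRequest chan struct{}
}

func newFakerClock() *fakerClock {
	return &fakerClock{
		FakeClock: clockwork.NewFakeClock(),
		// Buffered so the signal is kept if the test is not receiving yet.
		afterRequest: make(chan struct{}, 1),
	}
}

func (frc *fakerClock) After(d time.Duration) <-chan time.Time {
	// Register the timer first so that a later Advance is guaranteed to see it.
	ch := frc.FakeClock.After(d)
	select {
	case frc.afterRequest <- struct{}{}:
	default:
	}
	return ch
}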

func TestRevision(t *testing.T) {
	fc := newFakerClock() // <- this changed
	rg := &fakeRevGetter{testutil.NewRecorderStreamWithWaitTimout(math.MaxInt64), 0}  // <- this changed
	compactable := &fakeCompactable{testutil.NewRecorderStreamWithWaitTimout(10 * time.Millisecond)}
	tb := newRevision(zaptest.NewLogger(t), fc, 10, rg, compactable)

	tb.Run()
	defer tb.Stop()

	<-fc.afterRequest  // <- this is new
	fc.Advance(revInterval)
	rg.Wait(1)

// ...

That said, I do not know if this is an acceptable solution.
Thoughts, @ahrtr?
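For anyone who wants to try the idea without the etcd test harness, here is a self-contained sketch of the same handshake. All names (`signalClock`, `tickOnce`) are hypothetical, and the clock is reduced to a single "fire everything" Advance; it only demonstrates waiting for the worker to register its timer before advancing.

```go
package main

import (
	"fmt"
	"sync"
)

// signalClock is a hypothetical manual clock that announces every After call.
type signalClock struct {
	mu           sync.Mutex
	timers       []chan struct{}
	afterRequest chan struct{}
}

func newSignalClock() *signalClock {
	// Capacity 1 keeps the signal even if nobody is receiving yet.
	return &signalClock{afterRequest: make(chan struct{}, 1)}
}

// After registers a timer and then signals that it has been registered.
func (c *signalClock) After() <-chan struct{} {
	ch := make(chan struct{}, 1)
	c.mu.Lock()
	c.timers = append(c.timers, ch)
	c.mu.Unlock()
	select {
	case c.afterRequest <- struct{}{}:
	default: // a signal is already pending
	}
	return ch
}

// Advance fires every timer registered so far.
func (c *signalClock) Advance() {
	c.mu.Lock()
	defer c.mu.Unlock()
	for _, ch := range c.timers {
		ch <- struct{}{}
	}
	c.timers = nil
}

// tickOnce starts a worker that waits on the clock, and advances the clock
// only after the worker has registered its timer.
func tickOnce() string {
	c := newSignalClock()
	done := make(chan string)
	go func() {
		<-c.After()
		done <- "compacted"
	}()
	<-c.afterRequest // the worker is now waiting on its timer
	c.Advance()
	return <-done
}

func main() {
	fmt.Println(tickOnce()) // prints compacted
}
```

Because the signal is sent only after the timer is registered, the Advance call cannot run ahead of the worker, and no wait timeout is needed.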

tjungblu added area/testing type/flake labels Feb 22, 2024

ahrtr added the help wanted label Feb 26, 2024
