Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flake TestRevision #17477

Open
tjungblu opened this issue Feb 22, 2024 · 2 comments
Open

Flake TestRevision #17477

tjungblu opened this issue Feb 22, 2024 · 2 comments

Comments

@tjungblu
Copy link
Contributor

Which github workflows are flaking?

unit tests

Which tests are flaking?

go.etcd.io/etcd/server/v3/etcdserver/api/v3compactor: TestRevision

Github Action link

https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/etcd-io_etcd/17469/pull-etcd-unit-test/1760242318838861824

Reason for failure (if possible)

{Failed  === RUN   TestRevision
    logger.go:130: 2024-02-21T09:59:44.200Z	INFO	starting auto revision compaction	{"revision": 90, "revision-compaction-retention": 10}
    revision_test.go:48: len(actions) = 0, expected >= 1
--- FAIL: TestRevision (0.05s)
}

Anything else we need to know?

No response

@ybaldus
Copy link

ybaldus commented Feb 22, 2024

Hi,

from my understanding this comes from the wait timeout here:

rg := &fakeRevGetter{testutil.NewRecorderStreamWithWaitTimout(10 * time.Millisecond), 0}

How do you normally handle such a case?

@Fube
Copy link
Contributor

Fube commented Apr 13, 2024

Following #17054 (comment), adding a time.Sleep(11 * time.Millisecond) on

will allow you to reproduce the issue


I could be wrong, but it seems every rg.Wait(1) in the test if effectively just a time.Sleep(10 * time.Millisecond) + recorder stream channel flush

Unfortunately, the fix described in #17513 is not enough as "do not timeout" causes the very first rg.Wait(1) to wait forever as the fc.Advance(revInterval) that precedes it advances time before the Revision.Run loop has started (you can check this by sleeping before advancing time)

There is also this rg.Wait(1) which seems to serve no purpose


I believe to fix this, we would need to know when the Revision.Run loop is ready
A hacky way to implement this could be:

func newFakerClock() *fakerClock {
	return &fakerClock{
		FakeClock:    clockwork.NewFakeClock(),
		afterRequest: make(chan struct{}),
	}
}

func (frc *fakerClock) After(d time.Duration) <-chan time.Time {
	select {
	case frc.afterRequest <- struct{}{}:
	default:
	}
	return frc.FakeClock.After(d)
}

func TestRevision(t *testing.T) {
	fc := newFakerClock() // <- this changed
	rg := &fakeRevGetter{testutil.NewRecorderStreamWithWaitTimout(math.MaxInt64), 0}  // <- this changed
	compactable := &fakeCompactable{testutil.NewRecorderStreamWithWaitTimout(10 * time.Millisecond)}
	tb := newRevision(zaptest.NewLogger(t), fc, 10, rg, compactable)

	tb.Run()
	defer tb.Stop()

	<-fc.afterRequest  // <- this is new
	fc.Advance(revInterval)
	rg.Wait(1)

// ...

though I do not know if that is an acceptable solution.
Thoughts @ahrtr ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

4 participants