Attempt to fix flaky util-app test in GitHub CI #310

felixbr · 2022-08-02T10:40:22Z

As you can see in many PRs (e.g. #308) there is often one of the different build matrix cases red. As far as I can see it's always the same flaky test case in AppTest (in util-app).

I tried many things but couldn't reproduce the timeout locally. As an attempt to fight the failures in CI, I'm increasing the timeout for the grace period and also add retries to the test case.

I understand that this is mostly a band aid and not a great solution, so if you don't like it please let me know how we should tackle this instead.

Cheers
~ Felix

bryce-anderson · 2022-08-02T15:53:54Z

Does just increasing the defaultCloseGracePeriod to 30 seconds like you've done do the trick? It seems like a real bug if we cannot ever get the right exit behavior regardless of the grace period.

felixbr · 2022-08-05T09:40:41Z

Well, locally it always works, even with a very very low grace period. My suspicion is that a different timeout (e.g. from Await.ready) somewhere is firing the TimeoutException and so increasing the grace period is not really helping to "fix" CI.

That is purely speculation, of course, as I couldn't reproduce it even once locally.

I've seen similar problems at work where tests would basically be starved in CI because sbt tries to parallelize heavily but the machine has very little CPU and so futures in tests would be created but then paused for very long periods, leading to timeouts.

If you want me to, I can remove the retry part but eventually will only retry for a while and then fail the test anyway. So if it was really broken it wouldn't change anything for the outcome.

bryce-anderson · 2022-08-11T15:26:06Z

If you want me to, I can remove the retry part but eventually will only retry for a while and then fail the test anyway. So if it was really broken it wouldn't change anything for the outcome.

I would prefer to try it. If it's not a problem simply due to "not enough CPU time", which I think you're right that it is, and it's actually flaky for other reasons that is broken and should be fixed and retrying will mask the flakiness/brokenness.

felixbr · 2023-05-18T15:28:36Z

I've rebased this onto develop and reverted the addition of eventually retries. So it's only an increased timeout and a comment now.

felixbr · 2023-05-18T15:28:39Z

I've rebased this onto develop and reverted the addition of eventually retries. So it's only an increased timeout and a comment now.

util-app: Attempt to fix flaky test in GitHub CI

344ac48

felixbr force-pushed the util-app/fix-flaky-test branch from fe57d42 to 344ac48 Compare May 18, 2023 15:10

felixbr mentioned this pull request May 18, 2023

build: Add Scala 3 to CI matrix #308

Open

felixbr mentioned this pull request May 18, 2023

support scala 3 build for util-tunable #305

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Attempt to fix flaky util-app test in GitHub CI #310

Attempt to fix flaky util-app test in GitHub CI #310

Attempt to fix flaky util-app test in GitHub CI #310

Are you sure you want to change the base?

Attempt to fix flaky util-app test in GitHub CI #310

Conversation