
Fix transient minio startup timeout in CI #96993

Merged

alexey-milovidov merged 1 commit into master from fix-minio-startup-timeout on Feb 15, 2026

Conversation

@alexey-milovidov
Member

Summary

Fix a transient CI failure where minio infrastructure setup times out before tests can run, observed in "Stateless tests (amd_debug, AsyncInsert, s3 storage, sequential)".

  • Remove the unnecessary sleep 5 in setup_minio.sh that runs after wait_for_it has already confirmed minio is responsive
  • Increase the readiness polling retry count from 20 to 60 in clickhouse_proc.py

Observations from the failure

CI report

The job failed at the Start ClickHouse Server step with Failed to start minio; zero tests were executed.

setup_minio.sh is launched asynchronously and the Python poller immediately starts checking mc ls clickminio/test | grep -q . every ~2s (1s sleep + ~1s for the failing command). With 20 retries this gives a ~40s total budget.
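
For reference, a minimal shell sketch of the readiness poll described above (the real loop lives in clickhouse_proc.py and is written in Python; the 20-attempt limit, the 1s sleep, and the mc command come from the description, the rest is illustrative):

```bash
# Illustrative stand-in for the clickhouse_proc.py readiness poll, not the real code.
# Each failed `mc ls` takes roughly a second, plus the 1s sleep, so 20 attempts
# give up after roughly 40 seconds in total.
for attempt in $(seq 1 20); do
    if mc ls clickminio/test 2>/dev/null | grep -q .; then
        echo "minio ready after ${attempt} attempt(s)"
        exit 0
    fi
    sleep 1
done
echo "Failed to start minio" >&2
exit 1
```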

Timeline from the job log:

  • 09:42:45 — setup_minio.sh started, polling begins
  • 09:42:45 – 09:43:10 (~25s, 14 retries) — mc errors with Requested path .../clickminio not found — the mc alias hasn't been configured yet, meaning minio server startup + the sleep 5 consumed all this time
  • 09:43:12 – 09:43:23 (~11s, 6 retries) — error changes to Bucket test does not exist — alias is now set but mc mb clickminio/test hasn't completed
  • 09:43:24 — all 20 retries exhausted, Failed to start minio

The setup_minio.sh execution sequence is: start minio server → wait_for_it (up to 60s) → lsof → sleep 5 → mc alias set → mc mb → mc cp data. The sleep 5 is wasteful because wait_for_it has already confirmed responsiveness, and the 20-retry budget (~40s) cannot accommodate even the wait_for_it worst case (60s).
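
A rough shell sketch of that order of operations (hypothetical, not the actual setup_minio.sh; the port, data paths, and credentials here are placeholders):

```bash
# Hypothetical reconstruction of the sequence described above; the real
# setup_minio.sh differs in details (port, paths, and credentials are placeholders).
minio server ./minio_data &                           # start the server in the background
./wait_for_it.sh "localhost:${MINIO_PORT}" -t 60      # block until the port answers (up to 60s)
lsof -i ":${MINIO_PORT}"                              # log which process holds the port
sleep 5                                               # redundant after wait_for_it: removed by this PR
mc alias set clickminio "http://localhost:${MINIO_PORT}" "${MINIO_USER}" "${MINIO_PASSWORD}"
mc mb clickminio/test                                 # create the bucket the poller checks for
mc cp --recursive ./data/ clickminio/test/            # preload test data
```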

Test plan

  • Verified no other callers of setup_minio.sh or the polling logic
  • CI passes with the fix (minio-dependent s3 storage test jobs succeed)

Changelog category (leave one):

  • CI Fix or Improvement (changelog entry is not required)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

...

🤖 Generated with Claude Code

The `setup_minio.sh` script is started asynchronously, and the Python
poller checks `mc ls clickminio/test` to detect readiness. With only 20
retries (~40s total including command execution time), the timeout is
insufficient when minio is slow to start.

The shell script's `start_minio` function had an unnecessary `sleep 5`
after `wait_for_it` had already confirmed the server was responsive,
wasting 5 seconds. The 20-retry limit in the Python poller was too tight
given that `wait_for_it` alone can take up to 60 seconds.

Changes:
- Remove unnecessary `sleep 5` from `start_minio` in `setup_minio.sh`
- Increase the polling retry count from 20 to 60 in `clickhouse_proc.py`
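
With 60 retries at roughly 2s per attempt, the polling budget grows from ~40s to ~120s, which covers the 60-second wait_for_it worst case plus the subsequent mc alias/bucket/copy steps with margin.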

https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=96987&sha=eb763d8476e10cd6b9c1161babaf23376002dbe8&name_0=PR&name_1=Stateless%20tests%20%28amd_debug%2C%20AsyncInsert%2C%20s3%20storage%2C%20sequential%29

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@clickhouse-gh
Contributor

clickhouse-gh bot commented Feb 15, 2026

Workflow [PR], commit [4432216]

@clickhouse-gh clickhouse-gh bot added the pr-ci label Feb 15, 2026
@alexey-milovidov alexey-milovidov added this pull request to the merge queue Feb 15, 2026
@alexey-milovidov alexey-milovidov self-assigned this Feb 15, 2026
Merged via the queue into master with commit 513a3ff Feb 15, 2026
134 checks passed
@alexey-milovidov alexey-milovidov deleted the fix-minio-startup-timeout branch February 15, 2026 15:16
@robot-clickhouse-ci-1 robot-clickhouse-ci-1 added the pr-synced-to-cloud The PR is synced to the cloud repo label Feb 15, 2026
