Fix missing move of query_metadata_cache in BlockIO::operator= #96995

Open

alexey-milovidov wants to merge 5 commits into master from fix-blockio-move-query-metadata-cache

Conversation

@alexey-milovidov
Member

Summary

BlockIO::operator=(BlockIO&&) was not moving query_metadata_cache, causing the cache to be destroyed prematurely on every query through both TCP and HTTP paths.

Closes #95742

Root Cause Analysis

Bug #1: BlockIO::operator= does NOT move query_metadata_cache

In src/QueryPipeline/BlockIO.cpp:26-44:

BlockIO & BlockIO::operator= (BlockIO && rhs) noexcept
{
    reset();
    process_list_entries    = std::move(rhs.process_list_entries);
    pipeline                = std::move(rhs.pipeline);
    finalize_query_pipeline = std::move(rhs.finalize_query_pipeline);
    finish_callbacks        = std::move(rhs.finish_callbacks);
    exception_callbacks     = std::move(rhs.exception_callbacks);
    null_format             = rhs.null_format;
    // query_metadata_cache is NOT moved!
    return *this;
}

This is triggered on every query via both execution paths:

  • TCP path (executeQuery.cpp:1970): res = executeQueryImpl(...)
  • HTTP path (executeQuery.cpp:2168): streams = executeQueryImpl(...)

The consequence: executeQueryImpl packs the cache into its returned BlockIO, but operator= strips it out. The temp BlockIO is destroyed immediately, destroying the cache while the pipeline lives on in res.

What happens when the cache is destroyed prematurely

For mutation validation queries (ALTER TABLE ... UPDATE):

  1. MutationsInterpreter::validate() builds a validation pipeline internally, which caches a StorageSnapshotPtr via getStorageSnapshot in the QueryMetadataCache
  2. validate() returns — its internal pipeline is destroyed, releasing its StorageSnapshotPtr
  3. The cache entry is now the ONLY remaining StorageSnapshotPtr (refcount = 1)
  4. InterpreterAlterQuery::execute() returns an empty BlockIO (no pipeline)
  5. executeQueryImpl packs the cache into the returned BlockIO
  6. res = executeQueryImpl(...) invokes operator=, which moves the pipeline (empty) but NOT the cache
  7. The temp BlockIO is destroyed → cache destroyed → last StorageSnapshotPtr released
  8. ~StorageSnapshot → ~SnapshotData:
    • parts released → parts freed → clearCaches() → accesses storage

Why SnapshotData::storage doesn't always save us

Within SnapshotData's destruction, members are destroyed in reverse declaration order, so parts (declared at line 623) is freed before storage (line 619). So normally the storage IS alive when clearCaches runs.

But there's a complicating factor: MergeTreeData::shared_ranges_in_parts (line 1496) may hold the same RangesInDataPartsPtr. When both SnapshotData::parts and shared_ranges_in_parts share the same shared_ptr, the destruction chain becomes:

  1. SnapshotData::parts released → refcount 2→1 (not freed, shared_ranges_in_parts still holds it)
  2. SnapshotData::storage released → if table was DETACHED, this is the last ref → MergeTreeData destructor runs
  3. During ~MergeTreeData(), shared_ranges_in_parts (line 1496) is destroyed → parts freed → clearCaches() → accesses this (the MergeTreeData being destroyed)

At this point we're inside MergeTreeData's destructor. The StorageMergeTree/StorageReplicatedMergeTree destructor has already run (derived class destructors run first). If shutdown() invalidated any state that clearCaches depends on (like context caches, virtual table pointers after derived destructor), this could SEGFAULT.

Bug #2: InterpreterOptimizeQuery mutably modifies cached snapshot

In src/Interpreters/InterpreterOptimizeQuery.cpp:81-82:

if (auto * snapshot_data = dynamic_cast<MergeTreeData::SnapshotData *>(storage_snapshot->data.get()))
    snapshot_data->parts = {};

This clears parts on a cached StorageSnapshot via a mutable cast, while storage remains set. This is a data race on shared state and could cause parts to be freed unexpectedly.

Historical context

Azat Khuzhin attempted a different fix in Jan 2026:

  • 1002f7ce907 — Removed SnapshotData::storage, added BlockIO::resetPipeline() to destroy cache before pipeline
  • d02700909db — Reordered BlockIO fields to match
  • 8c44f5e9374 — Reverted both, putting SnapshotData::storage back

The revert restored the status quo, but the BlockIO::operator= bug (which predates all this) was never fixed.

Changelog category:

  • Critical Bug Fix (crash, data loss, RBAC) or LOGICAL_ERROR

Changelog entry:

Fix crash (SEGFAULT) in clearCaches caused by BlockIO::operator= not moving query_metadata_cache, leading to premature destruction of cached storage snapshots and use-after-free of MergeTreeData storage.

🤖 Generated with Claude Code

Closes #95742

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@clickhouse-gh
Contributor

clickhouse-gh bot commented Feb 15, 2026

Workflow [PR], commit [5e70745]

@clickhouse-gh clickhouse-gh bot added the pr-critical-bugfix and pr-must-backport labels Feb 15, 2026
…cache

Concurrent mutations, selects, and detach/attach operations exercise the
race where the cache is destroyed prematurely while the storage is being
removed from the database.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
# DETACH removes the storage from the database, dropping its StoragePtr.
# If the cache snapshot holds the last ref, destroying the cache will
# free the storage while parts still exist in shared_ranges_in_parts.
$CLICKHOUSE_CLIENT --query "DETACH TABLE ${TABLE}" 2>/dev/null
Member Author

Another proof for #96130

@alexey-milovidov
Member Author

The bug was introduced here: #92118, CC @amosbird
It hit only due to the existence of this optimization: #85535, CC @jawm

@alexey-milovidov
Member Author

Related PRs and Issues

Previous fix attempts (all reverted)

| PR | Author | Title | State |
| --- | --- | --- | --- |
| #95074 | alexey-milovidov | Fix segfault in clearCaches when table is dropped during query | Merged, then reverted |
| #95120 | alexey-milovidov | Revert "Fix segfault in clearCaches when table is dropped during query" | Merged |
| #95393 | azat | Fix order of destruction in MergeTreeReadPoolBase | Merged (survived) |
| #95396 | azat | Do not hold a storage in storage snapshot | Merged, then reverted |
| #95594 | azat | Revert "Do not hold a storage in storage snapshot" | Merged |

Origin of the bug

| PR | Author | Title |
| --- | --- | --- |
| #92118 | Amos Bird | Proper ownership of query metadata cache (introduced query_metadata_cache in BlockIO but operator= was not updated) |
| #85535 | James Morrison | Optimize selection of part list during query planning (introduced shared_ranges_in_parts / shared_parts_list — a contributing factor) |

Issues

| Issue | Title | State |
| --- | --- | --- |
| #95742 | Flaky test: 01076_parallel_alter_replicated_zookeeper | Open |
| #94661 | [CI crash] Incorrect destruction of MergeTreeDataPartCompact caches | Closed |
| #91989 | SIGSEGV in IMergeTreeDataPart::clearCaches during executeReplaceRange | Closed |

…opTablesParallel` path

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@alexey-milovidov
Member Author

MSan stress test report — same root cause confirmed

Report URL: https://s3.amazonaws.com/clickhouse-test-reports/json.html?REF=master&sha=e4d1f04e6bd203d4ccb73f35eac78620ea7cfabb&name_0=MasterCI&name_1=Stress%20test%20%28azure%2C%20amd_msan%29&name_1=Stress%20test%20%28azure%2C%20amd_msan%29

Analysis

The MSan trace from the Stress test (azure, amd_msan) job shows the same root cause as the original SIGSEGV in clearCaches, but manifests differently due to MSan catching the use-after-free earlier.

Destruction path shown in the MSan trace:

  1. DatabaseCatalog::dropTablesParallel frees the storage on a background thread
  2. Meanwhile, on the TCPHandler thread, the pipeline is being destroyed via BlockIO::onException → pipeline.reset()
  3. The pool's parts still reference the freed storage memory via the bare IMergeTreeDataPart::storage reference (const MergeTreeData &)
  4. When clearCaches is called during part destruction, it accesses storage.getPrimaryIndexCache() — hitting freed memory

Why our fix addresses this:

The BlockIO::operator=(BlockIO&&) bug causes the query_metadata_cache (which holds StorageSnapshotPtr entries keeping the storage alive) to be destroyed prematurely when the temporary BlockIO from executeQueryImpl is assigned to the caller's BlockIO variable. Without this cache, the SnapshotData::storage (ConstStoragePtr) — the last strong reference keeping the storage alive — gets freed. This allows DatabaseCatalog::dropTablesParallel to finalize storage destruction on its background thread while the TCPHandler thread's pipeline still holds parts referencing the storage.

By moving query_metadata_cache in BlockIO::operator=, the cache survives for the lifetime of the pipeline, ensuring SnapshotData::storage keeps the storage alive until all parts are properly destroyed.

alexey-milovidov and others added 2 commits February 15, 2026 17:14
The `select_thread` function outputs `count()` results to stdout, polluting
the test output with `0` values that don't match the reference file.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>