Fix touchWatchedKey() break skipping remaining expired watchers by Yangqing · Pull Request #14752 · redis/redis

Yangqing · 2026-01-29T09:33:23Z

Summary

touchWatchedKey() in multi.c has a break statement that exits the watcher iteration loop early when it encounters a watcher with wk->expired == 1 whose key still exists in the DB (dbFind != NULL). This causes all subsequent watchers in the list to never be visited — they never receive CLIENT_DIRTY_CAS, and their MULTI/EXEC transactions incorrectly succeed instead of being aborted.

The fix: Remove the break and let the code fall through to the existing CLIENT_DIRTY_CAS / unwatchAllKeys() path, which correctly flags the client as dirty.

Root Cause Analysis

The relevant code in touchWatchedKey() handles watchers whose key was already expired at WATCH time (wk->expired == 1):

if (wk->expired) {
    if (db == wk->db &&
        equalStringObjects(key, wk->key) &&
        dbFind(db, key->ptr) == NULL)
    {
        /* Expired key is deleted — no logical change. Clear flag. */
        wk->expired = 0;
        goto skip_client;
    }
    break;  // <-- BUG: exits entire loop, skipping remaining watchers
}

c->flags |= CLIENT_DIRTY_CAS;
unwatchAllKeys(c);

When dbFind != NULL (the expired key has been re-created), the break exits the while loop entirely. Every watcher after the current one in the list is silently skipped.

How to Trigger

On a standalone master, this bug is unreachable. Lazy expiry (expireIfNeeded with EXPIRE_FORCE_DELETE_EXPIRED) always deletes the expired key before the write creates a new value. This produces two touchWatchedKey calls: first with dbFind == NULL (which clears wk->expired flags via the goto skip_client path), then with dbFind != NULL (which notifies watchers normally since expired flags are already cleared). The break is never reached.

The bug is triggered on writable replicas (replica-read-only no) when a key is locally expired on the replica but remains alive on the master:

Master creates a key with no TTL (key never expires on master)
The key is replicated to the writable replica
On the replica, a local PEXPIREAT sets a short expiry (this local change is not replicated back)
The key expires on the replica; multiple clients WATCH it (wk->expired = 1)
Master does SET on the same key — since the key is not expired on the master, no DEL is propagated
The replica receives the replicated SET with CLIENT_MASTER flag
expireIfNeeded() returns KEY_VALID without deleting the expired key (due to CLIENT_MASTER check at db.c:2888)
A single touchWatchedKey() call occurs with dbFind != NULL
The break fires at the first expired watcher, skipping all remaining watchers
Skipped watchers' transactions incorrectly succeed on EXEC

This can also occur through clock skew between master and replica, where a key naturally expires on the replica before the master.

Changes

src/multi.c: Replace break with a comment explaining the fall-through to the CLIENT_DIRTY_CAS path. When a key was expired at WATCH time but has since been re-created, this is a logical state change that should invalidate the transaction.
tests/unit/multi.tcl: Two regression tests using a writable replica setup:
- 3 expired watchers: verifies all three transactions are aborted when the watched key is re-created via replication
- 2 expired watchers: minimal reproduction case

Note

Medium Risk
Changes WATCH/MULTI/EXEC invalidation logic for expired watched keys, which can affect transaction outcomes in replication edge cases. Scope is small but touches core concurrency/replication correctness paths.

Overview
Fixes touchWatchedKey() so encountering an already-expired watcher no longer exits the watcher-iteration early when the key has been re-created; instead it falls through to mark the client CLIENT_DIRTY_CAS and UNWATCH, ensuring all remaining watchers are invalidated.

Adds regression coverage in tests/unit/multi.tcl using a writable replica scenario where a key expires locally but is later re-written via replication, asserting that multiple clients WATCHing the expired key all have their EXEC aborted (no side effects).

^{Written by Cursor Bugbot for commit 0f01552. This will update automatically on new commits. Configure here.}

CLAassistant · 2026-01-29T09:33:37Z

All committers have signed the CLA.

jit-ci · 2026-01-29T09:33:51Z

Hi, I’m Jit, a friendly security platform designed to help developers build secure applications from day zero with an MVS (Minimal viable security) mindset.

In case there are security findings, they will be communicated to you as a comment inside the PR.

Hope you’ll enjoy using Jit.

Questions? Comments? Want to learn more? Get in touch with us.

In touchWatchedKey(), when iterating the per-key watcher list and encountering a watcher with wk->expired==1 whose key still exists in the DB (dbFind != NULL), the code executed "break" which exited the entire iteration loop. All subsequent watchers were never visited and never received CLIENT_DIRTY_CAS, causing their MULTI/EXEC transactions to incorrectly succeed. The bug is triggered on writable replicas where a key can be locally expired (via PEXPIREAT) while remaining alive on the master. When the master later writes to the same key, the replicated command arrives without a preceding DEL (since the key was not expired on the master). The replica processes it with CLIENT_MASTER flag, which causes expireIfNeeded() to return KEY_VALID without deleting the expired key. This results in a single touchWatchedKey() call with dbFind != NULL, triggering the break. On standalone masters this bug is unreachable because lazy expiry always deletes the expired key before the write creates a new value, generating two touchWatchedKey calls that correctly handle all watchers. The fix removes the break and lets the code fall through to the normal CLIENT_DIRTY_CAS path, correctly flagging all remaining watchers as dirty. Added two regression tests (TX2-TC1 and TX2-TC3) using a writable replica setup to reproduce and verify the fix. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Yangqing force-pushed the fix/watch-expired-key-break-bug branch from 577a29c to 0f01552 Compare January 29, 2026 09:44

sundb added this to Redis 8.8 Jan 30, 2026

github-project-automation bot moved this to Todo in Redis 8.8 Jan 30, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix touchWatchedKey() break skipping remaining expired watchers#14752

Fix touchWatchedKey() break skipping remaining expired watchers#14752
Yangqing wants to merge 1 commit intoredis:unstablefrom
Yangqing:fix/watch-expired-key-break-bug

Yangqing commented Jan 29, 2026 •

edited by cursor bot

Loading

Uh oh!

CLAassistant commented Jan 29, 2026 •

edited

Loading

Uh oh!

jit-ci bot commented Jan 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Yangqing commented Jan 29, 2026 • edited by cursor bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Root Cause Analysis

How to Trigger

Changes

Uh oh!

CLAassistant commented Jan 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jit-ci bot commented Jan 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Yangqing commented Jan 29, 2026 •

edited by cursor bot

Loading

CLAassistant commented Jan 29, 2026 •

edited

Loading