Skip to content

Fix touchWatchedKey() break skipping remaining expired watchers#14752

Open
Yangqing wants to merge 1 commit intoredis:unstablefrom
Yangqing:fix/watch-expired-key-break-bug
Open

Fix touchWatchedKey() break skipping remaining expired watchers#14752
Yangqing wants to merge 1 commit intoredis:unstablefrom
Yangqing:fix/watch-expired-key-break-bug

Conversation

@Yangqing
Copy link

@Yangqing Yangqing commented Jan 29, 2026

Summary

touchWatchedKey() in multi.c has a break statement that exits the watcher iteration loop early when it encounters a watcher with wk->expired == 1 whose key still exists in the DB (dbFind != NULL). This causes all subsequent watchers in the list to never be visited — they never receive CLIENT_DIRTY_CAS, and their MULTI/EXEC transactions incorrectly succeed instead of being aborted.

The fix: Remove the break and let the code fall through to the existing CLIENT_DIRTY_CAS / unwatchAllKeys() path, which correctly flags the client as dirty.

Root Cause Analysis

The relevant code in touchWatchedKey() handles watchers whose key was already expired at WATCH time (wk->expired == 1):

if (wk->expired) {
    if (db == wk->db &&
        equalStringObjects(key, wk->key) &&
        dbFind(db, key->ptr) == NULL)
    {
        /* Expired key is deleted — no logical change. Clear flag. */
        wk->expired = 0;
        goto skip_client;
    }
    break;  // <-- BUG: exits entire loop, skipping remaining watchers
}

c->flags |= CLIENT_DIRTY_CAS;
unwatchAllKeys(c);

When dbFind != NULL (the expired key has been re-created), the break exits the while loop entirely. Every watcher after the current one in the list is silently skipped.

How to Trigger

On a standalone master, this bug is unreachable. Lazy expiry (expireIfNeeded with EXPIRE_FORCE_DELETE_EXPIRED) always deletes the expired key before the write creates a new value. This produces two touchWatchedKey calls: first with dbFind == NULL (which clears wk->expired flags via the goto skip_client path), then with dbFind != NULL (which notifies watchers normally since expired flags are already cleared). The break is never reached.

The bug is triggered on writable replicas (replica-read-only no) when a key is locally expired on the replica but remains alive on the master:

  1. Master creates a key with no TTL (key never expires on master)
  2. The key is replicated to the writable replica
  3. On the replica, a local PEXPIREAT sets a short expiry (this local change is not replicated back)
  4. The key expires on the replica; multiple clients WATCH it (wk->expired = 1)
  5. Master does SET on the same key — since the key is not expired on the master, no DEL is propagated
  6. The replica receives the replicated SET with CLIENT_MASTER flag
  7. expireIfNeeded() returns KEY_VALID without deleting the expired key (due to CLIENT_MASTER check at db.c:2888)
  8. A single touchWatchedKey() call occurs with dbFind != NULL
  9. The break fires at the first expired watcher, skipping all remaining watchers
  10. Skipped watchers' transactions incorrectly succeed on EXEC

This can also occur through clock skew between master and replica, where a key naturally expires on the replica before the master.

Changes

  • src/multi.c: Replace break with a comment explaining the fall-through to the CLIENT_DIRTY_CAS path. When a key was expired at WATCH time but has since been re-created, this is a logical state change that should invalidate the transaction.

  • tests/unit/multi.tcl: Two regression tests using a writable replica setup:

    • 3 expired watchers: verifies all three transactions are aborted when the watched key is re-created via replication
    • 2 expired watchers: minimal reproduction case

Note

Medium Risk
Changes WATCH/MULTI/EXEC invalidation logic for expired watched keys, which can affect transaction outcomes in replication edge cases. Scope is small but touches core concurrency/replication correctness paths.

Overview
Fixes touchWatchedKey() so encountering an already-expired watcher no longer exits the watcher-iteration early when the key has been re-created; instead it falls through to mark the client CLIENT_DIRTY_CAS and UNWATCH, ensuring all remaining watchers are invalidated.

Adds regression coverage in tests/unit/multi.tcl using a writable replica scenario where a key expires locally but is later re-written via replication, asserting that multiple clients WATCHing the expired key all have their EXEC aborted (no side effects).

Written by Cursor Bugbot for commit 0f01552. This will update automatically on new commits. Configure here.

@CLAassistant
Copy link

CLAassistant commented Jan 29, 2026

CLA assistant check
All committers have signed the CLA.

@jit-ci
Copy link

jit-ci bot commented Jan 29, 2026

Hi, I’m Jit, a friendly security platform designed to help developers build secure applications from day zero with an MVS (Minimal viable security) mindset.

In case there are security findings, they will be communicated to you as a comment inside the PR.

Hope you’ll enjoy using Jit.

Questions? Comments? Want to learn more? Get in touch with us.

In touchWatchedKey(), when iterating the per-key watcher list and
encountering a watcher with wk->expired==1 whose key still exists
in the DB (dbFind != NULL), the code executed "break" which exited
the entire iteration loop. All subsequent watchers were never visited
and never received CLIENT_DIRTY_CAS, causing their MULTI/EXEC
transactions to incorrectly succeed.

The bug is triggered on writable replicas where a key can be locally
expired (via PEXPIREAT) while remaining alive on the master. When
the master later writes to the same key, the replicated command
arrives without a preceding DEL (since the key was not expired on
the master). The replica processes it with CLIENT_MASTER flag, which
causes expireIfNeeded() to return KEY_VALID without deleting the
expired key. This results in a single touchWatchedKey() call with
dbFind != NULL, triggering the break.

On standalone masters this bug is unreachable because lazy expiry
always deletes the expired key before the write creates a new value,
generating two touchWatchedKey calls that correctly handle all
watchers.

The fix removes the break and lets the code fall through to the
normal CLIENT_DIRTY_CAS path, correctly flagging all remaining
watchers as dirty.

Added two regression tests (TX2-TC1 and TX2-TC3) using a writable
replica setup to reproduce and verify the fix.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@Yangqing Yangqing force-pushed the fix/watch-expired-key-break-bug branch from 577a29c to 0f01552 Compare January 29, 2026 09:44
@sundb sundb added this to Redis 8.8 Jan 30, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Todo

Development

Successfully merging this pull request may close these issues.

2 participants