Fix touchWatchedKey() break skipping remaining expired watchers#14752
Open
Yangqing wants to merge 1 commit intoredis:unstablefrom
Open
Fix touchWatchedKey() break skipping remaining expired watchers#14752Yangqing wants to merge 1 commit intoredis:unstablefrom
Yangqing wants to merge 1 commit intoredis:unstablefrom
Conversation
|
Hi, I’m Jit, a friendly security platform designed to help developers build secure applications from day zero with an MVS (Minimal viable security) mindset. In case there are security findings, they will be communicated to you as a comment inside the PR. Hope you’ll enjoy using Jit. Questions? Comments? Want to learn more? Get in touch with us. |
In touchWatchedKey(), when iterating the per-key watcher list and encountering a watcher with wk->expired==1 whose key still exists in the DB (dbFind != NULL), the code executed "break" which exited the entire iteration loop. All subsequent watchers were never visited and never received CLIENT_DIRTY_CAS, causing their MULTI/EXEC transactions to incorrectly succeed. The bug is triggered on writable replicas where a key can be locally expired (via PEXPIREAT) while remaining alive on the master. When the master later writes to the same key, the replicated command arrives without a preceding DEL (since the key was not expired on the master). The replica processes it with CLIENT_MASTER flag, which causes expireIfNeeded() to return KEY_VALID without deleting the expired key. This results in a single touchWatchedKey() call with dbFind != NULL, triggering the break. On standalone masters this bug is unreachable because lazy expiry always deletes the expired key before the write creates a new value, generating two touchWatchedKey calls that correctly handle all watchers. The fix removes the break and lets the code fall through to the normal CLIENT_DIRTY_CAS path, correctly flagging all remaining watchers as dirty. Added two regression tests (TX2-TC1 and TX2-TC3) using a writable replica setup to reproduce and verify the fix. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
577a29c to
0f01552
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
touchWatchedKey()inmulti.chas abreakstatement that exits the watcher iteration loop early when it encounters a watcher withwk->expired == 1whose key still exists in the DB (dbFind != NULL). This causes all subsequent watchers in the list to never be visited — they never receiveCLIENT_DIRTY_CAS, and theirMULTI/EXECtransactions incorrectly succeed instead of being aborted.The fix: Remove the
breakand let the code fall through to the existingCLIENT_DIRTY_CAS/unwatchAllKeys()path, which correctly flags the client as dirty.Root Cause Analysis
The relevant code in
touchWatchedKey()handles watchers whose key was already expired atWATCHtime (wk->expired == 1):When
dbFind != NULL(the expired key has been re-created), thebreakexits thewhileloop entirely. Every watcher after the current one in the list is silently skipped.How to Trigger
On a standalone master, this bug is unreachable. Lazy expiry (
expireIfNeededwithEXPIRE_FORCE_DELETE_EXPIRED) always deletes the expired key before the write creates a new value. This produces twotouchWatchedKeycalls: first withdbFind == NULL(which clearswk->expiredflags via thegoto skip_clientpath), then withdbFind != NULL(which notifies watchers normally since expired flags are already cleared). Thebreakis never reached.The bug is triggered on writable replicas (
replica-read-only no) when a key is locally expired on the replica but remains alive on the master:PEXPIREATsets a short expiry (this local change is not replicated back)WATCHit (wk->expired = 1)SETon the same key — since the key is not expired on the master, noDELis propagatedSETwithCLIENT_MASTERflagexpireIfNeeded()returnsKEY_VALIDwithout deleting the expired key (due toCLIENT_MASTERcheck atdb.c:2888)touchWatchedKey()call occurs withdbFind != NULLbreakfires at the first expired watcher, skipping all remaining watchersEXECThis can also occur through clock skew between master and replica, where a key naturally expires on the replica before the master.
Changes
src/multi.c: Replacebreakwith a comment explaining the fall-through to theCLIENT_DIRTY_CASpath. When a key was expired atWATCHtime but has since been re-created, this is a logical state change that should invalidate the transaction.tests/unit/multi.tcl: Two regression tests using a writable replica setup:Note
Medium Risk
Changes
WATCH/MULTI/EXECinvalidation logic for expired watched keys, which can affect transaction outcomes in replication edge cases. Scope is small but touches core concurrency/replication correctness paths.Overview
Fixes
touchWatchedKey()so encountering an already-expired watcher no longer exits the watcher-iteration early when the key has been re-created; instead it falls through to mark the clientCLIENT_DIRTY_CASandUNWATCH, ensuring all remaining watchers are invalidated.Adds regression coverage in
tests/unit/multi.tclusing a writable replica scenario where a key expires locally but is later re-written via replication, asserting that multiple clientsWATCHing the expired key all have theirEXECaborted (no side effects).Written by Cursor Bugbot for commit 0f01552. This will update automatically on new commits. Configure here.