Skip to content

Transactions#5562

Merged
blp merged 2 commits intomainfrom
transactions
Feb 12, 2026
Merged

Transactions#5562
blp merged 2 commits intomainfrom
transactions

Conversation

@blp
Copy link
Member

@blp blp commented Feb 4, 2026

Fix the transactions feature on multihost.

The individual commits have good commit messages.

@blp blp requested a review from swanandx February 4, 2026 23:07
@blp blp self-assigned this Feb 4, 2026
@blp blp added bug Something isn't working rust Pull requests that update Rust code labels Feb 4, 2026
Copilot AI review requested due to automatic review settings February 4, 2026 23:07
@blp blp added enterprise Issue related to Feldera Enterprise features. multihost Related to multihost or distributed pipelines labels Feb 4, 2026
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements support for multi-host transactions by modifying the transaction coordination logic to handle coordinator-based decisions and per-connector transaction tracking.

Changes:

  • Updated transaction initiator logic to conditionally clear connector-initiated transactions based on multi-host mode
  • Modified committed connectors selection to defer to coordinator when present
  • Enhanced connector transaction request handling with multi-host specific logic

if self.transaction_state.is_committing() {
return Err(ControllerError::TransactionInProgress);
}

Copy link

Copilot AI Feb 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed assertion assert!(!self.initiators.is_ongoing(self.is_multihost)) without verification. In multi-host mode, this could allow transaction_id assignment even when there are ongoing connector-initiated transactions, potentially causing state inconsistency.

Suggested change
if self.initiators.is_ongoing(self.is_multihost) {
error!(
"Attempted to start a new transaction while another transaction is in progress"
);
return Err(ControllerError::TransactionInProgress);
}

Copilot uses AI. Check for mistakes.
}
Entry::Occupied(mut entry) => {
assert!(self.is_multihost || self.initiators.transaction_id.is_some());
assert!(self.initiators.transaction_id.is_some());
Copy link

Copilot AI Feb 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assertion will fail in multi-host mode when a connector commits a transaction. In multi-host mode, the early return at line 4739 prevents this assertion from being checked, but the logic change at line 4688 means transaction_id may not be set for connector-only transactions. If the multi-host early return is removed or bypassed, this assertion would fail.

Copilot uses AI. Check for mistakes.
@blp blp requested a review from ryzhyk February 4, 2026 23:29
Copy link
Contributor

@ryzhyk ryzhyk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@blp , does the coordinator have the logic to prevent the same connector from rejoining a transaction after it has committed it? This is important for liveness (i.e., to guarantee the transaction eventually commits)

@blp
Copy link
Member Author

blp commented Feb 9, 2026

@blp , does the coordinator have the logic to prevent the same connector from rejoining a transaction after it has committed it?

Yes.

blp added 2 commits February 11, 2026 15:34
…ses.

When a coordinator is active, it specifies which input connectors should
accept input in a particular step, but the pipeline was not fully honoring
that, which caused input to stall during connector-initiated transactions.

Issue: feldera/cloud#1436

Signed-off-by: Ben Pfaff <blp@feldera.com>
In multihost, controller requests for transactions are decoupled from
the coordinator starting and committing transactions as a whole.  As a
result, it's common for a connector to request to start a transaction
while the coordinator is still committing one, which caused an error to
be logged.  This also caused later error logging when the connectors tried
to commit the transaction that it wasn't able to start.  The result was
that connector-initiated transactions were mostly ineffective.

This commit changes the model for connector-initiated transactions in
multihost.  Instead of tying them to the overall transaction status, with
this commit, they are simply added and removed when the connector requests.
There is no longer any need for the coordinator to advance them from
requesting a commit to finally committed.  This was previously included
to match the single-host model, but it should not be necessary because
even if a connector quickly commits and starts a transaction, the
coordinator will still be notified of the toggle and can (and does)
treat the connector as requesting a commit on its side.

This worked fine for me over a multiple-minute multihost run with
connector-initiated transactions.

Issue: feldera/cloud#1436

Signed-off-by: Ben Pfaff <blp@feldera.com>
@swanandx swanandx enabled auto-merge February 11, 2026 10:04
@swanandx swanandx added this pull request to the merge queue Feb 11, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Feb 11, 2026
@blp blp added this pull request to the merge queue Feb 11, 2026
Merged via the queue into main with commit 2420056 Feb 12, 2026
1 check passed
@blp blp deleted the transactions branch February 12, 2026 02:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working enterprise Issue related to Feldera Enterprise features. multihost Related to multihost or distributed pipelines rust Pull requests that update Rust code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants