Conversation
There was a problem hiding this comment.
Pull request overview
This PR implements support for multi-host transactions by modifying the transaction coordination logic to handle coordinator-based decisions and per-connector transaction tracking.
Changes:
- Updated transaction initiator logic to conditionally clear connector-initiated transactions based on multi-host mode
- Modified committed connectors selection to defer to coordinator when present
- Enhanced connector transaction request handling with multi-host specific logic
| if self.transaction_state.is_committing() { | ||
| return Err(ControllerError::TransactionInProgress); | ||
| } | ||
|
|
There was a problem hiding this comment.
Removed assertion assert!(!self.initiators.is_ongoing(self.is_multihost)) without verification. In multi-host mode, this could allow transaction_id assignment even when there are ongoing connector-initiated transactions, potentially causing state inconsistency.
| if self.initiators.is_ongoing(self.is_multihost) { | |
| error!( | |
| "Attempted to start a new transaction while another transaction is in progress" | |
| ); | |
| return Err(ControllerError::TransactionInProgress); | |
| } |
| } | ||
| Entry::Occupied(mut entry) => { | ||
| assert!(self.is_multihost || self.initiators.transaction_id.is_some()); | ||
| assert!(self.initiators.transaction_id.is_some()); |
There was a problem hiding this comment.
This assertion will fail in multi-host mode when a connector commits a transaction. In multi-host mode, the early return at line 4739 prevents this assertion from being checked, but the logic change at line 4688 means transaction_id may not be set for connector-only transactions. If the multi-host early return is removed or bypassed, this assertion would fail.
Yes. |
…ses. When a coordinator is active, it specifies which input connectors should accept input in a particular step, but the pipeline was not fully honoring that, which caused input to stall during connector-initiated transactions. Issue: feldera/cloud#1436 Signed-off-by: Ben Pfaff <blp@feldera.com>
In multihost, controller requests for transactions are decoupled from the coordinator starting and committing transactions as a whole. As a result, it's common for a connector to request to start a transaction while the coordinator is still committing one, which caused an error to be logged. This also caused later error logging when the connectors tried to commit the transaction that it wasn't able to start. The result was that connector-initiated transactions were mostly ineffective. This commit changes the model for connector-initiated transactions in multihost. Instead of tying them to the overall transaction status, with this commit, they are simply added and removed when the connector requests. There is no longer any need for the coordinator to advance them from requesting a commit to finally committed. This was previously included to match the single-host model, but it should not be necessary because even if a connector quickly commits and starts a transaction, the coordinator will still be notified of the toggle and can (and does) treat the connector as requesting a commit on its side. This worked fine for me over a multiple-minute multihost run with connector-initiated transactions. Issue: feldera/cloud#1436 Signed-off-by: Ben Pfaff <blp@feldera.com>
b17550a to
1b91c4b
Compare
Fix the transactions feature on multihost.
The individual commits have good commit messages.