
[dbsp] Wait up to an hour for inter-host exchange to complete.#5622

Merged
blp merged 1 commit into main from deadline-exceeded on Feb 13, 2026

@blp (Member) commented Feb 13, 2026

It can take an arbitrary amount of time for exchange to complete, given that steps have arbitrary size and exchange doesn't necessarily run in the same order in every worker. This relaxes the deadline from 10 seconds to 1 hour.

This may fix the DeadlineExceeded errors that have occasionally been reported to me in multihost pipelines (at the least, it rules out a too-short deadline as the cause).


Signed-off-by: Ben Pfaff <blp@feldera.com>
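To make the failure mode concrete, here is a minimal std-only sketch of a blocking exchange guarded by a deadline. It is not tarpc's API and not Feldera's implementation: `DEFAULT_DEADLINE`, `EXCHANGE_DEADLINE`, `wait_for_exchange`, `slow_peer`, and `ExchangeError` are hypothetical names, and an in-process channel stands in for the RPC to a remote host. It shows that when one worker reaches the exchange later than its peer, a 10-second deadline can expire even though nothing is broken, while a 1-hour deadline simply waits. Durations in `main` are scaled down 100x so the demo finishes quickly.

```rust
use std::sync::mpsc::{self, Receiver, RecvTimeoutError};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for tarpc's 10-second default deadline.
const DEFAULT_DEADLINE: Duration = Duration::from_secs(10);
/// Hypothetical relaxed deadline matching this PR: one hour.
const EXCHANGE_DEADLINE: Duration = Duration::from_secs(60 * 60);

/// Hypothetical error type for a failed exchange.
#[derive(Debug, PartialEq)]
enum ExchangeError {
    /// The peer did not deliver its batch before the deadline.
    DeadlineExceeded,
    /// The peer went away without sending anything.
    Disconnected,
}

/// Waits for one batch from a peer until the absolute instant `deadline`.
fn wait_for_exchange<T>(rx: &Receiver<T>, deadline: Instant) -> Result<T, ExchangeError> {
    // recv_timeout takes a relative duration, so convert the absolute deadline.
    let remaining = deadline.saturating_duration_since(Instant::now());
    match rx.recv_timeout(remaining) {
        Ok(batch) => Ok(batch),
        Err(RecvTimeoutError::Timeout) => Err(ExchangeError::DeadlineExceeded),
        Err(RecvTimeoutError::Disconnected) => Err(ExchangeError::Disconnected),
    }
}

/// Simulates a peer worker that reaches the exchange only after `delay`,
/// e.g. because it is still processing a large step.
fn slow_peer(delay: Duration) -> Receiver<u64> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        thread::sleep(delay);
        let _ = tx.send(42);
    });
    rx
}

fn main() {
    // Scale down 100x: the peer takes "20 s" (200 ms), longer than "10 s" (100 ms).
    let scale = 100;
    let peer_delay = Duration::from_secs(20) / scale;

    // Old behavior: the short deadline expires before the slow peer sends.
    let rx = slow_peer(peer_delay);
    let short = wait_for_exchange(&rx, Instant::now() + DEFAULT_DEADLINE / scale);
    println!("10 s deadline: {:?}", short);

    // New behavior: the long deadline lets the exchange complete.
    let rx = slow_peer(peer_delay);
    let long = wait_for_exchange(&rx, Instant::now() + EXCHANGE_DEADLINE / scale);
    println!("1 h deadline: {:?}", long);
}
```

Running it, the first wait should report `DeadlineExceeded` and the second should receive the batch. In the real code the deadline is presumably carried on the RPC context (tarpc's `context::Context` has a `deadline` field), so relaxing it means overriding that field rather than accepting the default; as the discussion below notes, a long deadline trades away fast detection of genuine network failures.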
blp self-assigned this Feb 13, 2026
blp added the rust (Pull requests that update Rust code) and multihost (Related to multihost or distributed pipelines) labels Feb 13, 2026
@mihaibudiu (Contributor) left a comment


I don't see the 10 seconds in the previous code.
Is there a legit case where even 1 hour is not enough?

@blp (Member, Author) commented Feb 13, 2026

> I don't see the 10 seconds in the previous code.

It's the default for tarpc: https://docs.rs/tarpc/latest/src/tarpc/context.rs.html#109-121

> Is there a legit case where even 1 hour is not enough?

In theory, I suppose.

blp enabled auto-merge February 13, 2026 01:05
blp added this pull request to the merge queue Feb 13, 2026
@mihaibudiu (Contributor) commented:

10 seconds seems short even for intra-host exchange.

@lalithsuresh (Contributor) commented:

@blp if there is legitimately a network issue between the hosts, will it take an hour to detect the failure and restart the step?

Merged via the queue into main with commit 52687a9 Feb 13, 2026
2 checks passed
blp deleted the deadline-exceeded branch February 13, 2026 02:59
@blp (Member, Author) commented Feb 13, 2026

> @blp if there is legitimately a network issue between the hosts, will it take an hour to detect the failure and restart the step?

TCP should find the problem long before that.

We need a strategy for detecting network partitions but this probably isn't it.
