Conversation
Signed-off-by: Ben Pfaff <blp@feldera.com>
Signed-off-by: Ben Pfaff <blp@feldera.com>
This prepares for adding another configuration setting. Signed-off-by: Ben Pfaff <blp@feldera.com>
The following commit will add another way to stage records. This commit makes that one easier to understand. This commit should not change any behavior. Signed-off-by: Ben Pfaff <blp@feldera.com>
Issue: #5607 Signed-off-by: Ben Pfaff <blp@feldera.com>
Signed-off-by: feldera-bot <feldera-bot@feldera.com>
ryzhyk
left a comment
There was a problem hiding this comment.
This looks clean. One thing I couldn't figure out is how we make sure that some of the partition receivers don't buffer unbounded messages while waiting for stragglers.
|
|
||
| - If one or a few partitions have timestamps far behind the others, only | ||
| those partitions will be processed until all the old events are | ||
| processed. (This is the flip side of the previous pitfall.) |
There was a problem hiding this comment.
Flink has a feature called source idleness detection to deal with this problem. We might need something similar.
Hmm, I thought I had that figured out, but now that I look again, I was wrong. I'll add a per-partition queuing limit. |
Can this affect the connector even without this new feature, in the sense that there's nothing preventing partition readers from accumulating unbounded data? |
I took another look. We account data as buffered as soon as it comes in on the partition reader thread, so the total amount buffered is limited by the buffer limit plus however long it takes the controller to tell us to pause. So this will limit the amount buffered in either case. However, that means there's another complication: we need to make sure that every partition can buffer at least one record, even if the overall buffers are filled, because otherwise we can be full of buffered records that can't be input to the circuit. Also, I think that the idea of records that are buffered but can't be input to the circuit will cause livelock in the controller, which will continuously try to start a step since we've told it that there is data buffered. So, there might need to be further distinction between the modes:
I'll continue to think about this today. Thanks for asking probing questions. |
Describe Manual Test Plan
I didn't test this manually but I did write and run a new unit test for the feature.
Checklist