Open
Labels: bug (Something isn't working), high priority (Task should be tackled first, added in the current sprint if necessary)
Description
Today, when start_from_checkpoint is set to latest, the pipeline always pulls the latest checkpoint from S3, even if a newer local checkpoint exists. This was done to make the pipeline state deterministic, but for current customer use cases it is clearly the wrong design.
Potential solutions
- Compare the progress made by the local checkpoint against the remote checkpoint and pick whichever has made more progress (we can do this by comparing the number of steps completed and the number of records ingested). (Preferred)
- Allow the user to specify a preference:
  - prefer_remote: always prefer the remote checkpoint. May be suitable for standby pipelines.
  - prefer_local: always prefer the local checkpoint, if one is available.
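A minimal sketch of how both options above could fit into one selection function. Everything here is hypothetical: CheckpointMeta, Preference, pick_checkpoint and the most_progress name are illustrative, not the pipeline's actual API. It assumes the step and record counts are monotonic and comparable between the local and remote copies (i.e. both descend from the same pipeline run), and it compares steps first, using records ingested only as a tie-breaker, which is one possible reading of "comparing the number of steps made, and the number of records ingested".

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


@dataclass(frozen=True)
class CheckpointMeta:
    """Hypothetical summary of how far a checkpoint had progressed."""
    location: str          # "local" or "remote" (S3)
    steps: int             # steps completed when the checkpoint was taken
    records_ingested: int  # total input records ingested at that point


class Preference(Enum):
    MOST_PROGRESS = "most_progress"  # hypothetical name for the preferred option
    PREFER_REMOTE = "prefer_remote"
    PREFER_LOCAL = "prefer_local"


def pick_checkpoint(
    local: Optional[CheckpointMeta],
    remote: Optional[CheckpointMeta],
    preference: Preference = Preference.MOST_PROGRESS,
) -> Optional[CheckpointMeta]:
    """Choose which checkpoint to resume from when start_from_checkpoint = latest."""
    # If only one side has a checkpoint (or neither does), there is nothing to compare.
    if local is None or remote is None:
        return local if local is not None else remote
    if preference is Preference.PREFER_REMOTE:
        return remote
    if preference is Preference.PREFER_LOCAL:
        return local
    # Tuples compare lexicographically: steps first, then records ingested.
    local_progress = (local.steps, local.records_ingested)
    remote_progress = (remote.steps, remote.records_ingested)
    # On an exact tie, keep the local copy so nothing has to be downloaded from S3.
    return remote if remote_progress > local_progress else local


# Usage: the local checkpoint is ahead, so it wins unless remote is explicitly preferred.
local = CheckpointMeta("local", steps=120, records_ingested=50_000)
remote = CheckpointMeta("remote", steps=100, records_ingested=42_000)
print(pick_checkpoint(local, remote).location)                           # local
print(pick_checkpoint(local, remote, Preference.PREFER_REMOTE).location)  # remote
```

Breaking ties in favour of the local checkpoint is a design choice in this sketch (it avoids an unnecessary S3 download); breaking them in favour of remote would be equally valid if determinism across replicas matters more.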