feat: Add possibility to materialize only latest values, to increase performance #5713
Merged
franciscojavierarceo merged 9 commits into feast-dev:master on Nov 11, 2025
Author (Contributor): Let's re-run tests? Random issue, but no changes to dependency management :/
Author (Contributor): Seems like the AWS creds have expired @franciscojavierarceo
Member: @ntkathole @jeremyary can you investigate?
HaoXuAI (Collaborator) reviewed on Nov 5, 2025 and left a comment:
I think it might be better to add the config to the fs.materialize API? That way you can customize materialization per call: use the pushdown filter for the FeatureViews that need it, and skip it for other processes that don't.
Author (Contributor): Why not indeed. I'll check it out and tag you back.
Merged as 8d77b72 into feast-dev:master. 17 of 18 checks passed.
HaoXuAI pushed a commit that referenced this pull request on Nov 12, 2025:
…performance (#5713)
* add pull_all_from_table_or_query for clickhouse, to align with new materialization logic (calling it)
  Signed-off-by: lukas.valatka <lukas.valatka@cast.ai>
* add option to select to materialize only latest values, for performance
  Signed-off-by: lukas.valatka <lukas.valatka@cast.ai>
* enforce non optional params
  Signed-off-by: lukas.valatka <lukas.valatka@cast.ai>
---------
Signed-off-by: lukas.valatka <lukas.valatka@cast.ai>
Co-authored-by: Lukas Valatka <lukas@valatka.net>
franciscojavierarceo pushed a commit that referenced this pull request on Nov 13, 2025:

# [0.57.0](v0.56.0...v0.57.0) (2025-11-13)

### Bug Fixes

* Improve trino to feast type mapping with (real,varchar,timestamp,decimal) ([#5691](#5691)) ([f855ad2](f855ad2))
* Materialize API - ODFV views not looked-up (thinks views non existant) - crashes materialize ([#5716](#5716)) ([1b050b3](1b050b3))
* Support historical feature retrieval with start_date/end_date in RemoteOfflineStore ([#5703](#5703)) ([ad32756](ad32756))
* Thread safe Clickhouse offline store ([#5710](#5710)) ([5f446ed](5f446ed))

### Features

* Add annotations to cronjob CRDs ([#5701](#5701)) ([be6e6c2](be6e6c2))
* Add batch commit mode for MySQL OnlineStore ([#5699](#5699)) ([3cfe4eb](3cfe4eb))
* Add possibility to materialize only latest values, to increase performance ([#5713](#5713)) ([8d77b72](8d77b72))
* Support table format: Iceberg, Delta, and Hudi ([#5650](#5650)) ([2915ad1](2915ad1))

What this PR does / why we need it:
Adds an option to materialize only the latest values (essentially pushing deduplication down to the offline store), to reduce client memory consumption and end-to-end duration. The gain is especially noticeable for large-scale, latency-critical materialization (think hundreds of thousands of rows across ~150 feature views), as we observed in our ML project at cast.ai.
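To make the idea of pushing deduplication down to the offline store concrete, here is a minimal sketch using an in-memory SQLite database as a stand-in for an offline store. It is not the code from this PR: the table `driver_stats`, its columns, and the helper names `pull_all`, `dedupe_client_side`, and `pull_latest` are all hypothetical and chosen only for illustration. It contrasts fetching every row and deduplicating on the client with letting the store return one latest row per entity.

```python
import sqlite3

# Hypothetical table and column names; SQLite stands in for a real offline store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE driver_stats (driver_id INTEGER, event_timestamp TEXT, trips INTEGER)"
)
conn.executemany(
    "INSERT INTO driver_stats VALUES (?, ?, ?)",
    [
        (1, "2025-11-01T00:00:00", 10),
        (1, "2025-11-02T00:00:00", 12),
        (2, "2025-11-01T00:00:00", 7),
        (2, "2025-11-03T00:00:00", 9),
        (2, "2025-11-02T00:00:00", 8),
    ],
)


def pull_all(conn):
    """Fetch every row; the client has to hold and deduplicate all of them."""
    return conn.execute(
        "SELECT driver_id, event_timestamp, trips FROM driver_stats"
    ).fetchall()


def dedupe_client_side(rows):
    """Keep the newest row per driver_id in client memory."""
    latest = {}
    for driver_id, ts, trips in rows:
        # ISO-8601 strings in the same format compare chronologically.
        if driver_id not in latest or ts > latest[driver_id][1]:
            latest[driver_id] = (driver_id, ts, trips)
    return sorted(latest.values())


def pull_latest(conn):
    """Let the store deduplicate: only one (latest) row per entity is returned."""
    query = """
        SELECT driver_id, event_timestamp, trips
        FROM (
            SELECT *, ROW_NUMBER() OVER (
                PARTITION BY driver_id ORDER BY event_timestamp DESC
            ) AS rn
            FROM driver_stats
        )
        WHERE rn = 1
        ORDER BY driver_id
    """
    return conn.execute(query).fetchall()


all_rows = pull_all(conn)        # 5 rows cross the wire
latest_rows = pull_latest(conn)  # 2 rows cross the wire
```

Both paths end with the same latest values, but the pushed-down query transfers one row per entity instead of the full history, which is where the memory and duration savings described above would come from.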
Which issue(s) this PR fixes:
#5707 (comment)
Misc
This will be configured via the feature store (repo) config file: