feat: Unify Feature Transformations and Feature Views#5747
feat: Unify Feature Transformations and Feature Views#5747franciscojavierarceo wants to merge 34 commits intomasterfrom
Conversation
2899413 to
649c429
Compare
| from feast.feature_view import FeatureView as FV | ||
|
|
||
| for src in sources: | ||
| if not isinstance(src, (FV, type(src).__name__ == "FeatureView")): |
There was a problem hiding this comment.
| if not isinstance(src, (FV, type(src).__name__ == "FeatureView")): | |
| if not isinstance(src, FeatureView): |
sdk/python/feast/feature_view.py
Outdated
| owner: str = "", | ||
| mode: Optional[Union["TransformationMode", str]] = None, | ||
| feature_transformation: Optional[Transformation] = None, | ||
| when: Optional[Union[TransformationTiming, str]] = None, |
There was a problem hiding this comment.
Need updates to proto ? when and online_enabled not serialized in from_proto.
Also, modify __eq__() and __copy__() for consider these new attributes ?
8b578bf to
c77cb83
Compare
|
I think FeatureView as the actually data assets, I think makes more sense to having the API. |
Yeah, that's fair. @HaoXuAI I updated the PR, lmk what you think. |
sdk/python/feast/feature_view.py
Outdated
| self.transform_when = transform_when | ||
|
|
||
| # Auto-infer online setting based on transform_when pattern | ||
| if transform_when in [ |
There was a problem hiding this comment.
So why batch on read/write forces online to be true?
I think we need to think about if we really need a new tranform_when config. And if we need it, what is its relation with online/offline.
I think it might be counterintuitive that if this featureview is only used by batch or stream or on demand. what if the user defines it today as one mode and wants to change it to another one or allow both mode in the future?
My personal pint of view is removing any config for this. As long as you define the transformation, it will be applied to run before writing to destination. Use online/offline to control the destination only, if both are off, then no write. For any transformation needed on read, this should be configured at API level. Fo example, you can put get_online_feature(transform=true) to force transformation.
There was a problem hiding this comment.
I think we need to think about if we really need a new tranform_when config. And if we need it, what is its relation with online/offline.
Here's how things actually work:
- Batch
- Sometimes data can be transformed online
- On Reads
- On Writes
- Sometimes not, and that's basically a batch transform with no feature server transformation and pure materialization
- Streaming
- Handles historical data the same way it handles online serving
that's really what the transform_when indicates.
We need something other than online because the rest will be executed by the server and you can't orchestrate the transformation on write during retrieval.
There was a problem hiding this comment.
I'm fine with renaming something but we need something equivalent to write_to_online_store that's more obvious.
There was a problem hiding this comment.
online = true means write to online store.
transform_on_write should be removed from the server. it's a hacky way. #5283. this PR doesn't get enough reviews I think.
#5300. I think this one is also not ideal implementation as well.
When you use get_online_feature, you should assign it a transform=true to run transformation in "retrieval".
There was a problem hiding this comment.
I've read through the feature server code and on demand feature view implementation. I think a lot code needs to be refactored. And if you think there is some lines of code won't work you can tag me. I'll try to explain it
There was a problem hiding this comment.
Yes we agreed on this direction and I'm going to implement things that way. Just need to finish the work.
5abbec2 to
e5f074e
Compare
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
… auto-inference Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
e5f074e to
a87c4b4
Compare
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
… store (#5807) * Update redis.py Add millisecond-precision timestamp support to Redis online store Signed-off-by: Jatin Kumar <jatink.5251@gmail.com> * Update redis.py sub-second precision when returning timestamps to client Signed-off-by: Jatin Kumar <jatink.5251@gmail.com> * Update redis.py fix(redis): preserve millisecond timestamp precision Signed-off-by: Jatin Kumar <jatink.5251@gmail.com> * Update redis.py fix: Remove whitespace on blank lines (W293) Signed-off-by: Jatin Kumar <jatink.5251@gmail.com> --------- Signed-off-by: Jatin Kumar <jatink.5251@gmail.com>
Signed-off-by: samuelkim7 <samuel.kim@goflink.com>
* chore: Refactor some unit tests into integration tests Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> * chore: Refactor some unit tests into integration tests Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> * rename TestConfig Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> * rename TestConfig Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> * add integration flag Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> * update paths Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> * update paths Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Srihari <svenkata@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
What this PR does / why we need it:
This PR refactors the transformation system with a cleaner architecture that separates transformation logic from execution. Transformations should focus on HOW to transform data, while FeatureViews handle WHEN and WHERE to execute.
Changes:
feature_transformationfield to FeatureView for transformation logictransform: bool = Trueparameter to API methods for per-call controltransform=False)Which issue(s) this PR fixes:
#4584, #5716, #5689
Misc