fix: FeatureView serialization with cycle detection#5502
Conversation
Signed-off-by: HaoXuAI <sduxuhao@gmail.com>
There was a problem hiding this comment.
Pull Request Overview
Adds support for on-demand feature sources, enforces cycle detection in FeatureView (de)serialization, standardizes the topological sort API, and refreshes related documentation.
- Introduce
OnDemandSourceTypealias and refactorsourcessignature inOnDemandFeatureView. - Rename all
topo_sortfunctions/methods totopological_sortand update callers. - Enhance
FeatureView.to_proto/from_protowith cycle detection and update__copy__/__eq__. - Add DAG module README and update compute-engine reference docs.
Reviewed Changes
Copilot reviewed 8 out of 9 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| sdk/python/feast/on_demand_feature_view.py | Add OnDemandSourceType alias and clean up sources hints |
| sdk/python/feast/infra/compute_engines/feature_resolver.py | Rename resolver method from topo_sort to topological_sort |
| sdk/python/feast/infra/compute_engines/feature_builder.py | Update builder calls to topological_sort |
| sdk/python/feast/infra/compute_engines/algorithms/topo.py | Rename functions topo_sort[_multiple] to topological_sort[_multiple] |
| sdk/python/feast/infra/compute_engines/dag/README.md | Add high-level DAG documentation |
| sdk/python/feast/feature_view.py | Implement cycle detection in (de)serialization, update copy/eq |
| docs/reference/compute-engine/README.md | Refresh compute-engine table and links |
| docs/getting-started/concepts/batch-feature-view.md | New guide for BatchFeatureView |
Comments suppressed due to low confidence (1)
docs/reference/compute-engine/README.md:22
- The markdown link for
ExecutionPlanis nested as[link]([link](...)). It should be formatted as a single link, e.g.[link](URL).
| `ExecutionPlan` | Executes nodes in dependency order and stores intermediate outputs | [link]([link](https://github.com/feast-dev/feast/blob/master/sdk/python/feast/infra/compute_engines/dag/README.md)) |
Signed-off-by: HaoXuAI <sduxuhao@gmail.com>
| for feature in on_demand_feature_view_proto.spec.features | ||
| ], | ||
| sources=sources, | ||
| sources=cast(List[OnDemandSourceType], sources), |
There was a problem hiding this comment.
Somehow this was breaking mypy, so put a fix on this.
| ## ✅ Key Capabilities | ||
|
|
||
| - **Composable DAG of FeatureViews**: Supports defining a `BatchFeatureView` on top of one or more other `FeatureView`s. | ||
| - **Transformations**: Apply PySpark-based transformation logic (`feature_transformation` or `udf`) to raw data source, can also be used to deal with multiple data sources. |
There was a problem hiding this comment.
is this exclusively limited to PySpark?
There was a problem hiding this comment.
not exclusively to Pyspark, will update it
| *, | ||
| name: str, | ||
| source: Union[DataSource, FeatureView, List[FeatureView]], | ||
| sink_source: Optional[DataSource] = None, |
There was a problem hiding this comment.
for some reason i thought we agreed on naming it sink.
There was a problem hiding this comment.
Yeah, on a second thought, I think sink_source is more explicit for the user to know it is passing a data source to this config.
| Field(name="conv_rate", dtype=Float32), | ||
| ], | ||
| aggregations=[ | ||
| Aggregation(column="conv_rate", function="sum", time_window=timedelta(days=1)), |
There was a problem hiding this comment.
nit: maybe we should make it avg since summing a conversion rate is weird
| ## Feature resolver and builder | ||
| The `FeatureBuilder` initialize a `FeatureResolver` that extracts a DAG from the `FeatureView` definitions, resolving dependencies and ensuring correct execution order. \ | ||
| The FeatureView represents a logical data source, while DataSource represents the physical data source (e.g., BigQuery, Spark, etc.). \ | ||
| When defines the FeatureView, the source can be a physical DataSource, a derived FeatureView, or a list of FeatureViews. |
There was a problem hiding this comment.
missing a word here. ?
franciscojavierarceo
left a comment
There was a problem hiding this comment.
lgtm, had some small suggestions that would be great to incorporate though. thanks for the docs!!!
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
Signed-off-by: HaoXuAI <sduxuhao@gmail.com>
Signed-off-by: HaoXuAI <sduxuhao@gmail.com>
# [0.51.0](v0.50.0...v0.51.0) (2025-07-21) ### Bug Fixes * FeatureView serialization with cycle detection ([#5502](#5502)) ([f287ca5](f287ca5)) * Fix current version in publish workflow ([#5499](#5499)) ([0af6e94](0af6e94)) * Fix NPM authentication ([#5506](#5506)) ([9f85892](9f85892)) * Fix verify wheels workflow for macos14 ([#5486](#5486)) ([07174cc](07174cc)) * Fixed error thrown for invalid project name on features api ([#5525](#5525)) ([4a9a5d0](4a9a5d0)) * Fixed ODFV on-write transformations ([271ef74](271ef74)) * Move Install OS X dependencies before python setup ([#5488](#5488)) ([35f211c](35f211c)) * Normalize current version by removing 'v' prefix if present ([#5500](#5500)) ([43f3d52](43f3d52)) * Skip macOS 14 with Python 3.10 due to gettext library ([#5490](#5490)) ([41d4977](41d4977)) * Standalone Web UI Publish Workflow ([#5498](#5498)) ([c47b134](c47b134)) ### Features * Added endpoints to allow user to get data for all projects ([4e06965](4e06965)) * Added grpc and rest endpoint for features ([#5519](#5519)) ([0a75696](0a75696)) * Added relationship support to all API endpoints ([#5496](#5496)) ([bea83e7](bea83e7)) * Continue updating doc ([#5523](#5523)) ([ea53b2b](ea53b2b)) * Hybrid offline store ([#5510](#5510)) ([8f1af55](8f1af55)) * Populate created and updated timestamp on data sources ([af3056b](af3056b)) * Provide ready-to-use Python definitions in api ([37628d9](37628d9)) * Snowflake source. fetch MAX in a single query ([#5387](#5387)) ([b49cea1](b49cea1)) * Support compute engine to use multi feature views as source ([#5482](#5482)) ([b9ac90b](b9ac90b)) * Support pagination and sorting on registry apis ([#5495](#5495)) ([c4b6fbe](c4b6fbe)) * Update doc ([#5521](#5521)) ([2808ce1](2808ce1))
# [0.51.0](v0.50.0...v0.51.0) (2025-07-21) ### Bug Fixes * FeatureView serialization with cycle detection ([#5502](#5502)) ([f287ca5](f287ca5)) * Fix current version in publish workflow ([#5499](#5499)) ([0af6e94](0af6e94)) * Fix NPM authentication ([#5506](#5506)) ([9f85892](9f85892)) * Fix verify wheels workflow for macos14 ([#5486](#5486)) ([07174cc](07174cc)) * Fixed error thrown for invalid project name on features api ([#5525](#5525)) ([4a9a5d0](4a9a5d0)) * Fixed ODFV on-write transformations ([271ef74](271ef74)) * Move Install OS X dependencies before python setup ([#5488](#5488)) ([35f211c](35f211c)) * Normalize current version by removing 'v' prefix if present ([#5500](#5500)) ([43f3d52](43f3d52)) * Skip macOS 14 with Python 3.10 due to gettext library ([#5490](#5490)) ([41d4977](41d4977)) * Standalone Web UI Publish Workflow ([#5498](#5498)) ([c47b134](c47b134)) ### Features * Added endpoints to allow user to get data for all projects ([4e06965](4e06965)) * Added grpc and rest endpoint for features ([#5519](#5519)) ([0a75696](0a75696)) * Added relationship support to all API endpoints ([#5496](#5496)) ([bea83e7](bea83e7)) * Continue updating doc ([#5523](#5523)) ([ea53b2b](ea53b2b)) * Hybrid offline store ([#5510](#5510)) ([8f1af55](8f1af55)) * Populate created and updated timestamp on data sources ([af3056b](af3056b)) * Provide ready-to-use Python definitions in api ([37628d9](37628d9)) * Snowflake source. fetch MAX in a single query ([#5387](#5387)) ([b49cea1](b49cea1)) * Support compute engine to use multi feature views as source ([#5482](#5482)) ([b9ac90b](b9ac90b)) * Support pagination and sorting on registry apis ([#5495](#5495)) ([c4b6fbe](c4b6fbe)) * Update doc ([#5521](#5521)) ([2808ce1](2808ce1))
What this PR does / why we need it:
Follow up on PR: #5482.
Which issue(s) this PR fixes:
Misc