Skip to content

feat(go): Implement metrics and tracing for http and grpc servers#5925

Open
luisazofracabify wants to merge 5 commits intofeast-dev:masterfrom
luisazofracabify:feat/go-server-observability
Open

feat(go): Implement metrics and tracing for http and grpc servers#5925
luisazofracabify wants to merge 5 commits intofeast-dev:masterfrom
luisazofracabify:feat/go-server-observability

Conversation

@luisazofracabify
Copy link

@luisazofracabify luisazofracabify commented Jan 30, 2026

Description

This PR significantly improves the observability and reliability of the Feast Go Feature Server by implementing comprehensive Prometheus metrics, and robust configuration options for both HTTP and gRPC modes. It addresses issues with hardcoded ports, inconsistent metric exposure, and potential race conditions during shutdown.

Changes

1. Prometheus Metrics Instrumentation

  • Unified Metrics Server: Implemented a dedicated, configurable HTTP server (default port :9090) for serving metrics in both HTTP and gRPC modes. This unifies the observability strategy.
  • HTTP Server Instrumentation: Added metricsMiddleware in internal/feast/server/http_server.go to track:
    • Request duration (http_request_duration_seconds)
    • Request counts by method, path, and status (http_requests_total)
    • Instrumented the /health endpoint for readiness probe visibility.
  • gRPC Metrics: Integrated go-grpc-prometheus to fully expose standard gRPC server metrics.

2. Tracing Enhancements

  • Dynamic Service Name: Improved the existing OpenTelemetry integration by adding support for the OTEL_SERVICE_NAME environment variable (defaults to FeastGoFeatureServer). This allows proper service identification in distributed tracing systems without code changes.

3. Server Configuration & Reliability

  • Configurable Metrics Port: Added -metrics-port flag to main.go.
  • Graceful Shutdown: Implemented sync.WaitGroup in StartHttpServer to ensure clean shutdown of the metrics server and logging services, matching the robust behavior of StartGrpcServer.
  • Code Cleanup: Refactored ServerStarter interfaces and removed unused legacy code.

How Has This Been Tested?

Verification

  • Confirmed /metrics endpoint accessibility on custom ports.
  • Verified /health and request metrics counters.
  • Validated status 200 recording fix.
  • Tested OTEL_SERVICE_NAME env var reflects in traces.
  • Verified clean shutdown logs (SIGINT/SIGTERM).

Checklist

  • My code follows the code style of this project.
  • I have added tests that prove my fix is effective or that my feature works.
  • I have updated the dependencies (go.mod / go.sum).
  • All new and existing tests passed.

Why gotoprom

It reduces boilerplate code for histograms and ensures type-safety for labels, preventing runtime panics due to mismatched label cardinality. It wraps the official prometheus client, so it's fully compatible


Open with Devin

@luisazofracabify luisazofracabify requested a review from a team as a code owner January 30, 2026 11:53
Copy link
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 6 additional flags.

Open in Devin Review

@luisazofracabify luisazofracabify force-pushed the feat/go-server-observability branch from 82aca5b to b876dba Compare January 30, 2026 11:58
Copy link
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 new potential issue.

View issue and 9 additional flags in Devin Review.

Open in Devin Review

@luisazofracabify luisazofracabify force-pushed the feat/go-server-observability branch from b876dba to 8b5cd4b Compare January 30, 2026 12:30
Copy link
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 new potential issue.

View issue and 12 additional flags in Devin Review.

Open in Devin Review

@luisazofracabify luisazofracabify force-pushed the feat/go-server-observability branch from 8b5cd4b to 847393e Compare January 30, 2026 12:39
@luisazofracabify luisazofracabify changed the title feat(go): implement metrics and tracing for http and grpc servers feat(go): Implement metrics and tracing for http and grpc servers Feb 2, 2026
go/feast-server Outdated
Copy link
Contributor

@PepeluDev PepeluDev Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@luisazofracabify Please remove this binary file from the commit or add it to the .gitignore file if you find it valuable.

)

var Metrics struct {
HttpDuration func(HttpLabels) TimeHistogram `name:"feast_http_request_duration_seconds" help:"Time taken to serve HTTP requests" buckets:".005,.01,.025,.05,.1,.25,.5,1,2.5,5,10"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the feast_ preffix from each metric. Take into account it's already in the initialization with gotoprom.MustInit(&Metrics, "feast"). If you keep it you will end up with metrics with names like:

feast_feast_http_request_duration_seconds.

Anyway what i would do is to remove the feast_ at all. You could split http and grpc metrics in 2 separated Metrics structs. This way you woulkd have just:

http_request_duration_seconds and grpc_request_duration_seconds

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove this file. Such a test does not actually test that metrics actually record values.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change in import libraries order needed? If not, please try to modify only files which are strictly necessary for your contribution.

@luisazofracabify luisazofracabify force-pushed the feat/go-server-observability branch 2 times, most recently from 655cd02 to 34de413 Compare February 6, 2026 09:01
Copy link
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 new potential issue.

View 9 additional findings in Devin Review.

Open in Devin Review

go/main.go Outdated
flag.IntVar(&port, "port", port, "Specify a port for the server")
flag.Parse()

metrics.InitMetrics()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you need this? AFAIK the init() function in Go is automatically called by the Go runtime when the package is imported. It doesn't need to be explicitly called anywhere.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion, I applied the fix as metrics.InitMetrics() was not really needed

@luisazofracabify luisazofracabify force-pushed the feat/go-server-observability branch from e7a22fc to 2e4c1b3 Compare February 9, 2026 08:21
Copy link
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 new potential issue.

View 10 additional findings in Devin Review.

Open in Devin Review

@luisazofracabify luisazofracabify force-pushed the feat/go-server-observability branch from a5345c2 to f5449af Compare February 9, 2026 08:37
go/main.go Outdated
if loggingService != nil {
loggingService.Stop()
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beware of trailing spaces in this line and 208.

@luisazofracabify luisazofracabify force-pushed the feat/go-server-observability branch 2 times, most recently from db00427 to b9f99a6 Compare February 11, 2026 08:30
Signed-off-by: Luis Azofra Begara <luis.azofra@cabify.com>
@luisazofracabify luisazofracabify force-pushed the feat/go-server-observability branch from b9f99a6 to 7b552a9 Compare February 11, 2026 12:16
Signed-off-by: Luis Azofra Begara <luis.azofra@cabify.com>
Signed-off-by: Luis Azofra Begara <luis.azofra@cabify.com>
@luisazofracabify luisazofracabify force-pushed the feat/go-server-observability branch from 7b552a9 to 1bf7277 Compare February 11, 2026 12:21
Signed-off-by: Luis Azofra Begara <luis.azofra@cabify.com>
@luisazofracabify luisazofracabify force-pushed the feat/go-server-observability branch from 5d6cc58 to 816423b Compare February 11, 2026 13:27
Copy link
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 new potential issue.

View 16 additional findings in Devin Review.

Open in Devin Review

Signed-off-by: Luis Azofra Begara <luis.azofra@cabify.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants