feat(go): Implement metrics and tracing for http and grpc servers#5925
feat(go): Implement metrics and tracing for http and grpc servers#5925luisazofracabify wants to merge 5 commits intofeast-dev:masterfrom
Conversation
82aca5b to
b876dba
Compare
b876dba to
8b5cd4b
Compare
8b5cd4b to
847393e
Compare
go/feast-server
Outdated
There was a problem hiding this comment.
@luisazofracabify Please remove this binary file from the commit or add it to the .gitignore file if you find it valuable.
go/internal/feast/metrics/metrics.go
Outdated
| ) | ||
|
|
||
| var Metrics struct { | ||
| HttpDuration func(HttpLabels) TimeHistogram `name:"feast_http_request_duration_seconds" help:"Time taken to serve HTTP requests" buckets:".005,.01,.025,.05,.1,.25,.5,1,2.5,5,10"` |
There was a problem hiding this comment.
Please remove the feast_ preffix from each metric. Take into account it's already in the initialization with gotoprom.MustInit(&Metrics, "feast"). If you keep it you will end up with metrics with names like:
feast_feast_http_request_duration_seconds.
Anyway what i would do is to remove the feast_ at all. You could split http and grpc metrics in 2 separated Metrics structs. This way you woulkd have just:
http_request_duration_seconds and grpc_request_duration_seconds
There was a problem hiding this comment.
I would remove this file. Such a test does not actually test that metrics actually record values.
There was a problem hiding this comment.
Is this change in import libraries order needed? If not, please try to modify only files which are strictly necessary for your contribution.
655cd02 to
34de413
Compare
go/main.go
Outdated
| flag.IntVar(&port, "port", port, "Specify a port for the server") | ||
| flag.Parse() | ||
|
|
||
| metrics.InitMetrics() |
There was a problem hiding this comment.
do you need this? AFAIK the init() function in Go is automatically called by the Go runtime when the package is imported. It doesn't need to be explicitly called anywhere.
There was a problem hiding this comment.
Thanks for the suggestion, I applied the fix as metrics.InitMetrics() was not really needed
e7a22fc to
2e4c1b3
Compare
a5345c2 to
f5449af
Compare
go/main.go
Outdated
| if loggingService != nil { | ||
| loggingService.Stop() | ||
| } | ||
|
|
There was a problem hiding this comment.
Beware of trailing spaces in this line and 208.
db00427 to
b9f99a6
Compare
Signed-off-by: Luis Azofra Begara <luis.azofra@cabify.com>
b9f99a6 to
7b552a9
Compare
Signed-off-by: Luis Azofra Begara <luis.azofra@cabify.com>
Signed-off-by: Luis Azofra Begara <luis.azofra@cabify.com>
7b552a9 to
1bf7277
Compare
Signed-off-by: Luis Azofra Begara <luis.azofra@cabify.com>
5d6cc58 to
816423b
Compare
Signed-off-by: Luis Azofra Begara <luis.azofra@cabify.com>
Description
This PR significantly improves the observability and reliability of the Feast Go Feature Server by implementing comprehensive Prometheus metrics, and robust configuration options for both HTTP and gRPC modes. It addresses issues with hardcoded ports, inconsistent metric exposure, and potential race conditions during shutdown.
Changes
1. Prometheus Metrics Instrumentation
:9090) for serving metrics in both HTTP and gRPC modes. This unifies the observability strategy.metricsMiddlewareininternal/feast/server/http_server.goto track:http_request_duration_seconds)http_requests_total)/healthendpoint for readiness probe visibility.go-grpc-prometheusto fully expose standard gRPC server metrics.2. Tracing Enhancements
OTEL_SERVICE_NAMEenvironment variable (defaults toFeastGoFeatureServer). This allows proper service identification in distributed tracing systems without code changes.3. Server Configuration & Reliability
-metrics-portflag tomain.go.sync.WaitGroupinStartHttpServerto ensure clean shutdown of the metrics server and logging services, matching the robust behavior ofStartGrpcServer.ServerStarterinterfaces and removed unused legacy code.How Has This Been Tested?
Verification
/metricsendpoint accessibility on custom ports./healthand request metrics counters.200recording fix.OTEL_SERVICE_NAMEenv var reflects in traces.Checklist
go.mod/go.sum).Why gotoprom
It reduces boilerplate code for histograms and ensures type-safety for labels, preventing runtime panics due to mismatched label cardinality. It wraps the official prometheus client, so it's fully compatible