Observability Tooling
To help operators run SpiceDB reliably and performantly, SpiceDB exposes various forms of observability metadata.
Prometheus
Every SpiceDB command has a configurable HTTP server that serves observability data.
A Prometheus metrics endpoint can be found on this server at the path /metrics.
Available metrics include operational information about the Go runtime and serving metrics for any servers that are enabled.
Profiling
The same HTTP server also exposes pprof endpoints for various types of profiles at the path /debug/pprof.
The types of profiles available are:
- cpu: where a program spends its time while actively consuming CPU cycles
- heap: a sampling of memory allocations, useful for monitoring current and historical memory usage and checking for memory leaks
- threadcreate: sections of the program that lead to the creation of new OS threads
- goroutine: stack traces of all current goroutines
- block: where goroutines block waiting on synchronization primitives (including timer channels)
- mutex: stack traces of the holders of contended mutexes (lock contention)
For example, to download a CPU profile, you can run the following command:
go tool pprof 'http://spicedb.local:9090/debug/pprof/profile'
This will download the profile to $HOME/pprof and drop you into a REPL for exploring the data.
Alternatively, you can upload profiles to pprof.me to share with others.
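If you would rather capture profiles from a script than from an interactive `go tool pprof` session, here is a minimal Go sketch that saves one profile to disk. It assumes SpiceDB serves the standard net/http/pprof handlers, where the CPU profile lives at /debug/pprof/profile and the other profiles at /debug/pprof/<name>, and it reuses the spicedb.local:9090 address from the example above. `profileURL` and `run` are hypothetical helpers. Note that in Go, block and mutex profiles only record data if the program has enabled their sampling rates, so they may come back empty.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
	"strconv"
)

// profileURL builds the pprof URL for one of the profile types listed above.
// Assumption: standard net/http/pprof paths (cpu -> /debug/pprof/profile).
func profileURL(base, kind string, seconds int) (string, error) {
	name := kind
	if kind == "cpu" {
		name = "profile"
	}
	switch name {
	case "profile", "heap", "threadcreate", "goroutine", "block", "mutex":
	default:
		return "", fmt.Errorf("unknown profile type %q", kind)
	}
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = "/debug/pprof/" + name
	// This sketch only attaches a sampling duration to the CPU profile.
	if name == "profile" && seconds > 0 {
		u.RawQuery = url.Values{"seconds": {strconv.Itoa(seconds)}}.Encode()
	}
	return u.String(), nil
}

// run downloads one profile and writes it to <kind>.pb.gz in the current directory.
func run(base, kind string) error {
	target, err := profileURL(base, kind, 30)
	if err != nil {
		return err
	}
	resp, err := http.Get(target)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("GET %s: %s", target, resp.Status)
	}
	out, err := os.Create(kind + ".pb.gz")
	if err != nil {
		return err
	}
	defer out.Close()
	if _, err := io.Copy(out, resp.Body); err != nil {
		return err
	}
	fmt.Println("wrote", out.Name())
	return nil
}

func main() {
	if len(os.Args) < 3 {
		fmt.Fprintln(os.Stderr, "usage: getprofile <base-url> <cpu|heap|threadcreate|goroutine|block|mutex>")
		return
	}
	if err := run(os.Args[1], os.Args[2]); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

For example, `go run getprofile.go http://spicedb.local:9090 heap` saves heap.pb.gz, which you can then open with `go tool pprof heap.pb.gz` or upload to pprof.me.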
OpenTelemetry Tracing
SpiceDB uses OpenTelemetry for tracing the lifetime of requests.
You can configure tracing in SpiceDB via global flags prefixed with otel.
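The example startup log in the Structured Logging section reports `sampleRatio=0.01`. To illustrate how a ratio like that can be applied deterministically, here is a Go sketch modeled on the trace-ID ratio sampler in the OpenTelemetry Go SDK: it shifts the low 8 bytes of the 16-byte trace ID right by one bit and keeps the trace when the result is below ratio × 2^63. This is an illustration, not SpiceDB's code; it assumes SpiceDB's sample ratio feeds a sampler of this kind, and `sampled` is a hypothetical helper.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/rand"
)

// sampled reports whether a trace with the given ID would be kept at the given ratio.
// The decision depends only on the trace ID, so it is reproducible.
func sampled(traceID [16]byte, ratio float64) bool {
	if ratio >= 1 {
		return true
	}
	if ratio <= 0 {
		return false
	}
	bound := uint64(ratio * (1 << 63))
	x := binary.BigEndian.Uint64(traceID[8:16]) >> 1
	return x < bound
}

func main() {
	const ratio = 0.01 // the sampleRatio value shown in the example startup log
	const n = 100_000
	rng := rand.New(rand.NewSource(1)) // fixed seed so runs are repeatable
	hits := 0
	var id [16]byte
	for i := 0; i < n; i++ {
		rng.Read(id[:]) // random trace ID
		if sampled(id, ratio) {
			hits++
		}
	}
	fmt.Printf("sampled %d of %d traces (%.2f%%)\n", hits, n, 100*float64(hits)/n)
}
```

Because the decision is a pure function of the trace ID, every service that sees the same trace and uses the same ratio reaches the same keep-or-drop decision.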
Here's a video walking through SpiceDB traces using Jaeger:
Structured Logging
SpiceDB emits logs to standard streams using zerolog.
Logs come in two formats, console and JSON, and the format can be configured with the --log-format global flag.
If the output device is non-interactive (that is, not a terminal), logs are emitted as NDJSON by default.
Here are the console-format logs from starting up a single SpiceDB v1.25 instance:
8:00PM INF configured logging async=false format=console log_level=info provider=zerolog
8:00PM INF configured opentelemetry tracing endpoint= insecure=false provider=none sampleRatio=0.01 service=spicedb v=0
8:00PM WRN this version of SpiceDB is out of date. See: https://github.com/authzed/spicedb/releases/tag/v1.26.0 latest-released-version=v1.26.0 this-version=v1.25.0
8:00PM INF configuration ClusterDispatchCacheConfig.Enabled=true ClusterDispatchCacheConfig.MaxCost=70% ClusterDispatchCacheConfig.Metrics=true ClusterDispatchCacheConfig.Name=cluster_dispatch ClusterDispatchCacheConfig.NumCounters=100000 Datastore=nil DatastoreConfig.BootstrapFileContents="(map of size 0)" DatastoreConfig.BootstrapFiles=[] DatastoreConfig.BootstrapOverwrite=false DatastoreConfig.BootstrapTimeout=10000 DatastoreConfig.ConnectRate=100 DatastoreConfig.DisableStats=false DatastoreConfig.EnableConnectionBalancing=true DatastoreConfig.EnableDatastoreMetrics=true DatastoreConfig.Engine=memory DatastoreConfig.FollowerReadDelay=4800 DatastoreConfig.GCInterval=180000 DatastoreConfig.GCMaxOperationTime=60000 DatastoreConfig.GCWindow=86400000 DatastoreConfig.LegacyFuzzing=-0.000001 DatastoreConfig.MaxRetries=10 DatastoreConfig.MaxRevisionStalenessPercent=0.1 DatastoreConfig.MigrationPhase=(empty) DatastoreConfig.OverlapKey=key DatastoreConfig.OverlapStrategy=static DatastoreConfig.ReadConnPool.HealthCheckInterval=30000 DatastoreConfig.ReadConnPool.MaxIdleTime=1800000 DatastoreConfig.ReadConnPool.MaxLifetime=1800000 DatastoreConfig.ReadConnPool.MaxLifetimeJitter=0 DatastoreConfig.ReadConnPool.MaxOpenConns=20 DatastoreConfig.ReadConnPool.MinOpenConns=20 DatastoreConfig.ReadOnly=false DatastoreConfig.RequestHedgingEnabled=false DatastoreConfig.RequestHedgingInitialSlowValue=10 DatastoreConfig.RequestHedgingMaxRequests=1000000 DatastoreConfig.RequestHedgingQuantile=0.95 DatastoreConfig.RevisionQuantization=5000 DatastoreConfig.SpannerCredentialsFile=(empty) DatastoreConfig.SpannerEmulatorHost=(empty) DatastoreConfig.TablePrefix=(empty) DatastoreConfig.URI=(empty) DatastoreConfig.WatchBufferLength=1024 DatastoreConfig.WriteConnPool.HealthCheckInterval=30000 DatastoreConfig.WriteConnPool.MaxIdleTime=1800000 DatastoreConfig.WriteConnPool.MaxLifetime=1800000 DatastoreConfig.WriteConnPool.MaxLifetimeJitter=0 DatastoreConfig.WriteConnPool.MaxOpenConns=10 
DatastoreConfig.WriteConnPool.MinOpenConns=10 DisableV1SchemaAPI=false DisableVersionResponse=false DispatchCacheConfig.Enabled=true DispatchCacheConfig.MaxCost=30% DispatchCacheConfig.Metrics=true DispatchCacheConfig.Name=dispatch DispatchCacheConfig.NumCounters=10000 DispatchClientMetricsEnabled=true DispatchClientMetricsPrefix=(empty) DispatchClusterMetricsEnabled=true DispatchClusterMetricsPrefix=(empty) DispatchConcurrencyLimits.Check=0 DispatchConcurrencyLimits.LookupResources=0 DispatchConcurrencyLimits.LookupSubjects=0 DispatchConcurrencyLimits.ReachableResources=0 DispatchHashringReplicationFactor=100 DispatchHashringSpread=1 DispatchMaxDepth=50 DispatchServer.Address=:50053 DispatchServer.BufferSize=0 DispatchServer.ClientCAPath=(empty) DispatchServer.Enabled=false DispatchServer.MaxConnAge=30000 DispatchServer.MaxWorkers=0 DispatchServer.Network=tcp DispatchServer.TLSCertPath=(empty) DispatchServer.TLSKeyPath=(empty) DispatchUpstreamAddr=(empty) DispatchUpstreamCAPath=(empty) DispatchUpstreamTimeout=60000 Dispatcher=nil EnableExperimentalWatchableSchemaCache=false GRPCAuthFunc=(value) GRPCServer.Address=:50051 GRPCServer.BufferSize=0 GRPCServer.ClientCAPath=(empty) GRPCServer.Enabled=true GRPCServer.MaxConnAge=30000 GRPCServer.MaxWorkers=0 GRPCServer.Network=tcp GRPCServer.TLSCertPath=(empty) GRPCServer.TLSKeyPath=(empty) GlobalDispatchConcurrencyLimit=50 HTTPGateway.HTTPAddress=:8443 HTTPGateway.HTTPEnabled=false HTTPGateway.HTTPTLSCertPath=(empty) HTTPGateway.HTTPTLSKeyPath=(empty) HTTPGatewayCorsAllowedOrigins=[*] HTTPGatewayCorsEnabled=false HTTPGatewayUpstreamAddr=(empty) HTTPGatewayUpstreamTLSCertPath=(empty) MaxCaveatContextSize=4096 MaxDatastoreReadPageSize=1000 MaxRelationshipContextSize=25000 MaximumPreconditionCount=1000 MaximumUpdatesPerWrite=1000 MetricsAPI.HTTPAddress=:9090 MetricsAPI.HTTPEnabled=true MetricsAPI.HTTPTLSCertPath=(empty) MetricsAPI.HTTPTLSKeyPath=(empty) NamespaceCacheConfig.Enabled=true NamespaceCacheConfig.MaxCost=32MiB 
NamespaceCacheConfig.Metrics=true NamespaceCacheConfig.Name=namespace NamespaceCacheConfig.NumCounters=1000 PresharedSecureKey=(sensitive) SchemaPrefixesRequired=false ShutdownGracePeriod=0 SilentlyDisableTelemetry=false StreamingAPITimeout=30000 TelemetryCAOverridePath=(empty) TelemetryEndpoint=https://telemetry.authzed.com TelemetryInterval=3600000 V1SchemaAdditiveOnly=false
8:00PM INF using memory datastore engine
8:00PM WRN in-memory datastore is not persistent and not feasible to run in a high availability fashion
8:00PM INF configured namespace cache defaultTTL=0 maxCost="32 MiB" numCounters=1000
8:00PM INF datastore driver explicitly asked to skip schema watch datastore-type=*memdb.memdbDatastore
8:00PM INF configured dispatch cache defaultTTL=20600 maxCost="7.6 GiB" numCounters=10000
8:00PM INF configured dispatcher balancerconfig={"loadBalancingConfig":[{"consistent-hashring":{"replicationFactor":100,"spread":1}}]} concurrency-limit-check-permission=50 concurrency-limit-lookup-resources=50 concurrency-limit-lookup-subjects=50 concurrency-limit-reachable-resources=50
8:00PM INF grpc server started serving addr=:50051 insecure=true network=tcp service=grpc workers=0
8:00PM INF running server datastore=*proxy.observableProxy
8:00PM INF checking for startable datastore
8:00PM INF http server started serving addr=:9090 insecure=true service=metrics
8:00PM INF telemetry reporter scheduled endpoint=https://telemetry.authzed.com interval=1h0m0s next=1m35s
8:00PM INF received interrupt
8:00PM INF shutting down
8:00PM INF http server stopped serving addr=:9090 service=metrics
8:00PM INF grpc server stopped serving addr=:50051 network=tcp service=grpc
Audit Logs
Audit logging is a feature exclusive to AuthZed products that publishes logs of SpiceDB API operations to a log sink.
You can read more about it in the Audit Logging documentation.
Telemetry
SpiceDB reports metrics that are used to understand how clusters are configured and how they perform. This information is collected to prioritize the development work that will have the most impact on the community.
Telemetry never shares data stored in SpiceDB, which may contain sensitive information.
Telemetry can always be disabled by providing the flag --telemetry-endpoint="".
TELEMETRY.md documents the exact information being collected.
You can find all of the code in internal/telemetry.
Telemetry is reported via the Prometheus Remote Write protocol.
Any metrics prefixed with spicedb_telemetry are reported hourly to telemetry.authzed.com.