
Metrics

The apps/metrics process is a small Node.js sampler. Each instance is responsible for exactly one Valkey node (or one cluster, when the metrics process is run standalone). It opens a connection, runs the collectors defined in its config, writes NDJSON output to disk, and exposes an HTTP API that the server consumes. At startup it also registers itself with the Valkey Admin server’s /orchestrator/register endpoint.

The metrics process is unusual in that it has two sources of configuration that layer on top of each other:

  1. config.yml — the canonical source for collector definitions, retention rules, server defaults, and logging defaults.
  2. Environment variables — used to inject per-instance details (which Valkey to talk to, where to register) and to override a handful of YAML fields at load time.

The sections below walk through both sources in the order they get applied.

When the metrics process starts, it loads apps/metrics/config.yml (or the file at CONFIG_PATH, if set), parses it as YAML, and merges the result on top of these defaults:

backend:
  ping_interval: 10000
server:
  port: 3000
  data_dir: /app/data
collector:
  batch_ms: 60000
  batch_max: 500
epics: []

Each entry in epics is also merged with per-epic defaults of data_retention_mb: 10 and data_retention_days: 30.
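The layering of parsed YAML over these defaults can be sketched as follows. This is a minimal illustration and assumes each top-level section is merged one level deep, which is not confirmed by the source. The type and function names (MetricsConfig, mergeConfig, and so on) are hypothetical. YAML parsing itself is left out, so `parsed` stands for whatever the parser returned.

```typescript
// Hypothetical sketch: layer a parsed config.yml over the built-in defaults.
// Assumption: each top-level section is merged one level deep, and each epic
// gets data_retention_mb / data_retention_days filled in when missing.

interface Epic {
  name: string;
  type: string;
  file_prefix: string;
  poll_ms?: number;
  data_retention_mb: number;
  data_retention_days: number;
  [extra: string]: unknown; // monitor-only fields, etc.
}

interface MetricsConfig {
  backend: { ping_interval: number };
  server: { port: number; data_dir: string };
  collector: { batch_ms: number; batch_max: number };
  epics: Epic[];
}

// Shape of the object the YAML parser hands back (every section optional).
interface ParsedConfig {
  backend?: Partial<MetricsConfig["backend"]>;
  server?: Partial<MetricsConfig["server"]>;
  collector?: Partial<MetricsConfig["collector"]>;
  epics?: Array<Partial<Epic>>;
}

const DEFAULTS: MetricsConfig = {
  backend: { ping_interval: 10000 },
  server: { port: 3000, data_dir: "/app/data" },
  collector: { batch_ms: 60000, batch_max: 500 },
  epics: [],
};

const EPIC_DEFAULTS = { data_retention_mb: 10, data_retention_days: 30 };

function mergeConfig(parsed: ParsedConfig): MetricsConfig {
  return {
    backend: { ...DEFAULTS.backend, ...parsed.backend },
    server: { ...DEFAULTS.server, ...parsed.server },
    collector: { ...DEFAULTS.collector, ...parsed.collector },
    // Per-epic defaults go first so values from the YAML win.
    epics: (parsed.epics ?? []).map((e) => ({ ...EPIC_DEFAULTS, ...e }) as Epic),
  };
}
```

A shallow per-section spread keeps the sketch short. A real loader might instead use a recursive merge or validate the parsed object before merging.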

After the YAML is parsed, a small set of environment variables is allowed to override specific fields — see “YAML Overrides” below. Everything else in the YAML stays as written.

Each entry in the epics array describes one collector. The fields below apply to every epic; the monitor epic adds a few extras for sampling behavior.

| Field | Description | Default |
| --- | --- | --- |
| name | Identifier used in logs and the orchestrator registry. | required |
| type | Collector implementation. One of memory_stats, info_cpu, commandlog_slow, commandlog_large_request, commandlog_large_reply, slowlog_len, monitor. | required |
| poll_ms | How often (ms) the collector runs. | varies per epic |
| file_prefix | Filename prefix for the NDJSON output written under DATA_DIR. | required |
| data_retention_mb | Max disk space (MB) this epic’s NDJSON files may use. Oldest files are evicted when the budget is exceeded. | 10 |
| data_retention_days | Files older than this (by birthtime) are deleted during the daily cleanup sweep. | 30 |

The Default column shows the fallback defaults applied by the YAML merge. apps/metrics/config.yml overrides these with lower values per epic (3–15 MB, 5 days). See the file itself for the full per-epic configuration.
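The two retention rules can be pictured as a single planning step over one epic's files. This is a hypothetical sketch, not the cleaner module's code, and it rests on three assumptions: MB means 1024 × 1024 bytes, "oldest" is judged by birthtime, and both rules run in the same pass. In the real process the age rule runs as a daily sweep. The function name `filesToDelete` is invented for illustration.

```typescript
// Hypothetical sketch of the per-epic retention rules.
// Assumptions: 1 MB = 1024 * 1024 bytes, age and "oldest" both use birthtime,
// and the age rule and the size budget are evaluated in one pass.

interface DataFile {
  name: string;
  sizeBytes: number;
  birthtimeMs: number;
}

function filesToDelete(
  files: DataFile[],
  retentionMb: number,
  retentionDays: number,
  nowMs: number,
): string[] {
  const maxAgeMs = retentionDays * 24 * 60 * 60 * 1000;
  const budgetBytes = retentionMb * 1024 * 1024;
  const doomed = new Set<string>();

  // Rule 1: delete anything older than data_retention_days.
  for (const f of files) {
    if (nowMs - f.birthtimeMs > maxAgeMs) doomed.add(f.name);
  }

  // Rule 2: evict the oldest remaining files until the total fits the budget.
  const survivors = files
    .filter((f) => !doomed.has(f.name))
    .sort((a, b) => a.birthtimeMs - b.birthtimeMs);
  let total = survivors.reduce((sum, f) => sum + f.sizeBytes, 0);
  for (const f of survivors) {
    if (total <= budgetBytes) break;
    doomed.add(f.name);
    total -= f.sizeBytes;
  }

  // Report in the caller's original order.
  return files.filter((f) => doomed.has(f.name)).map((f) => f.name);
}
```

Returning a plan instead of deleting files keeps the sketch side-effect free. A caller would pass the plan to fs.unlink or a similar call.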

These fields apply only when type is monitor. They control the MONITOR-based hot-key sampling cycle:

| Field | Description | Default |
| --- | --- | --- |
| monitoringDuration | How long each sampling run captures commands (ms). | 10000 |
| monitoringInterval | Pause between sampling runs (ms). | 10000 |
| maxCommandsPerRun | Hard cap on commands captured per cycle. Cycle ends early when reached. | 1000000 |
| cutoffFrequency | Minimum access count for a key to appear in results. | 100 |
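The following sketch shows how maxCommandsPerRun and cutoffFrequency could shape one run's results. It is only an illustration under stated assumptions:

- The captured MONITOR stream has already been reduced to the key each command touched.
- "Minimum access count" is inclusive (>=).
- Timing (monitoringDuration / monitoringInterval) is not modeled.

The names `hotKeys` and `MonitorOptions` are hypothetical.

```typescript
// Hypothetical sketch of one sampling run's hot-key tally.
// Assumptions: input is one key per captured command, the cutoff is
// inclusive, and the duration/interval timing is handled elsewhere.

interface MonitorOptions {
  maxCommandsPerRun: number;
  cutoffFrequency: number;
}

function hotKeys(capturedKeys: string[], opts: MonitorOptions): Array<[string, number]> {
  const counts = new Map<string, number>();
  let seen = 0;
  for (const key of capturedKeys) {
    if (seen >= opts.maxCommandsPerRun) break; // cycle ends early at the cap
    seen++;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  // Keep keys at or above the cutoff, hottest first.
  return Array.from(counts.entries())
    .filter(([, n]) => n >= opts.cutoffFrequency)
    .sort((a, b) => b[1] - a[1]);
}
```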

Connection details come from environment variables. Apart from the valkey.mode fallback described below, the YAML does not carry them, because in the default deployment the server injects them when it spawns a metrics child.

Host of the Valkey node this metrics process will sample. Required.

Port of the Valkey node. Required.

Connection topology used by the Valkey client.

  • "standalone" — single-node client (default)
  • "cluster" — cluster client
  • "sentinel" — sentinel client

If unset, falls back to valkey.mode from config.yml, then to "standalone".
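The fallback chain for the topology (environment, then valkey.mode, then "standalone") can be sketched as below. The environment variable's name is not given in this section, so the parameters are deliberately generic. Treating an empty string as unset and rejecting unknown modes are both assumptions.

```typescript
// Hypothetical sketch of the mode fallback chain.
// Assumptions: empty strings count as unset, and unknown modes are rejected.

type ValkeyMode = "standalone" | "cluster" | "sentinel";
const MODES: ValkeyMode[] = ["standalone", "cluster", "sentinel"];

function resolveMode(envMode: string | undefined, yamlMode: string | undefined): ValkeyMode {
  const candidate = envMode || yamlMode || "standalone";
  if (MODES.indexOf(candidate as ValkeyMode) === -1) {
    throw new Error(`Unsupported Valkey mode: ${candidate}`);
  }
  return candidate as ValkeyMode;
}
```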

Username for password or IAM authentication.

Password for password authentication. Ignored when VALKEY_AUTH_TYPE=iam.

Enable TLS. The value is compared against the literal string "true", so only exactly "true" turns TLS on; values such as "TRUE" or "1" leave it off.

  • Default: false

Verify the TLS server certificate. When TLS is enabled and this is "false", certificate verification is skipped. That is useful for development against self-signed certificates but should not be used in production.
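This sketch shows how the two TLS strings could be turned into booleans. The parameter names and the returned shape are hypothetical and are not the Valkey client's option names. Treating an unset verification flag as "verify" is an assumption: only the exact string "false" disables verification here.

```typescript
// Hypothetical sketch: interpret the TLS environment strings.
// Assumption: verification stays on unless the value is exactly "false".

interface TlsSettings {
  tls: boolean;
  verifyCertificate: boolean;
}

function resolveTls(tlsRaw: string | undefined, verifyRaw: string | undefined): TlsSettings {
  const tls = tlsRaw === "true"; // literal comparison, case-sensitive
  return {
    tls,
    // Verification is only meaningful when TLS is on.
    verifyCertificate: tls && verifyRaw !== "false",
  };
}
```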

Selects the credentials provider.

  • "iam" — use AWS ElastiCache IAM authentication via ElastiCacheIAMProvider. Requires VALKEY_USERNAME, VALKEY_AWS_REGION, and VALKEY_REPLICATION_GROUP_ID.
  • anything else — password authentication using VALKEY_USERNAME / VALKEY_PASSWORD.

AWS region used by the IAM credentials provider.

ElastiCache replication group / cluster name used as the IAM clusterName.
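Choosing between IAM and password credentials from the variables named above might look like the following. This is a sketch only. It does not construct ElastiCacheIAMProvider. It checks the inputs that provider needs and returns a hypothetical descriptor. Failing fast when an IAM input is missing is an assumption about behavior.

```typescript
// Hypothetical sketch of credential selection. The returned shape is invented
// for illustration; it does not call ElastiCacheIAMProvider.

type Env = Record<string, string | undefined>;

type Credentials =
  | { kind: "iam"; username: string; region: string; clusterName: string }
  | { kind: "password"; username?: string; password?: string };

function resolveCredentials(env: Env): Credentials {
  if (env.VALKEY_AUTH_TYPE === "iam") {
    const username = env.VALKEY_USERNAME;
    const region = env.VALKEY_AWS_REGION;
    const clusterName = env.VALKEY_REPLICATION_GROUP_ID;
    if (!username || !region || !clusterName) {
      throw new Error(
        "IAM auth needs VALKEY_USERNAME, VALKEY_AWS_REGION and VALKEY_REPLICATION_GROUP_ID",
      );
    }
    // VALKEY_PASSWORD is intentionally ignored in IAM mode.
    return { kind: "iam", username, region, clusterName };
  }
  // Any other auth type falls back to password authentication.
  return { kind: "password", username: env.VALKEY_USERNAME, password: env.VALKEY_PASSWORD };
}
```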

Each metrics process needs to identify itself and tell the Valkey Admin server where to reach it. Two pairs of variables handle this: one pair for the callback target (where the server lives), and one pair for the advertised address (where the metrics HTTP server can be reached from the server’s perspective).

The split matters in container deployments. The metrics process might bind on 0.0.0.0 inside a pod, but the address it should advertise to the orchestrator is the pod IP or service name — not the bind address.

Host of the Valkey Admin server this process should call to register.

  • Default: localhost

Port of the Valkey Admin server.

  • Default: 8080
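The registration target could be assembled from the server host and port defaults above. This sketch assumes plain HTTP and string-valued environment input. The parameter names are hypothetical because the variable names are not shown in this section.

```typescript
// Hypothetical sketch: build the orchestrator registration URL.
// Assumptions: plain HTTP, and empty/undefined values fall back to defaults.

function registrationUrl(serverHost?: string, serverPort?: string): string {
  const host = serverHost || "localhost";
  const port = Number(serverPort || "8080");
  if (!Number.isInteger(port) || port <= 0 || port > 65535) {
    throw new Error(`Invalid server port: ${serverPort}`);
  }
  return `http://${host}:${port}/orchestrator/register`;
}
```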

Network interface the metrics HTTP server binds to. In a container you almost always want 0.0.0.0; on a developer machine you might prefer 127.0.0.1.

  • Default: 0.0.0.0

Host the metrics process advertises to the server in its registration payload — this is the host the orchestrator will actually dial back. Use it to bridge bind-vs-advertise differences in containers.

  • Default: falls back to METRICS_HOST, then 127.0.0.1

Legacy alias for METRICS_ADVERTISE_HOST. Kept for backward compatibility; new deployments should prefer METRICS_ADVERTISE_HOST.

Port advertised to the server. If unset, the process advertises the actual port assigned by app.listen(). This is what makes PORT=0 work — the OS picks a free port and the process tells the server which one.
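The advertised-address rules can be summarized in a small resolver. This is a sketch under several assumptions:

- METRICS_ADVERTISE_HOST takes precedence over its legacy alias, whose name is not given here.
- `metricsHost` means a METRICS_HOST value that was explicitly set.
- `boundPort` is the port the server actually got. In Node this is typically read from `server.address().port` after `listen()`.

All parameter names are hypothetical.

```typescript
// Hypothetical sketch of advertised host/port resolution.
// Assumption: METRICS_ADVERTISE_HOST beats the legacy alias.

interface AdvertiseInputs {
  advertiseHost?: string; // METRICS_ADVERTISE_HOST
  legacyAdvertiseHost?: string; // legacy alias (name not shown here)
  metricsHost?: string; // METRICS_HOST, if explicitly set
  advertisePort?: string; // advertised-port override, if set
}

function resolveAdvertised(inputs: AdvertiseInputs, boundPort: number): { host: string; port: number } {
  const host =
    inputs.advertiseHost || inputs.legacyAdvertiseHost || inputs.metricsHost || "127.0.0.1";
  // Without an explicit override, report the port the OS actually assigned.
  // This is what makes PORT=0 usable.
  const port = inputs.advertisePort ? Number(inputs.advertisePort) : boundPort;
  return { host, port };
}
```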

These three variables take precedence over config.yml: PORT and DATA_DIR override the matching fields, and CONFIG_PATH changes which config file is loaded in the first place.

TCP port the metrics HTTP server listens on. Setting PORT=0 lets the OS assign an ephemeral port — the server uses this when spawning many metrics children, so they don’t fight over fixed ports.

  • Default: cfg.server.port from config.yml (3000)
  • Overrides: cfg.server.port

Directory where NDJSON metric files are written and rotated. The server passes a per-node subdirectory here when spawning children, so each child gets its own slice of disk.

  • Default: cfg.server.data_dir from config.yml (/app/data); the cleaner module falls back to ./data if no value is available
  • Overrides: cfg.server.data_dir

Absolute path to the config.yml file. When set, the metrics process loads its config from this location instead of the bundled apps/metrics/config.yml. The Electron build uses this to point at a config file packaged inside the app bundle.
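The environment-over-YAML step for these three variables might look like the following sketch. Three assumptions apply: empty strings count as unset, PORT must parse as an integer from 0 to 65535 (0 meaning "let the OS choose"), and the bundled path is shown relative to the repository root only for illustration.

```typescript
// Hypothetical sketch of PORT / DATA_DIR / CONFIG_PATH handling.
// Assumptions: empty strings are unset; PORT=0 is valid (OS-assigned port).

type Env = Record<string, string | undefined>;

interface ServerSection {
  port: number;
  data_dir: string;
}

// Relative to the repo root here purely for illustration.
const BUNDLED_CONFIG = "apps/metrics/config.yml";

function configPath(env: Env): string {
  return env.CONFIG_PATH || BUNDLED_CONFIG;
}

function applyServerOverrides(server: ServerSection, env: Env): ServerSection {
  const out = { ...server };
  if (env.PORT) {
    const port = Number(env.PORT);
    if (!Number.isInteger(port) || port < 0 || port > 65535) {
      throw new Error(`Invalid PORT: ${env.PORT}`);
    }
    out.port = port;
  }
  if (env.DATA_DIR) out.data_dir = env.DATA_DIR;
  return out;
}
```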

These also override fields under collector in config.yml. Use them to change batch behavior without editing the YAML.

How often (in milliseconds) the collector flushes a batch of samples.

  • Overrides: collector.batch_ms

Maximum number of samples in a single batch. The collector flushes as soon as either limit is reached: when BATCH_MS has elapsed or when BATCH_MAX samples have accumulated, whichever happens first.

  • Overrides: collector.batch_max
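A "flush on whichever limit is hit first" batcher can be sketched as follows. This is a hypothetical illustration. The clock is passed in explicitly so the behavior is deterministic, whereas a real collector would more likely use a timer. The sketch also assumes BATCH_MS and BATCH_MAX have already been parsed into numbers. The `Batcher` class is not taken from the metrics source.

```typescript
// Hypothetical sketch: flush on size (batchMax) or age (batchMs),
// whichever comes first. The caller supplies the current time.

class Batcher<T> {
  private items: T[] = [];
  private startedAt: number | null = null;
  private readonly batchMs: number;
  private readonly batchMax: number;
  private readonly onFlush: (batch: T[]) => void;

  constructor(batchMs: number, batchMax: number, onFlush: (batch: T[]) => void) {
    this.batchMs = batchMs;
    this.batchMax = batchMax;
    this.onFlush = onFlush;
  }

  add(item: T, nowMs: number): void {
    if (this.startedAt === null) this.startedAt = nowMs; // batch clock starts on first sample
    this.items.push(item);
    if (this.items.length >= this.batchMax) this.flush(); // size limit
  }

  // Call periodically; flushes once the current batch is batchMs old.
  tick(nowMs: number): void {
    if (this.startedAt !== null && nowMs - this.startedAt >= this.batchMs) this.flush();
  }

  flush(): void {
    if (this.items.length === 0) return;
    const batch = this.items;
    this.items = [];
    this.startedAt = null;
    this.onFlush(batch);
  }
}
```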

Logger verbosity. Accepted values are the standard debug / info / warn / error set; the default is info.

If unset, the metrics process inherits logging.level from config.yml. The environment variable wins when both are set.

Logger output format.

  • "pretty" — human-readable output (default)
  • "json" — structured JSON lines, intended for log aggregators

If unset, inherits logging.format from config.yml.

When "1", enables verbose metric debug logging in fetchers.js and prints the loaded config at startup. Set to "0" to disable.

If unset, inherits the boolean debug_metrics from config.yml. This is the variable to flip when you need to see what the collector is actually doing.
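The logging precedence is: environment first, then config.yml, then the built-in defaults (info, pretty, debug off). The following sketch illustrates it. The parameter names are hypothetical because the environment variable names are not shown in this section. Two behaviors are assumptions: unrecognized levels or formats are skipped in favor of the next source, and debug values other than "1" or "0" fall through to the YAML setting.

```typescript
// Hypothetical sketch of logging settings resolution.
// Assumptions: unknown level/format values are skipped; debug values other
// than "1"/"0" defer to config.yml; debug defaults to off.

type Level = "debug" | "info" | "warn" | "error";
type Format = "pretty" | "json";

const LEVELS: Level[] = ["debug", "info", "warn", "error"];
const FORMATS: Format[] = ["pretty", "json"];

interface LoggingInputs {
  envLevel?: string;
  envFormat?: string;
  envDebug?: string;
  yamlLevel?: string;
  yamlFormat?: string;
  yamlDebugMetrics?: boolean;
}

// Return the first candidate that is an allowed value.
function pick<T extends string>(allowed: T[], ...candidates: Array<string | undefined>): T | undefined {
  for (const c of candidates) {
    if (c !== undefined && allowed.indexOf(c as T) !== -1) return c as T;
  }
  return undefined;
}

function resolveLogging(i: LoggingInputs): { level: Level; format: Format; debugMetrics: boolean } {
  const level = pick(LEVELS, i.envLevel, i.yamlLevel) ?? "info";
  const format = pick(FORMATS, i.envFormat, i.yamlFormat) ?? "pretty";
  let debugMetrics = i.yamlDebugMetrics ?? false;
  if (i.envDebug === "1") debugMetrics = true;
  else if (i.envDebug === "0") debugMetrics = false;
  return { level, format, debugMetrics };
}
```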