Architecture Decision Record: Exposing PHP‑FPM Metrics to Prometheus in Kubernetes

Status: Proposed – for team discussion

Date: 2025‑08‑05


1. Context
  • We run several PHP‑FPM workloads in Kubernetes but lack visibility into worker utilisation, queue depth, and slow‑request counts.

  • Prometheus is our standard telemetry backend; Grafana dashboards and alert rules expect Prometheus‑formatted metrics.

  • Constraints:

    • No application‑level changes – we cannot add libraries or modify PHP code.

    • Limited control over existing Helm charts – major spec edits are difficult to upstream.

    • Operational overhead should stay low; team is wary of maintaining many small custom components.

  • Desired outcome: surface PHP‑FPM pool metrics with minimal extra footprint while preserving per‑pod isolation and future maintainability.


2. Decision Drivers
  1. Ease of rollout (hours vs days).

  2. Operational simplicity – upgrades, troubleshooting, security patches.

  3. Performance overhead – CPU/RAM and label cardinality.

  4. Observability quality – per‑pod granularity, clear ownership of metrics.

  5. Security posture – avoid exposing raw status endpoints cluster‑wide.


3. Considered Options

| Option | Summary | Score* |
| --- | --- | --- |
| A. Sidecar exporter (hipages/php‑fpm_exporter) | Add one lightweight Go binary beside PHP‑FPM container; scrape /status locally and expose /metrics on :9253 | 8 |
| B. Node‑level DaemonSet exporter | Run one exporter per node; auto‑discover sockets or TCP ports | 5 |
| C. Nginx / NJS inline transform | Use existing Nginx reverse‑proxy container to subrequest /status and re‑emit Prometheus text | 7 |
| D. Direct PodMonitor scrape | Point Prometheus at /status directly (requires FPM to emit Prometheus format) | 3 |
| E. Mutating‑webhook to inject exporter | Kyverno/OPA patch at admission time | 6 |

*Preliminary 1–10 ranking against decision drivers.


4. Decision (Draft)

Adopt Option A – Sidecar exporter for new PHP‑FPM workloads.

Rationale (a minimal pod‑spec sketch follows this list):

  • Keeps metrics and app lifecycle tightly coupled – scales with replicas.

  • Simple to reason about; widely adopted community image; negligible overhead (<10 MiB RAM, ~1 % CPU).

  • Avoids cross‑pod traffic and security exposure of raw /status pages.
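
As a rough illustration, the sidecar can be wired in along the following lines. This is a minimal sketch, not a final spec: the container names, image tag, resource figures, and scrape URI are assumptions, and it presumes the FPM pool enables pm.status_path = /status and listens on 127.0.0.1:9000.

```yaml
# Sketch only – names, tags, addresses, and resource figures are placeholders to adapt per chart.
spec:
  containers:
    - name: php-fpm
      image: registry.example.com/php-fpm-app:latest   # existing application container (placeholder)
    - name: php-fpm-exporter
      image: hipages/php-fpm_exporter:latest           # replace :latest with a pinned release tag in CI/CD
      args:
        - "server"
        - "--phpfpm.scrape-uri=tcp://127.0.0.1:9000/status"  # assumes pm.status_path = /status on the pool
      ports:
        - name: metrics
          containerPort: 9253                          # exporter's default listen port
      resources:
        requests: { cpu: 10m, memory: 16Mi }
        limits: { cpu: 50m, memory: 32Mi }
```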

Exceptions / Edge Cases
  • Legacy Helm charts we can’t modify quickly may temporarily run Option C (Nginx transform) to avoid new containers.

  • Very high‑density clusters could pilot Option B if sidecar count becomes a resource concern.


5. Consequences
  • Positive: Unified metrics pipeline; faster root‑cause analysis; HPA tuning based on accurate concurrency data.

  • Negative: One extra container image per pod; CI/CD pipeline must pin the exporter tag and track its CVEs.

  • Operational tasks:

    • Patch Helm values to add the exporter container and a Service port named metrics.

    • Create a corresponding ServiceMonitor with a 30 s scrape interval and 15 s timeout (see the sketch after this list).

    • Add runbook entry runbook/php-fpm.md describing key metrics and alert rules.
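
For reference, a ServiceMonitor matching those settings could look roughly like the following. The metadata name, namespace, and selector labels are placeholders; it assumes the Prometheus Operator CRDs are installed and that the workload's Service exposes the exporter under a port named metrics.

```yaml
# Sketch only – names, namespaces, and labels are assumptions to align with the real chart.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: php-fpm-exporter
  labels:
    release: prometheus                        # must match the Prometheus serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: php-fpm-app      # placeholder: match the workload's Service labels
  namespaceSelector:
    matchNames:
      - php-apps                               # placeholder namespace
  endpoints:
    - port: metrics                            # Service port added alongside the exporter container
      path: /metrics
      interval: 30s
      scrapeTimeout: 15s
```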


6. Open Questions
  1. Can we standardise Helm chart overlays across teams to minimise drift?

  2. Should we bundle exporter config into our base Helm library chart?

  3. What alert thresholds (e.g., active/total > 0.8 for 5 min) make sense for our traffic patterns? A draft rule is sketched below for discussion.
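
To anchor question 3, one possible starting point is sketched below as a PrometheusRule. The metric names follow the hipages exporter's naming conventions; the 0.8 ratio and 5‑minute window simply mirror the example above and are not agreed values.

```yaml
# Draft for discussion only – threshold, duration, and labels are assumptions.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: php-fpm-alerts
spec:
  groups:
    - name: php-fpm
      rules:
        - alert: PhpFpmWorkersSaturated
          # Active workers close to the configured pool size for a sustained period.
          expr: phpfpm_active_processes / phpfpm_total_processes > 0.8
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "PHP-FPM pool on {{ $labels.pod }} has been over 80% busy for 5 minutes"
```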


Prepared by: DevOps Guild – draft for review

Comments & revisions welcome before 2025‑08‑12 review meeting