Status: Proposed – for team discussion
Date: 2025‑08‑05
We run several PHP‑FPM workloads in Kubernetes but lack visibility into worker utilisation, queue depth, and slow‑request counts.
Prometheus is our standard telemetry backend; Grafana dashboards and alert rules expect Prometheus‑formatted metrics.
Constraints:
- No application‑level changes – we cannot add libraries or modify PHP code.
- Limited control over existing Helm charts – major spec edits are difficult to upstream.
- Operational overhead should stay low; the team is wary of maintaining many small custom components.
Desired outcome: surface PHP‑FPM pool metrics with minimal extra footprint while preserving per‑pod isolation and future maintainability.
Decision drivers:
- Ease of rollout (hours vs days).
- Operational simplicity – upgrades, troubleshooting, security patches.
- Performance overhead – CPU/RAM and label cardinality.
- Observability quality – per‑pod granularity, clear ownership of metrics.
- Security posture – avoid exposing raw status endpoints cluster‑wide.
| Option | Summary | Score* |
| --- | --- | --- |
| A. Sidecar exporter (hipages/php-fpm_exporter) | Add one lightweight Go binary beside the PHP‑FPM container; scrape the pool's `/status` locally | 8 |
| B. Node‑level DaemonSet exporter | Run one exporter per node; auto‑discover sockets or TCP ports | 5 |
| C. Nginx / NJS inline transform | Use the existing Nginx reverse‑proxy container to subrequest `/status` and re‑emit it in Prometheus format | 7 |
| D. Direct PodMonitor scrape | Point Prometheus at each pod's raw `/status` endpoint | 3 |
| E. Mutating‑webhook to inject exporter | Kyverno/OPA patch at admission time | 6 |

*Preliminary 1–10 ranking against the decision drivers.
Adopt Option A – sidecar exporter – for new PHP‑FPM workloads.

Rationale:
- Keeps metrics and app lifecycle tightly coupled – scales with replicas.
- Simple to reason about; widely adopted community image; negligible overhead (<10 MiB RAM, ~1 % CPU).
- Avoids cross‑pod traffic and the security exposure of raw `/status` pages.
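For illustration, a minimal sidecar sketch as it might appear in a Helm values overlay under the workload's `containers:` list. The image tag, port 9253, and the `PHP_FPM_SCRAPE_URI` env var are assumptions based on the hipages/php-fpm_exporter defaults; verify against the exporter's documentation and your FPM pool config (`pm.status_path`) before rollout.

```yaml
# Sketch: extra container beside PHP-FPM (values shown are assumed exporter defaults).
- name: php-fpm-exporter
  image: hipages/php-fpm_exporter:2.2.0      # example tag - pin whatever version CI/CD approves
  env:
    - name: PHP_FPM_SCRAPE_URI
      value: "tcp://127.0.0.1:9000/status"   # assumes FPM listens on TCP 9000 with pm.status_path=/status
  ports:
    - name: metrics
      containerPort: 9253                    # exporter's default listen port
  resources:
    requests:
      cpu: 10m
      memory: 16Mi
    limits:
      memory: 32Mi
```

Because the exporter talks to FPM over localhost inside the pod, the raw status page never needs to be reachable from the cluster network.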
Legacy Helm charts we can’t modify quickly may temporarily run Option C (Nginx transform) to avoid new containers.
Very high‑density clusters could pilot Option B if sidecar count becomes a resource concern.
Positive: Unified metrics pipeline; faster RCA; HPA tuning based on accurate concurrency data.
Negative: One extra container image per pod; the CI/CD pipeline must pin the exporter tag and track its CVEs.
Operational tasks:
- Patch Helm values to add the exporter container (see the sidecar sketch above) and a Service port named `metrics`.
- Create a corresponding `ServiceMonitor` with a 30 s scrape interval and 15 s timeout (see the sketch after this list).
- Add a runbook entry `runbook/php-fpm.md` describing key metrics and alert rules.
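A possible `ServiceMonitor` shape for the second task. The namespace, resource name, and `app.kubernetes.io/name: php-fpm` selector are placeholders; the `metrics` port name matches the Service port added in the Helm patch above.

```yaml
# Sketch: ServiceMonitor for the sidecar exporter (names and labels are placeholders).
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: php-fpm
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: php-fpm   # match the workload Service's labels
  namespaceSelector:
    any: true                           # or list the specific app namespaces
  endpoints:
    - port: metrics                     # Service port name from the Helm patch
      interval: 30s
      scrapeTimeout: 15s
```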
Open questions:
- Can we standardise Helm chart overlays across teams to minimise drift?
- Should we bundle exporter config into our base Helm library chart?
- What alert thresholds (e.g., active/total > 0.8 for 5 min) make sense for our traffic patterns? (A rough starting-point sketch follows below.)
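As input for that discussion, a rough `PrometheusRule` sketch. Metric names (`phpfpm_active_processes`, `phpfpm_total_processes`, `phpfpm_listen_queue`) are taken from the exporter's documentation and should be verified against the actual `/metrics` output; thresholds and durations are illustrative, not agreed defaults.

```yaml
# Sketch only: thresholds below are placeholders to be tuned against real traffic.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: php-fpm-alerts
  namespace: monitoring
spec:
  groups:
    - name: php-fpm
      rules:
        - alert: PhpFpmPoolSaturated
          expr: phpfpm_active_processes / phpfpm_total_processes > 0.8
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "PHP-FPM pool on {{ $labels.pod }} has been >80% busy for 5 minutes"
        - alert: PhpFpmListenQueueBacklog
          expr: phpfpm_listen_queue > 0
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "PHP-FPM listen queue backlog on {{ $labels.pod }}"
```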
Prepared by: DevOps Guild – draft for review
Comments & revisions welcome before 2025‑08‑12 review meeting