
PHP-FPM Prometheus Monitoring on Kubernetes: A Tactical Guide for Scaling Teams
At 2 a.m. your autoscaler fires—but the Grafana panel that’s supposed to contextualise CPU spikes is empty. PHP-FPM is the beating heart of your commerce API, yet its worker queue, slow-request count, and process states remain invisible. Monitoring blind spots become fire-fighting marathons just when you need surgical precision.
“If you can’t measure PHP-FPM, you’re scaling in the dark.”
PHP-FPM Prometheus monitoring involves converting PHP-FPM’s FastCGI status page into Prometheus exposition format and scraping it. As a result, the pool’s request backlog, memory, and process counts appear alongside node, database, and business metrics—usually within Grafana.
Scale pressure. Concurrency limits that were fine at 10 rps break at 1 000 rps—without visibility, the fix becomes guesswork.
Cost optimisation. Right-sizing pods and spotting startup CPU spikes trims 25–40% of spend, but only if you have per-pod worker data.
Incident response time. SREs need symptoms, not speculation. Seeing php_fpm_slow_requests_total jump from 0 → 9 is a faster clue than tailing logs.
Audit & capacity planning. Investors and board-level CTOs increasingly ask for data-driven explanations of infra budgets; PHP-FPM is usually the last black box.
“The cheapest node is the one you never provision—observability pays for itself.”
Below is a field-tested framework we use when parachuting into customer clusters. Each path requires zero PHP-code changes but differs in operational overhead.
1. Sidecar Exporter (Gold Standard)
A lightweight Go binary (hipages/php-fpm_exporter) runs next to PHP-FPM in the same pod.
Pros: Per-pod isolation; security boundary stays inside pod; works everywhere.
Cons: One extra container; Helm chart needs edit.
When to choose: You control the Deployment or can patch via Helm + values file.
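A minimal sketch of what the sidecar addition might look like, assuming the hipages/php-fpm_exporter image and its default metrics port; the workload name, image tags, ports, and resource numbers are illustrative and should be adapted to your chart:

```yaml
# Sketch only: a PHP-FPM Deployment with the exporter added as a sidecar.
# Names, tags, ports, and resource numbers are illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: commerce-api                     # hypothetical workload
spec:
  selector:
    matchLabels:
      app: commerce-api
  template:
    metadata:
      labels:
        app: commerce-api
    spec:
      containers:
        - name: php-fpm
          image: registry.example.com/commerce-api:1.2.3   # your existing app image
          ports:
            - containerPort: 9000        # FastCGI listener
        - name: php-fpm-exporter
          image: hipages/php-fpm_exporter:v2.2.0           # example tag; pin one you have vetted
          env:
            - name: PHP_FPM_SCRAPE_URI   # check env/flag names against your exporter version
              value: "tcp://127.0.0.1:9000/status"         # requires pm.status_path = /status in the pool
          ports:
            - name: metrics
              containerPort: 9253        # the exporter's default metrics port
          resources:
            requests: {cpu: 10m, memory: 16Mi}
            limits: {cpu: 50m, memory: 32Mi}
```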
2. DaemonSet or Shared Exporter (Ops Minimalist)
Run one exporter per node and have it scrape every PHP-FPM pool on that node over localhost TCP or a Unix socket.
Pros: No pod mutation; single exporter upgrade path.
Cons: Cross-pod network traffic; harder to map metrics back to the owning workload; a Unix-socket mount may require hostPath.
When to choose: Dozens of small legacy PHP pods, no time for sidecar refactor.
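A heavily hedged sketch of the node-level variant, assuming the pools expose their sockets under a host path such as /run/php and that your exporter accepts a comma-separated list of scrape URIs (verify the exact unix-socket syntax against its documentation):

```yaml
# Sketch only: one exporter per node reading host-exposed PHP-FPM sockets.
# Socket paths and the unix-URI syntax are assumptions; verify against your exporter's docs.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: php-fpm-exporter
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: php-fpm-exporter
  template:
    metadata:
      labels:
        app: php-fpm-exporter
    spec:
      containers:
        - name: exporter
          image: hipages/php-fpm_exporter:v2.2.0           # example tag; pin one you have vetted
          env:
            - name: PHP_FPM_SCRAPE_URI
              value: "unix:///host/run/php/pool-a.sock;/status,unix:///host/run/php/pool-b.sock;/status"
          ports:
            - name: metrics
              containerPort: 9253
          volumeMounts:
            - name: php-sockets
              mountPath: /host/run/php
              readOnly: true
      volumes:
        - name: php-sockets
          hostPath:
            path: /run/php                                 # only works if pods publish sockets on the host
            type: Directory
```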
3. Nginx/NJS In-Place Transformation (Zero New Workload)
Leverage the existing Nginx ingress container (often already side-carred) to sub-request /status, parse key:value pairs with NJS or Lua, and emit /metrics.
Pros: No extra processes; keeps pod spec untouched.
Cons: Needs Nginx with ngx_http_js_module; you maintain ~30 lines of JS.
When to choose: You can tweak nginx.conf but the application image is untouchable.
4. Direct PodMonitor Scrape (Lucky Unicorn)
If the container image owner already enabled pm.status_path = /metrics and emits Prom-formatted text, a simple PodMonitor or ServiceMonitor does the job.
Pros: Zero code or infra change.
Cons: Rare in the wild; you just get scrape errors if the format is wrong.
When to choose: Auditing an image you don’t own—test first.
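If the pods really do serve Prometheus text on a named port, a Prometheus Operator PodMonitor is all that is needed; the labels, namespace, and port name below are assumptions:

```yaml
# Sketch only: scrape pods that already expose Prometheus-formatted metrics.
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: php-fpm
  namespace: monitoring
spec:
  namespaceSelector:
    matchNames: [production]             # where the PHP pods live
  selector:
    matchLabels:
      app: commerce-api                  # hypothetical pod label
  podMetricsEndpoints:
    - port: metrics                      # named container port serving /metrics
      path: /metrics
      interval: 30s                      # matches the 30-60 s guidance below
```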
Decision Table
Constraint | Sidecar | DaemonSet | Nginx-transform | Direct
---|---|---|---|---
No chart control | ✗ | ✓ | ~ | ✓
No extra processes | ✗ | ~ | ✓ | ✓
Per-pod isolation | ✓ | ✗ | ✓ | ✓
Time-to-deploy | 1 hour | 30 min | 30 min | 10 min

(~ = partially true)
Common Pitfalls
Scraping /status directly. Raw FastCGI status output isn’t Prometheus-parseable—every scrape returns a text-format parsing error.
Ignoring pm.status_path scope. The directive is per-pool; if you run multiple pools you need status enabled on each.
Latent Nginx caching. Some configs set proxy_cache on everything, returning stale metrics. Add proxy_no_cache 1.
Over-scraping. 5 s intervals look tempting but waste CPU; 30–60 s is fine for PHP-FPM queue depth.
Label explosion. Don’t append request path or host as labels—cardinality soars and TSDB costs follow.
What You Gain
Faster RCA: Mean time-to-repair drops when you can prove saturation is at the PHP layer, not the upstream ALB.
Capacity planning: php_fpm_max_children vs. active_processes plotted over weeks quantifies when to bump replicas (see the rule sketch after this list).
Cost savings: Downgrade from c6g.large → c6g.medium when metrics show idle workers at peak.
Security posture: Single sidecar avoids giving Prometheus access inside app containers, meeting many compliance checks.
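One way to make that capacity comparison queryable is a recording rule for the worker-saturation ratio; metric names vary by exporter (the hipages exporter, for instance, prefixes them with phpfpm_), so adjust the expression to whatever yours emits:

```yaml
# Sketch only: record the worker-saturation ratio used for capacity planning.
# Metric names depend on the exporter; adjust to what yours actually exposes.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: php-fpm-capacity
  namespace: monitoring
spec:
  groups:
    - name: php-fpm.capacity
      rules:
        - record: php_fpm:worker_saturation:ratio
          expr: phpfpm_active_processes / phpfpm_total_processes
```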
A recent fintech client cut 38 % of compute after switching from node-level (black-box) graphs to per-pod PHP-FPM metrics. The CFO’s reaction: “You’ve just funded Q4 hiring.”
Best Practices
Store the pool name as a label: pool="www"—it makes multi-tenant clusters searchable.
Align scrape interval with decision window: HPA and Alertmanager check every 30 s? Match it.
Alert on saturation trends: a 3-point moving average of (active/total) > 0.8 for 5 min beats raw spikes (see the alert sketch after this list).
Version-pin exporters: a Helm value of image.tag: v2.5.0 prevents surprise flag changes.
Document metric semantics: Add a markdown doc at /runbook/php-fpm.md—new engineers learn faster.
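A sketch of that trend alert as a Prometheus Operator rule, reusing the saturation ratio recorded earlier; the 90-second window approximates a 3-point average at a 30 s scrape interval, and the thresholds follow the guidance above:

```yaml
# Sketch only: alert on a smoothed saturation trend rather than raw spikes.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: php-fpm-alerts
  namespace: monitoring
spec:
  groups:
    - name: php-fpm.alerts
      rules:
        - alert: PhpFpmWorkerSaturation
          # ~3 scrape points at a 30 s interval, averaged, then held for 5 minutes
          expr: avg_over_time(php_fpm:worker_saturation:ratio[90s]) > 0.8
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "PHP-FPM pool {{ $labels.pool }} is close to exhausting its workers"
            runbook: /runbook/php-fpm.md
```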
“Metrics without runbooks are just colourful noise.”
Monitoring PHP-FPM in Kubernetes doesn’t demand a heavy observability stack. Sidecar exporter, node-level DaemonSet, Nginx transform, or—if luck permits—direct scrape: pick the trade-off that aligns with your operational constraints and governance model.
Ready to extinguish 2 a.m. mysteries and shrink your AWS bill? Start by enabling pm.status_path, deploy one of the translators above, and watch Grafana light up.
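Enabling the status endpoint is a pool-level config change; one common pattern is to drop in an extra pool file via a ConfigMap. The file name, pool name, and mount path below are assumptions that depend on your image:

```yaml
# Sketch only: enable the status endpoint for the www pool via an extra config file.
apiVersion: v1
kind: ConfigMap
metadata:
  name: php-fpm-status-config
data:
  zz-status.conf: |
    ; pm.status_path is per-pool; repeat for every pool you run (see pitfalls above)
    [www]
    pm.status_path = /status
# Mount the key into the pool-config directory of your image,
# e.g. /usr/local/etc/php-fpm.d/ for the official php:fpm images.
```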
Frequently Asked Questions
Q1. How much overhead does the exporter add?
A1. Less than 10 MiB RAM and ~1 % CPU per pod under normal scrape intervals.
Q2. Does the exporter run on minimal base images?
A2. Yes—static Go binaries work in scratch or alpine without glibc.
Q3. Can it scrape PHP-FPM over a Unix socket instead of TCP?
A3. The exporter supports unix:///run/php/php-fpm.sock with a query string for status.
Q4. Do I need TLS between Prometheus and the exporter?
A4. Inside the cluster, HTTP is typical; use mTLS ingress only if your compliance policy demands it.
Q5. Does exposing the status page leak sensitive data?
A5. Only aggregated counters—not stack traces—are revealed, but still restrict access to cluster-internal IPs.
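One way to enforce that restriction is a NetworkPolicy that only lets the monitoring namespace reach the exporter port; the labels, namespaces, and port number are assumptions, and if this is the only policy selecting these pods you must also allow normal application traffic:

```yaml
# Sketch only: limit ingress on the metrics port to the Prometheus namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: php-fpm-metrics-ingress
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: commerce-api                  # hypothetical pod label
  policyTypes: [Ingress]
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
      ports:
        - protocol: TCP
          port: 9253                     # exporter metrics port
    # NOTE: once selected by an Ingress policy, all other inbound traffic is denied;
    # add a rule (or a separate policy) for the application's own ports.
```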