This checklist is to be filled out by developers when setting up a new service. Provide all required details and confirm that setup steps are complete.

| Item | Priority | Description | Example | Value |
| --- | --- | --- | --- | --- |
| Service name | High | Unique name for the service | | |
| Source repository URL | High | Git repository where the service code is hosted | | |
| Start/build command | High | Command to build or start the service | | |
| Environment variables required | High | List all ENV keys needed by the service | | |
| Secrets required | High | Secrets to be stored securely in Vault/Secrets Manager | | |
| Expected service ports | Medium | Ports the service listens on | | |
| Owner/team | High | Team or person responsible for the service | | |
| Documentation link | Medium | Link to service documentation in Confluence or GitHub Wiki | | |
| Feature flags/toggles included | Medium | Are feature flags needed? | | ✅ / ❌ |
| Default configs documented | Medium | Base configs per environment documented | | ✅ / ❌ |
| Graceful shutdown | High | Service shuts down gracefully (open question: does Linkerd cover this?) | | |
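
For the "Environment variables required" item, one way to keep the documented list honest is to have the service check it at startup and fail fast. The following is a minimal sketch, not a prescribed implementation: the key names `DATABASE_URL` and `LOG_LEVEL`, the helper `load_required_env`, and the exception `MissingConfigError` are all hypothetical and exist only for illustration.

```python
import os


class MissingConfigError(RuntimeError):
    """Raised when one or more required environment variables are unset."""


def load_required_env(required_keys, environ=None):
    """Return a dict of the required keys, or raise listing every missing one.

    `environ` defaults to os.environ; passing a plain dict makes this testable.
    """
    env = os.environ if environ is None else environ
    # Treat unset and empty-string values the same: both count as missing.
    missing = [key for key in required_keys if not env.get(key)]
    if missing:
        raise MissingConfigError(
            "missing required environment variables: " + ", ".join(sorted(missing))
        )
    return {key: env[key] for key in required_keys}


# Hypothetical keys and values, for illustration only.
REQUIRED_KEYS = ["DATABASE_URL", "LOG_LEVEL"]

config = load_required_env(
    REQUIRED_KEYS,
    environ={"DATABASE_URL": "postgres://db.example.internal/app", "LOG_LEVEL": "info"},
)
print(config["LOG_LEVEL"])  # prints "info"
```

Reporting every missing key in one error, rather than stopping at the first, tends to shorten the fix-and-redeploy loop when a new environment is being set up.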

| Item | Priority | Description | Example | Value |
| --- | --- | --- | --- | --- |
| Dockerfile exists | High | Confirm a working Dockerfile exists and is optimized | | ✅ / ❌ |
| K8s health checks implemented | High | Startup, readiness, and liveness probes configured | | ✅ / ❌ |
| Resource requests & limits | High | CPU/memory settings for k8s pods | | |
| Ingress required? | High | Does the service need external access? | | Yes / No |
| Ingress hostname/path | High | Hostname and path for ingress | | |
| Deployment strategy | Medium | Deployment rollout type | | |
| Autoscaling configured | Medium | HPA/VPA set up for scaling pods | | Yes / No |
| Autoscaling tuned to application behaviour | Medium | CPU/memory-based or queue-based scaling | | |
| ServiceAccount / RBAC configured | High | ServiceAccount and RBAC with least privilege | | ✅ / ❌ |
| Pod disruption budgets | Medium | Ensures minimal service downtime during node upgrades | | ✅ / ❌ |
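
When filling in the autoscaling items, it can help to sanity-check a proposed target against the scaling rule the HorizontalPodAutoscaler documents for a single metric: `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`, with no change when the ratio is within a tolerance of 1.0 (0.1 by default). The sketch below is a simplified model of that rule only. It ignores missing metrics, not-ready pods, and stabilization windows, and the function name and default bounds are assumptions made for illustration.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Approximate the HPA scaling rule for a single metric.

    desired = ceil(current_replicas * current_metric / target_metric),
    skipped when the ratio is within `tolerance` of 1.0, then clamped
    to [min_replicas, max_replicas].
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: leave the replica count alone.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# CPU-based: 4 pods averaging 90% utilisation against a 60% target.
print(desired_replicas(4, 90, 60))    # prints 6
# Queue-based: 2 pods each seeing 250 queued messages against a target of 100.
print(desired_replicas(2, 250, 100))  # prints 5
```

The same function covers both the CPU/memory case and the queue case because the rule only depends on the ratio between the observed per-pod metric and its target.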

| Item | Priority | Description | Example | Value |
| --- | --- | --- | --- | --- |
| APM integration required? | High | Should the service have tracing/profiling? | | Yes / No |
| Prometheus metrics exposed | High | Exposes /metrics endpoint with engine and app-specific metrics | | ✅ / ❌ |
| Key KPIs to monitor | High | Define KPIs for health and performance | | |
| Dashboards required? | Medium | Should Grafana dashboards be created? | | ✅ / ❌ |
| Alerts configured | High | Alerting for key metrics in place | | ✅ / ❌ |
| Log format standardized | Medium | JSON logs with correlation IDs | | ✅ / ❌ |
| External synthetic health checks | Medium | Uptime monitoring from user perspective | | ✅ / ❌ |
| Audit logs implemented | Medium | Logs security-sensitive actions | | ✅ / ❌ |
| Centralized logging setup | High | Logs shipped to centralized system (Loki, CloudWatch, ELK) | | ✅ / ❌ |
| Metric baselines documented | Medium | Define normal ranges for key metrics | | ✅ / ❌ |
| Sentry | High | APM is instrumented | | ✅ / ❌ |
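
The "Log format standardized" item asks for JSON logs with correlation IDs. As a rough illustration of what that can look like with only the Python standard library, the sketch below emits one JSON object per log line. The field names (`timestamp`, `level`, `logger`, `message`, `correlation_id`), the `JsonFormatter` class, and the service name `checkout-service` are assumptions; align them with whatever your centralized logging system expects.

```python
import io
import json
import logging
import uuid
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        payload = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Populated via `extra=`; None when the caller did not supply one.
            "correlation_id": getattr(record, "correlation_id", None),
        }
        return json.dumps(payload)


def build_logger(name, stream):
    """Return a logger that writes JSON lines to `stream`."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    return logger


stream = io.StringIO()  # stand-in for stdout so the output can be inspected
logger = build_logger("checkout-service", stream)  # hypothetical service name

# In a real service this would usually come from an incoming request header.
correlation_id = str(uuid.uuid4())
logger.info("order accepted", extra={"correlation_id": correlation_id})

entry = json.loads(stream.getvalue())
print(entry["level"], entry["message"])  # prints "INFO order accepted"
```

Passing the ID through `extra=` keeps the call sites explicit; a `logging.Filter` or `contextvars` could inject it automatically instead, at the cost of a little more setup.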

| Item | Priority | Description | Example | Value |
| --- | --- | --- | --- | --- |
| Secrets stored securely | High | All secrets stored in Secrets Manager/Vault | | ✅ / ❌ |
| TLS/HTTPS enforced | High | HTTPS configured for all external endpoints | | ✅ / ❌ |
| API authentication method | High | Auth mechanism for API access | | |
| Vulnerability scanning enabled | High | Vulnerability scanning integrated in pipeline | | ✅ / ❌ |
| Dependency scanning configured | Medium | Dependency scanning for known CVEs | | ✅ / ❌ |
| RBAC for API endpoints | High | Authorization implemented | | ✅ / ❌ |
| Image signing & verification | Medium | Container images are signed and verified | | ✅ / ❌ |
| Rate limiting implemented | Medium | Protect APIs from abuse and DoS attacks | | ✅ / ❌ |
| Data encryption configured | High | Data encrypted at rest and in transit | | ✅ / ❌ |
| Penetration testing planned | Medium | Security testing included for critical endpoints | | ✅ / ❌ |
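
For the "Rate limiting implemented" item, a token bucket is one common technique; the checklist itself does not prescribe any particular algorithm, and rate limiting is often enforced at the ingress or API gateway rather than in application code. The sketch below shows the idea in process, under those caveats. The class name `TokenBucket` and its parameters are illustrative, and the fake clock is there only so the behaviour can be demonstrated deterministically.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.clock = clock
        self.tokens = float(capacity)  # start with a full bucket
        self.updated_at = clock()

    def allow(self, cost=1.0):
        """Return True and spend `cost` tokens if available, else return False."""
        now = self.clock()
        elapsed = now - self.updated_at
        self.updated_at = now
        # Refill for the time that has passed, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Deterministic demo using a fake clock instead of real time.
fake_now = [0.0]
bucket = TokenBucket(rate=1, capacity=2, clock=lambda: fake_now[0])

results = [bucket.allow(), bucket.allow(), bucket.allow()]  # burst of three at t=0
fake_now[0] = 1.0                                           # one second later
results.append(bucket.allow())
print(results)  # prints [True, True, False, True]
```

A real deployment would typically keep one bucket per client (for example per API key) and share state across replicas; both are left out here to keep the sketch short.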

| Item | Priority | Description | Example | Value |
| --- | --- | --- | --- | --- |
| Ownership assigned | High | Service owner/contact documented | | |
| Client team trained | Medium | Training provided to client team | | ✅ / ❌ |
| Documentation updated | High | Docs available in Confluence/CloudBrowser | | ✅ / ❌ |
| Runbook created | High | Operational runbook for on-call teams | | ✅ / ❌ |
| Monitoring alerts tested | High | Simulated alerts to ensure delivery to on-call | | ✅ / ❌ |
| Cost tagging configured | Medium | Tags applied for cost allocation | | ✅ / ❌ |
| Backup procedures documented | High | Backup strategy for DBs/configs documented | | ✅ / ❌ |
| Support escalation path defined | Medium | Documented escalation contacts | | ✅ / ❌ |
| Post-deployment validation | High | Service tested and validated post-deploy | | ✅ / ❌ |
| Old unused configs removed | Low | Clean up unused configs/artifacts | | ✅ / ❌ |
…