🚀 New Service Setup Checklist & Form

Developers fill out this checklist when setting up a new service. Provide all required details in the Value column and confirm that each setup step is complete.

📦 Service Basics

| Item | Priority | Description | Example | Value |
| --- | --- | --- | --- | --- |
| Service name | High | Unique name for the service | payment-service | |
| Source repository URL | High | Git repository where the service code is hosted | https://github.com/org/payment-service | |
| Start/build command | High | Command to build or start the service | npm start or java -jar app.jar | |
| Environment variables required | High | List all ENV keys needed by the service | DB_HOST, API_KEY, LOG_LEVEL | |
| Secrets required | High | Secrets to be stored securely in Vault/Secrets Manager | DB_PASSWORD, JWT_SECRET | |
| Expected service ports | Medium | Ports the service listens on | 8080 | |
| Owner/team | High | Team or person responsible for the service | Payments Team, devops@company.com | |
| Documentation link | Medium | Link to service documentation in Confluence or GitHub Wiki | https://confluence.org/payment-service | |
| Feature flags/toggles included | Medium | Are feature flags needed? | enable-beta-feature | ✅ / ❌ |
| Default configs documented | Medium | Base configs per environment documented | config/staging.yaml | ✅ / ❌ |
| Graceful shutdown | High | Does Linkerd cover this? | | |
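
The environment-variable and secret items above can be verified mechanically when the service starts. The following is a minimal sketch, assuming the example keys from this section and assuming secrets are injected into the process as environment variables; the function names are hypothetical, and the real key lists should come from the filled-in form.

```python
import os
import sys

# Keys copied from the Example column; replace with the values from the form.
REQUIRED_ENV = ["DB_HOST", "API_KEY", "LOG_LEVEL"]
REQUIRED_SECRETS = ["DB_PASSWORD", "JWT_SECRET"]


def missing_settings(environ, required):
    """Return the required keys that are absent or empty in `environ`."""
    return [key for key in required if not environ.get(key)]


def check_startup_config(environ=None):
    """Exit with a non-zero status if any required key is missing."""
    environ = os.environ if environ is None else environ
    missing = missing_settings(environ, REQUIRED_ENV + REQUIRED_SECRETS)
    if missing:
        # Report key names only; never print secret values.
        sys.exit("Missing required settings: " + ", ".join(missing))
```

Calling `check_startup_config()` as the first step of the start command makes a missing key fail fast instead of surfacing later as a runtime error.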

☁️ Cloud & Deployment

| Item | Priority | Description | Example | Value |
| --- | --- | --- | --- | --- |
| Dockerfile exists | High | Confirm a working Dockerfile exists and is optimized | FROM alpine:3.18 | ✅ / ❌ |
| K8s health checks implemented | High | Startup, readiness, and liveness probes configured | /healthz returns HTTP 200 | ✅ / ❌ |
| Resource requests & limits | High | CPU/memory settings for k8s pods | requests: cpu: 500m, memory: 256Mi | |
| Ingress required? | High | Does the service need external access? | Yes | Yes / No |
| Ingress hostname/path | High | Hostname and path for ingress | api.company.com/payment | |
| Deployment strategy | Medium | Deployment rollout type | RollingUpdate / Canary | |
| Autoscaling configured | Medium | HPA/VPA set up for scaling pods | HPA: min=2, max=5 pods | Yes / No |
| Autoscaling tuned to application behaviour | Medium | Scaling is CPU/memory-based or queue-based, whichever matches the workload | | |
| ServiceAccount / RBAC configured | High | ServiceAccount and RBAC with least privilege | payment-service-sa | ✅ / ❌ |
| Pod disruption budgets | Medium | Ensures minimal service downtime during node upgrades | minAvailable: 1 | ✅ / ❌ |
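
To sanity-check the autoscaling bounds above, it can help to work through how an HPA picks a replica count. The sketch below approximates the documented Kubernetes rule, `ceil(currentReplicas * currentMetric / targetMetric)`, clamped to the `min=2, max=5` example from this section. The function name and the metric values are hypothetical, and the real controller also applies a tolerance and stabilization windows that are left out here.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=2, max_replicas=5):
    """Approximate the HPA scaling rule, clamped to the configured bounds."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    # Scale the replica count by how far the metric is from its target.
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))
```

For example, two pods averaging 90% CPU against a 60% target scale to three pods, while any result above five is capped at the configured maximum.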

📊 Observability

| Item | Priority | Description | Example | Value |
| --- | --- | --- | --- | --- |
| APM integration required? | High | Should the service have tracing/profiling? | Tempo tracing enabled | Yes / No |
| Prometheus metrics exposed | High | Exposes /metrics endpoint with engine and app-specific metrics | http_requests_total, queue_depth | ✅ / ❌ |
| Key KPIs to monitor | High | Define KPIs for health and performance | Error rate < 1%, latency p95 < 500 ms | |
| Dashboards required? | Medium | Should Grafana dashboards be created? | Grafana: payment-service-dashboard | ✅ / ❌ |
| Alerts configured | High | Alerting for key metrics in place | 500 errors > 10/min triggers PagerDuty | ✅ / ❌ |
| Log format standardized | Medium | JSON logs with correlation IDs | {"request_id":"abc-123", "message":"OK"} | ✅ / ❌ |
| External synthetic health checks | Medium | Uptime monitoring from user perspective | Pingdom health check enabled | ✅ / ❌ |
| Audit logs implemented | Medium | Logs security-sensitive actions | User X deleted resource Y | ✅ / ❌ |
| Centralized logging setup | High | Logs shipped to centralized system (Loki, CloudWatch, ELK) | JSON logs to CloudWatch | ✅ / ❌ |
| Metric baselines documented | Medium | Define normal ranges for key metrics | Latency p95 < 300 ms under normal load | ✅ / ❌ |
| Sentry | High | APM is instrumented | | ✅ / ❌ |
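
The standardized log format above (JSON with a correlation ID) can be produced with the standard library alone. This is a minimal sketch, not a prescribed implementation: the `JsonFormatter` and `build_logger` names and the `level` field are assumptions, while `request_id` and `message` follow the example in this section.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "message": record.getMessage(),
            # request_id is attached per call via `extra=`; None if absent.
            "request_id": getattr(record, "request_id", None),
        }
        return json.dumps(payload)


def build_logger(name="payment-service"):
    """Return a logger that writes JSON lines to stderr."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

A call such as `build_logger().info("OK", extra={"request_id": "abc-123"})` then emits one JSON line that a centralized logging system can index by `request_id`.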

🔒 Security

| Item | Priority | Description | Example | Value |
| --- | --- | --- | --- | --- |
| Secrets stored securely | High | All secrets stored in Secrets Manager/Vault | AWS Secrets Manager | ✅ / ❌ |
| TLS/HTTPS enforced | High | HTTPS configured for all external endpoints | cert-manager auto-renewal enabled | ✅ / ❌ |
| API authentication method | High | Auth mechanism for API access | OAuth2 / JWT / API keys | |
| Vulnerability scanning enabled | High | Vulnerability scanning integrated in pipeline | Trivy scan as GitLab CI stage | ✅ / ❌ |
| Dependency scanning configured | Medium | Dependency scanning for known CVEs | Dependabot alerts enabled | ✅ / ❌ |
| RBAC for API endpoints | High | Authorization implemented | admin role can delete users | ✅ / ❌ |
| Image signing & verification | Medium | Container images are signed and verified | cosign signed images | ✅ / ❌ |
| Rate limiting implemented | Medium | Protect APIs from abuse and DoS attacks | 10 req/s per IP | ✅ / ❌ |
| Data encryption configured | High | Data encrypted at rest and in transit | RDS encryption enabled | ✅ / ❌ |
| Penetration testing planned | Medium | Security testing included for critical endpoints | Scheduled Q4 | ✅ / ❌ |
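
The rate-limiting item above ("10 req/s per IP") is often implemented as a token bucket, although this checklist does not prescribe an algorithm. The sketch below is one possible in-process version under that assumption; the class name, the `burst` parameter, and the injectable `clock` are hypothetical, and a production setup would more likely enforce the limit at the ingress or API gateway.

```python
import time


class TokenBucketLimiter:
    """Per-client token bucket: `rate` tokens per second, up to `burst`."""

    def __init__(self, rate=10.0, burst=10, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.clock = clock
        self._buckets = {}  # client IP -> (tokens, last_refill_time)

    def allow(self, client_ip):
        """Return True and consume a token if the client is under its limit."""
        now = self.clock()
        tokens, last = self._buckets.get(client_ip, (float(self.burst), now))
        # Refill in proportion to elapsed time, capped at the burst size.
        tokens = min(float(self.burst), tokens + (now - last) * self.rate)
        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0
        self._buckets[client_ip] = (tokens, now)
        return allowed
```

Each client IP gets its own bucket, so one noisy client does not consume another client's allowance. The bucket map grows with the number of distinct IPs, so a long-running service would need to evict idle entries.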

🏁 Handoff & Operations

| Item | Priority | Description | Example | Value |
| --- | --- | --- | --- | --- |
| Ownership assigned | High | Service owner/contact documented | devops@company.com, #payments-team | |
| Client team trained | Medium | Training provided to client team | 1h walkthrough recorded in Confluence | ✅ / ❌ |
| Documentation updated | High | Docs available in Confluence/CloudBrowser | Confluence page created: Service Overview | ✅ / ❌ |
| Runbook created | High | Operational runbook for on-call teams | payment-service-runbook.md | ✅ / ❌ |
| Monitoring alerts tested | High | Simulated alerts to ensure delivery to on-call | PagerDuty test succeeded | ✅ / ❌ |
| Cost tagging configured | Medium | Tags applied for cost allocation | Team:Payments, Env:Prod | ✅ / ❌ |
| Backup procedures documented | High | Backup strategy for DBs/configs documented | Daily RDS snapshots configured | ✅ / ❌ |
| Support escalation path defined | Medium | Documented escalation contacts | L1: DevOps, L2: SRE Team | ✅ / ❌ |
| Post-deployment validation | High | Service tested and validated post-deploy | Smoke tests passed | ✅ / ❌ |
| Old unused configs removed | Low | Clean up unused configs/artifacts | Removed test values from ConfigMap | ✅ / ❌ |
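
Post-deployment validation can be as small as a script that requests a few known endpoints and checks for HTTP 200. The following is a rough sketch under that assumption; the function names are hypothetical, the base URL is a placeholder, and `/healthz` and `/metrics` are simply the endpoints named elsewhere in this checklist.

```python
import urllib.request


def http_status(url, timeout=5.0):
    """Return the HTTP status code for a GET request to `url`."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def run_smoke_checks(base_url, paths, fetch=http_status):
    """Return a {path: passed} map; a check passes only on HTTP 200."""
    results = {}
    for path in paths:
        try:
            results[path] = fetch(base_url + path) == 200
        except OSError:
            # Connection errors and HTTP error statuses count as failures.
            results[path] = False
    return results
```

A deploy pipeline could call `run_smoke_checks("https://api.company.com/payment", ["/healthz", "/metrics"])` and fail the job if any value in the result is `False`.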