Deploy Checkly Private Locations on Kubernetes for production-scale monitoring with built-in orchestration, scaling, and reliability features. This guide covers Helm chart deployment, manual manifests, and production best practices.
Prerequisites
Before deploying on Kubernetes, ensure you have:
Running Kubernetes cluster (v1.19+)
kubectl configured and connected
Sufficient cluster resources
Network policies configured (if required)
Quick Start with Helm
The fastest way to deploy Checkly Agents on Kubernetes is using our official Helm chart.
Get the Helm Chart
# Clone the official repository
git clone https://github.com/checkly/checkly-k8s.git
cd checkly-k8s
# Verify chart structure
ls -la helm-chart/
Deploy with Helm
# Install with your API key
helm install checkly-agent \
--set apiKeySecret.apiKey="pl_your_api_key_here" \
./helm-chart
# Install with custom values
helm install checkly-agent \
--set apiKeySecret.apiKey="pl_your_api_key_here" \
--set replicaCount= 3 \
--set resources.requests.memory="2Gi" \
--set resources.requests.cpu="1" \
./helm-chart
# Install in custom namespace
kubectl create namespace checkly
helm install checkly-agent \
--namespace checkly \
--set apiKeySecret.apiKey="pl_your_api_key_here" \
./helm-chart
Verify Deployment
# Check pod status
kubectl get pods -l app.kubernetes.io/name=checkly-agent
# View agent logs
kubectl logs -l app.kubernetes.io/name=checkly-agent --tail=50
# Check agent connectivity
kubectl exec -it deployment/checkly-agent -- curl -I https://agent.checklyhq.com
The Helm chart automatically creates API key secrets, configures resource limits, and sets up pod security policies for production use.
Manual Kubernetes Manifests
For more control or custom environments, deploy using individual Kubernetes manifests.
1. Create Namespace (Optional)
# checkly-namespace.yaml
apiVersion : v1
kind : Namespace
metadata :
name : checkly
labels :
name : checkly
purpose : monitoring
kubectl apply -f checkly-namespace.yaml
2. Create API Key Secret
# api-key-secret.yaml
apiVersion : v1
kind : Secret
metadata :
name : checkly-api-key
namespace : checkly
type : Opaque
data :
# Base64 encoded API key: echo -n "pl_your_api_key" | base64
api-key : cGxfeW91cl9hcGlfa2V5X2hlcmU=
# Create secret directly (easier method)
kubectl create secret generic checkly-api-key \
--namespace=checkly \
--from-literal=api-key= "pl_your_api_key_here"
3. Create Deployment
# checkly-deployment.yaml
apiVersion : apps/v1
kind : Deployment
metadata :
name : checkly-agent
namespace : checkly
labels :
app : checkly-agent
version : "6.0.3"
spec :
replicas : 2
selector :
matchLabels :
app : checkly-agent
template :
metadata :
labels :
app : checkly-agent
annotations :
prometheus.io/scrape : "false"
spec :
containers :
- name : checkly-agent
image : checkly/agent:6.0.3
imagePullPolicy : IfNotPresent
env :
- name : API_KEY
valueFrom :
secretKeyRef :
name : checkly-api-key
key : api-key
- name : JOB_CONCURRENCY
value : "3"
- name : LOG_LEVEL
value : "INFO"
resources :
requests :
memory : "1Gi"
cpu : "500m"
limits :
memory : "4Gi"
cpu : "2"
livenessProbe :
exec :
command :
- /bin/sh
- -c
- "ps aux | grep -v grep | grep node"
initialDelaySeconds : 30
periodSeconds : 30
readinessProbe :
exec :
command :
- /bin/sh
- -c
- "ps aux | grep -v grep | grep node"
initialDelaySeconds : 10
periodSeconds : 10
securityContext :
allowPrivilegeEscalation : false
runAsNonRoot : true
runAsUser : 1000
runAsGroup : 1000
readOnlyRootFilesystem : false
capabilities :
drop :
- ALL
restartPolicy : Always
terminationGracePeriodSeconds : 30
securityContext :
fsGroup : 1000
kubectl apply -f checkly-deployment.yaml
Production Configuration
Resource Planning
Configure appropriate resource requests and limits based on your workload:
# High-capacity agent configuration
resources :
requests :
memory : "2Gi"
cpu : "1"
limits :
memory : "8Gi"
cpu : "4"
env :
- name : JOB_CONCURRENCY
value : "8" # 8GB / 1GB per browser check
# API-only agent configuration
resources :
requests :
memory : "500Mi"
cpu : "250m"
limits :
memory : "2Gi"
cpu : "1"
env :
- name : JOB_CONCURRENCY
value : "10" # 2GB / 0.15GB per API check
Node Affinity and Scheduling
# Pin agents to specific nodes
spec :
template :
spec :
affinity :
nodeAffinity :
requiredDuringSchedulingIgnoredDuringExecution :
nodeSelectorTerms :
- matchExpressions :
- key : checkly.io/agent-node
operator : In
values :
- "true"
podAntiAffinity :
preferredDuringSchedulingIgnoredDuringExecution :
- weight : 100
podAffinityTerm :
labelSelector :
matchExpressions :
- key : app
operator : In
values :
- checkly-agent
topologyKey : kubernetes.io/hostname
tolerations :
- key : "checkly.io/dedicated"
operator : "Equal"
value : "agent"
effect : "NoSchedule"
Pod Disruption Budget
# pod-disruption-budget.yaml
apiVersion : policy/v1
kind : PodDisruptionBudget
metadata :
name : checkly-agent-pdb
namespace : checkly
spec :
minAvailable : 1
selector :
matchLabels :
app : checkly-agent
Horizontal Pod Autoscaler
# hpa.yaml
apiVersion : autoscaling/v2
kind : HorizontalPodAutoscaler
metadata :
name : checkly-agent-hpa
namespace : checkly
spec :
scaleTargetRef :
apiVersion : apps/v1
kind : Deployment
name : checkly-agent
minReplicas : 2
maxReplicas : 10
metrics :
- type : Resource
resource :
name : cpu
target :
type : Utilization
averageUtilization : 70
- type : Resource
resource :
name : memory
target :
type : Utilization
averageUtilization : 80
behavior :
scaleDown :
stabilizationWindowSeconds : 300
policies :
- type : Percent
value : 10
periodSeconds : 60
scaleUp :
stabilizationWindowSeconds : 60
policies :
- type : Percent
value : 50
periodSeconds : 60
Advanced Configuration
ConfigMap for Agent Settings
# agent-config.yaml
apiVersion : v1
kind : ConfigMap
metadata :
name : checkly-agent-config
namespace : checkly
data :
JOB_CONCURRENCY : "5"
LOG_LEVEL : "INFO"
USE_OS_DNS_RESOLVER : "true"
---
apiVersion : apps/v1
kind : Deployment
metadata :
name : checkly-agent
spec :
template :
spec :
containers :
- name : checkly-agent
envFrom :
- configMapRef :
name : checkly-agent-config
env :
- name : API_KEY
valueFrom :
secretKeyRef :
name : checkly-api-key
key : api-key
Network Policies
# network-policy.yaml
apiVersion : networking.k8s.io/v1
kind : NetworkPolicy
metadata :
name : checkly-agent-netpol
namespace : checkly
spec :
podSelector :
matchLabels :
app : checkly-agent
policyTypes :
- Egress
egress :
# Allow DNS resolution
- to : []
ports :
- protocol : UDP
port : 53
# Allow HTTPS to Checkly API
- to : []
ports :
- protocol : TCP
port : 443
# Allow HTTP/HTTPS for checks
- to : []
ports :
- protocol : TCP
port : 80
- protocol : TCP
port : 443
Monitoring and Observability
# service-monitor.yaml (for Prometheus)
apiVersion : monitoring.coreos.com/v1
kind : ServiceMonitor
metadata :
name : checkly-agent
namespace : checkly
spec :
selector :
matchLabels :
app : checkly-agent
endpoints :
- port : metrics
interval : 30s
path : /metrics
---
apiVersion : v1
kind : Service
metadata :
name : checkly-agent-metrics
namespace : checkly
labels :
app : checkly-agent
spec :
ports :
- name : metrics
port : 9090
targetPort : 9090
selector :
app : checkly-agent
Helm Chart Customization
Custom Values File
# values-production.yaml
replicaCount : 3
image :
repository : checkly/agent
tag : "6.0.3"
pullPolicy : IfNotPresent
apiKeySecret :
apiKey : "pl_your_production_key"
env :
JOB_CONCURRENCY : "5"
LOG_LEVEL : "INFO"
USE_OS_DNS_RESOLVER : "true"
resources :
requests :
memory : "2Gi"
cpu : "1"
limits :
memory : "6Gi"
cpu : "3"
nodeSelector :
checkly.io/agent-node : "true"
tolerations :
- key : "checkly.io/dedicated"
operator : "Equal"
value : "agent"
effect : "NoSchedule"
affinity :
podAntiAffinity :
preferredDuringSchedulingIgnoredDuringExecution :
- weight : 100
podAffinityTerm :
labelSelector :
matchExpressions :
- key : app.kubernetes.io/name
operator : In
values :
- checkly-agent
topologyKey : kubernetes.io/hostname
podDisruptionBudget :
enabled : true
minAvailable : 1
monitoring :
enabled : true
serviceMonitor :
enabled : true
Deploy with custom values:
helm install checkly-agent -f values-production.yaml ./helm-chart
Upgrade Strategy
# Upgrade with rolling deployment
helm upgrade checkly-agent \
--set image.tag="6.1.0" \
--reuse-values \
./helm-chart
# Rollback if needed
helm rollback checkly-agent 1
Security Best Practices
Pod Security Standards
# pod-security-policy.yaml
apiVersion : v1
kind : Pod
metadata :
name : checkly-agent
spec :
securityContext :
runAsNonRoot : true
runAsUser : 1000
runAsGroup : 1000
fsGroup : 1000
seccompProfile :
type : RuntimeDefault
containers :
- name : checkly-agent
securityContext :
allowPrivilegeEscalation : false
readOnlyRootFilesystem : false
runAsNonRoot : true
runAsUser : 1000
runAsGroup : 1000
capabilities :
drop :
- ALL
Secret Management with External Secrets
# external-secret.yaml (using External Secrets Operator)
apiVersion : external-secrets.io/v1beta1
kind : ExternalSecret
metadata :
name : checkly-api-key-external
namespace : checkly
spec :
refreshInterval : 1h
secretStoreRef :
name : vault-secret-store
kind : SecretStore
target :
name : checkly-api-key
creationPolicy : Owner
data :
- secretKey : api-key
remoteRef :
key : secret/checkly
property : api-key
Troubleshooting
Common issues and solutions: # Check pod status and events
kubectl describe pod -l app=checkly-agent -n checkly
# Check resource constraints
kubectl top pods -n checkly
# Verify secret exists
kubectl get secret checkly-api-key -n checkly -o yaml
# Check image pull issues
kubectl get events -n checkly --sort-by=.metadata.creationTimestamp
Common solutions:
Verify API key is correctly base64 encoded in secret
Check resource requests vs available cluster capacity
Ensure image tag exists and is accessible
Verify namespace exists and has proper RBAC
Agent Connectivity Issues
Diagnostic commands: # Test connectivity from pod
kubectl exec -it deployment/checkly-agent -n checkly -- \
curl -I https://agent.checklyhq.com
# Check DNS resolution
kubectl exec -it deployment/checkly-agent -n checkly -- \
nslookup agent.checklyhq.com
# View agent logs
kubectl logs deployment/checkly-agent -n checkly --tail=100 -f
# Check network policies
kubectl describe networkpolicy -n checkly
Common solutions:
Verify egress network policies allow HTTPS to agent.checklyhq.com
Check if corporate firewall blocks outbound connections
Ensure DNS resolution works for external domains
Verify proxy configuration if required
Monitoring and Alerting
Prometheus Metrics
Monitor Checkly Agents with custom metrics:
# prometheus-rules.yaml
apiVersion : monitoring.coreos.com/v1
kind : PrometheusRule
metadata :
name : checkly-agent-rules
namespace : checkly
spec :
groups :
- name : checkly-agent
rules :
- alert : ChecklyAgentDown
expr : up{job="checkly-agent"} == 0
for : 5m
labels :
severity : critical
annotations :
summary : "Checkly Agent is down"
description : "Checkly Agent {{ $labels.instance }} has been down for more than 5 minutes"
- alert : ChecklyAgentHighMemory
expr : container_memory_usage_bytes{pod=~"checkly-agent-.*"} / container_spec_memory_limit_bytes{pod=~"checkly-agent-.*"} > 0.9
for : 10m
labels :
severity : warning
annotations :
summary : "Checkly Agent high memory usage"
description : "Checkly Agent {{ $labels.pod }} memory usage is above 90%"
Logging Configuration
# fluent-bit-config.yaml
apiVersion : v1
kind : ConfigMap
metadata :
name : fluent-bit-config
namespace : checkly
data :
fluent-bit.conf : |
[INPUT]
Name tail
Path /var/log/containers/checkly-agent-*.log
Parser docker
Tag checkly.agent.*
Refresh_Interval 5
[OUTPUT]
Name forward
Match checkly.agent.*
Host log-aggregator.company.com
Port 24224
Next Steps