Monitoring Argo CD with Prometheus and a Grafana Dashboard on a Cloud VPS

Keeping track of Argo CD's status is essential to keeping a Continuous Delivery pipeline stable. This article shows how to monitor Argo CD's performance with Prometheus and a Grafana dashboard on your hosting provider's Cloud VPS.

Why Monitoring Argo CD Matters

Argo CD is a GitOps tool that automates the management of Kubernetes deployments. Monitoring its status and performance helps you:

Detect problems in the sync and deployment process early
Reduce application downtime
Understand the rate of change and the frequency of deployments
Measure the performance of the GitOps pipeline
Manage Kubernetes resources efficiently

If you run on your hosting provider's Cloud VPS, you can install Prometheus and Grafana on the same server that runs Argo CD.

Installing Prometheus for Argo CD

First, install the Prometheus Operator with the kube-prometheus-stack Helm chart, which also bundles Grafana and works alongside Argo CD:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace

Once the installation finishes, check that the Prometheus and Grafana pods are running, then forward the Prometheus port so its UI is reachable at http://localhost:9090:

kubectl get pods -n monitoring

kubectl port-forward -n monitoring svc/prometheus-kube-prometheus-prometheus 9090:9090

Configuring ServiceMonitors for Argo CD

Create a ServiceMonitor so that Prometheus can collect metrics from the Argo CD metrics service (the application controller):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: argocd-metrics
  namespace: argocd
  labels:
    app.kubernetes.io/name: argocd-metrics
    # Matches the default serviceMonitorSelector of the "prometheus" Helm release
    release: prometheus
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: argocd-metrics
  endpoints:
  - port: metrics
    interval: 30s

Also create a ServiceMonitor for the Argo CD server metrics:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: argocd-server-metrics
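  namespace: argocd
  labels:
    # Matches the default serviceMonitorSelector of the "prometheus" Helm release
    release: prometheus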
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: argocd-server-metrics
  endpoints:
  - port: metrics
    interval: 30s
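
Once the ServiceMonitors are applied, it can help to confirm that Prometheus is actually scraping the Argo CD endpoints. The following is a minimal Python sketch of one way to do that, assuming the port-forward to localhost:9090 from the installation step is still running. The helper names (query_prometheus, down_targets) and the sample response are illustrative assumptions, not part of Argo CD or Prometheus; only the /api/v1/query endpoint and its response shape come from the Prometheus HTTP API.

```python
import json
import urllib.parse
import urllib.request

PROMETHEUS_URL = "http://localhost:9090"  # assumption: local port-forward is active


def query_prometheus(expr, base_url=PROMETHEUS_URL):
    """Run an instant query against the Prometheus HTTP API and return parsed JSON."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def down_targets(response):
    """Return the 'service' (or 'job') label of every target whose `up` sample is 0."""
    down = []
    for sample in response["data"]["result"]:
        _timestamp, value = sample["value"]
        if float(value) == 0.0:
            labels = sample["metric"]
            down.append(labels.get("service", labels.get("job", "unknown")))
    return down


# Hypothetical response, shaped like /api/v1/query output for up{namespace="argocd"}
SAMPLE = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"service": "argocd-metrics"}, "value": [1700000000, "1"]},
            {"metric": {"service": "argocd-server-metrics"}, "value": [1700000000, "0"]},
        ],
    },
}
```

Against a live cluster, the call would look like down_targets(query_prometheus('up{namespace="argocd"}')). An empty list suggests every discovered target is up; an empty query result, on the other hand, usually means the ServiceMonitors were not selected at all.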

Key Metrics for Monitoring Argo CD

The following are the main metrics to watch to confirm that Argo CD is working normally. Metric names and types vary between Argo CD versions, so check each one against the /metrics output of your installation:

argocd_app_info – information about each Application in the cluster
argocd_app_sync_total – number of syncs performed by each Application
argocd_app_sync_duration_seconds – time spent syncing an Application
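argocd_app_info{health_status="..."} – health status of each Application, exposed as a label on argocd_app_info (the standalone argocd_app_health_status gauge is a legacy metric)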
argocd_server_authentication_attempts – number of authentication attempts
argocd_git_request_duration_seconds – duration of requests to Git repositories
argocd_app_reconcile_bucket – histogram buckets for the time spent reconciling Applications
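
Because the set of exposed metrics depends on the Argo CD version and on which component is scraped, a quick check of the raw /metrics page can save time before building panels. The sketch below parses the "# TYPE" lines of the Prometheus text format. It assumes a local port-forward such as kubectl port-forward -n argocd svc/argocd-metrics 8082:8082; the function names and the sample text are hypothetical and serve only as illustration.

```python
import urllib.request


def metric_types(exposition_text):
    """Map metric name -> type using the '# TYPE <name> <type>' lines of the text format."""
    types = {}
    for line in exposition_text.splitlines():
        parts = line.split()
        if len(parts) == 4 and parts[0] == "#" and parts[1] == "TYPE":
            types[parts[2]] = parts[3]
    return types


def fetch_metrics(url="http://localhost:8082/metrics"):
    """Download the raw metrics page (assumes a local port-forward to the metrics Service)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")


# Hypothetical excerpt of a /metrics page, for illustration only
SAMPLE = """\
# HELP argocd_app_info Information about application.
# TYPE argocd_app_info gauge
argocd_app_info{name="demo",health_status="Healthy"} 1
# TYPE argocd_app_sync_total counter
argocd_app_sync_total{name="demo",phase="Succeeded"} 3
"""

# Names taken from the list above; extend as needed
WANTED = ["argocd_app_info", "argocd_app_sync_total", "argocd_git_request_duration_seconds"]
found = metric_types(SAMPLE)
missing = [name for name in WANTED if name not in found]
```

With a live endpoint, replace SAMPLE with fetch_metrics(). A name that shows up in missing is either absent in your version or exposed by a different component (Git request metrics, for example, come from the repo server rather than the application controller).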

Setting Up the Grafana Dashboard

Access Grafana through port forwarding:

kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80

Open http://localhost:3000 and log in with the chart's default credentials (admin/prom-operator).

Create a new dashboard and add the following panels:

Panel 1: Application Sync Status

sum(rate(argocd_app_sync_total[5m])) by (dest_server)

Panel 2: Sync Duration (95th Percentile)

histogram_quantile(0.95, sum(rate(argocd_app_sync_duration_seconds_bucket[5m])) by (le))
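
The query above relies on histogram_quantile, which estimates a quantile from cumulative bucket counts rather than from individual sync durations. The Python sketch below is a simplified re-implementation of that interpolation, written only to show how the estimate is derived. It is not Prometheus's actual code, and the bucket values in the usage note are made up.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) histogram buckets.

    Simplified form of the interpolation PromQL performs: locate the bucket
    that contains the target rank, then interpolate linearly inside it.
    """
    buckets = sorted(buckets)
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("buckets must include a +Inf upper bound")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0  # the first bucket is assumed to start at 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the +Inf bucket: report the highest finite bound
                return buckets[-2][0] if len(buckets) > 1 else math.nan
            in_bucket = count - prev_count
            if in_bucket == 0:
                return bound
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return math.nan
```

For example, with hypothetical buckets [(1, 10), (5, 60), (10, 90), (math.inf, 100)], the median falls inside the 1 to 5 second bucket and is interpolated there, while the 95th percentile falls into the +Inf bucket and is capped at the highest finite bound. This is why bucket boundaries limit how precise the panel can be.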

Panel 3: Application Health

count(argocd_app_info{health_status="Healthy"}) / count(argocd_app_info)
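
Panels built by hand in the UI can also be described as a dashboard JSON model, which is easier to keep in version control. The sketch below builds a minimal model from (title, query) pairs. It assumes the panels use Grafana's default data source; the dashboard title and the build_dashboard helper are hypothetical. The resulting payload has the shape that Grafana's POST /api/dashboards/db endpoint accepts.

```python
import json


def build_dashboard(title, panels):
    """Build a minimal Grafana dashboard model from (panel_title, promql_expr) pairs."""
    dashboard_panels = []
    for index, (panel_title, expr) in enumerate(panels):
        dashboard_panels.append({
            "id": index + 1,
            "type": "timeseries",
            "title": panel_title,
            # Two panels per row on Grafana's 24-column grid
            "gridPos": {"h": 8, "w": 12, "x": (index % 2) * 12, "y": (index // 2) * 8},
            "targets": [{"refId": "A", "expr": expr}],
        })
    # id/uid of None asks Grafana to create a new dashboard
    return {"id": None, "uid": None, "title": title, "panels": dashboard_panels}


payload = {
    "dashboard": build_dashboard(
        "Argo CD Overview",  # hypothetical dashboard title
        [("Application Sync Status",
          "sum(rate(argocd_app_sync_total[5m])) by (dest_server)")],
    ),
    "overwrite": False,
}
body = json.dumps(payload)
```

Only Panel 1 is included here to keep the example short; further (title, query) pairs can be appended to the list in the same way.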

Setting Up Alerting Rules

Create a PrometheusRule to define alert rules for important events:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: argocd-alerts
  namespace: monitoring
  labels:
    # Matches the default ruleSelector of the "prometheus" Helm release
    release: prometheus
spec:
  groups:
  - name: argocd.rules
    interval: 30s
    rules:
    - alert: ArgoCDSyncFailure
      expr: increase(argocd_app_sync_total{phase="Failed"}[5m]) > 0
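      # Keep "for" shorter than the 5m range window; otherwise a single failure ages out of the window before the alert can fire
      for: 1m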
      labels:
        severity: critical
      annotations:
        summary: "Argo CD Sync Failure Detected"

    - alert: ArgoCDAppUnhealthy
      expr: argocd_app_info{health_status!="Healthy"} > 0
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "Argo CD Application Unhealthy"
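
The "for" field in these rules means the expression must stay true across consecutive evaluations for at least that long before the alert moves from pending to firing. The Python sketch below is a simplified model of that behavior, not Prometheus's implementation; the function name and the evaluation data are hypothetical.

```python
def first_firing_time(evaluations, hold_seconds):
    """Return the timestamp at which an alert would start firing, or None.

    evaluations: iterable of (timestamp_seconds, expression_is_true) in time order.
    hold_seconds: the rule's "for" duration in seconds.
    """
    active_since = None
    for timestamp, is_true in evaluations:
        if not is_true:
            # A single false evaluation resets the pending state
            active_since = None
            continue
        if active_since is None:
            active_since = timestamp
        if timestamp - active_since >= hold_seconds:
            return timestamp
    return None


# Hypothetical case: evaluated every 30s, expression true only during the first 5 minutes
evaluations = [(t, t < 300) for t in range(0, 600, 30)]
```

In this model, a condition that is true for slightly less than five minutes never fires with a five-minute hold, but does fire with a one-minute hold. That is worth keeping in mind for expressions built on increase() over a short range, since a single event only keeps them true for roughly the length of that range.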

Dashboard Examples and Usage

A complete dashboard typically includes:

Real-time Sync Status Chart
Application Health Percentage Gauge
Sync Duration Histogram
Git Request Latency Graph
Controller Reconciliation Rate
API Server Request Rate

You can also import an existing Argo CD dashboard from the Grafana community:

https://grafana.com/grafana/dashboards/14584

Best Practices for Monitoring Argo CD

Set an appropriate Prometheus retention policy (upstream Prometheus defaults to 15 days; the Helm chart may override this through prometheus.prometheusSpec.retention)
Run multiple Prometheus replicas for high availability
Define meaningful alerts and avoid alert fatigue
Back up your Grafana dashboard configuration regularly
Use a Service Account with only the permissions needed to read metrics

Improving the Performance of the Monitoring Stack

To get the best performance from the monitoring stack on your hosting provider's Cloud VPS, you can:

Increase the CPU and memory available to the Prometheus pod
Use a Persistent Volume to store metrics data
Set appropriate resource requests and limits
Use remote storage for Prometheus as the volume of metrics grows
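
When sizing the Persistent Volume, the rule of thumb from the Prometheus storage documentation is: disk space ≈ retention time in seconds × ingested samples per second × bytes per sample, with roughly 1 to 2 bytes per sample on average. The sketch below turns that into a small Python calculation; the function names and the series count in the usage note are assumptions chosen for illustration.

```python
def samples_per_second(active_series, scrape_interval_seconds):
    """Approximate ingestion rate when every series is scraped once per interval."""
    return active_series / scrape_interval_seconds


def prometheus_disk_bytes(retention_days, ingest_rate, bytes_per_sample=2.0):
    """Rough disk estimate: retention_seconds * samples_per_second * bytes_per_sample."""
    retention_seconds = retention_days * 24 * 60 * 60
    return retention_seconds * ingest_rate * bytes_per_sample


# Hypothetical workload: 100,000 active series scraped every 30 seconds, kept for 15 days
estimate = prometheus_disk_bytes(15, samples_per_second(100_000, 30))
estimate_gb = estimate / 1e9
```

The result is only a starting point: leave headroom for the write-ahead log and compaction, and measure the real ingestion rate once the stack has been running for a while.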

Conclusion

Monitoring Argo CD with Prometheus and Grafana is essential to running a Kubernetes cluster effectively. With the right setup, you can:

Detect and fix problems quickly
Understand how your GitOps pipeline behaves
Improve deployment performance
Keep the system stable

If you run on your hosting provider's Cloud VPS, you can install this monitoring stack on a Cloud VPS server easily and at low cost. Visit the Cloud VPS page to learn more about installing on a Kubernetes cluster.