Container Security Monitoring: Runtime Security with Falco

Runtime security detects abnormal behavior in workloads while they are running. Unlike vulnerability scanning, which inspects images before deployment, it observes what is actually happening: a shell spawned in a pod, a file written under /etc, or a read of a secret the process should never touch. These are key signals of container escape, cryptojacking, and supply chain attacks.

Falco is a CNCF Graduated project built specifically for runtime security of cloud native workloads. It uses an eBPF or kernel module driver to capture the system calls made on the host, evaluates them against a set of rules, and raises alerts for the security team in near real time. This article covers the architecture, installation on Kubernetes, writing custom rules, and integration with alerting and response automation.

What Is Container Runtime Security?

Runtime security means monitoring the behavior of running workloads and detecting activity that is abnormal or looks like a threat. Static scanning inspects an image before deployment and finds only known CVEs; it cannot catch zero-day exploits, supply chain compromises, or malicious behavior that appears only after the container starts.

Threats Runtime Security Can Detect

Container Escape: attempts to break out of the container onto the host, such as mounting the host's /proc or writing to the host's /etc/shadow
Privilege Escalation: switching from non-root to root, or running setuid binaries that should not be used
Cryptojacking: a hijacked workload mining cryptocurrency, for example by spawning a process named xmrig
Reverse Shell: a workload opening a connection back to a C2 server
Sensitive File Access: reads of /etc/shadow, /root/.ssh/id_rsa, or the Kubernetes service account token
Unexpected Network Activity: a workload that should have no network activity starts opening outbound connections to the internet
Supply Chain Attack: a compromised package spawning unexpected processes

Falco Architecture

Falco captures syscalls from the kernel and feeds them to a rule engine, which decides whether each event should raise an alert. The main components are:

Driver: collects syscalls from the kernel. The options are the Kernel Module, the legacy eBPF probe, and Modern eBPF (CO-RE, kernel 5.8+, recommended)
Libs (libsinsp/libscap): turn raw syscalls into event objects enriched with context such as container ID, image name, and user
Rules Engine: evaluates each event against rules written in YAML
Outputs: send alerts to stdout, syslog, a file, an HTTP endpoint, gRPC, or through Falcosidekick to many other destinations
Plugins: extend detection to other event sources such as Kubernetes audit logs, AWS CloudTrail, and GitHub
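
The flow through these components can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the Event dictionary, the Rule class, and the field names (evt_type, proc_name, container_id) are hypothetical stand-ins, and the real engine compiles YAML conditions rather than calling Python functions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Event = Dict[str, str]  # enriched event: syscall fields plus container context


@dataclass
class Rule:
    name: str
    priority: str
    condition: Callable[[Event], bool]  # stands in for a compiled YAML condition
    output: str  # format string that references event field names


def evaluate(event: Event, rules: List[Rule]) -> List[str]:
    """Return one alert line for every rule whose condition matches the event."""
    alerts = []
    for rule in rules:
        if rule.condition(event):
            alerts.append(f"{rule.priority} {rule.name}: " + rule.output.format(**event))
    return alerts


# Hypothetical rule: a shell started inside a container (not on the host).
shell_rule = Rule(
    name="shell_in_container",
    priority="NOTICE",
    condition=lambda e: e["evt_type"] == "execve"
    and e["container_id"] != "host"
    and e["proc_name"] in {"bash", "sh", "zsh"},
    output="shell={proc_name} container={container_id}",
)

event = {"evt_type": "execve", "proc_name": "bash", "container_id": "3ad7b26ded6d"}
alerts = evaluate(event, [shell_rule])
```

The driver and libs correspond to whatever produces the event dictionary, the rules engine to evaluate, and the outputs to whatever consumes the returned alert lines.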

Driver Comparison

Driver	Kernel Requirement	Performance	Deployment
Kernel Module	2.6+	Best	Must be compiled for each kernel
eBPF (legacy)	4.14+	Good	Requires a BPF probe
Modern eBPF	5.8+ (CO-RE)	Best (recommended)	Easy to deploy, no compilation needed
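
As a rough illustration of the table, the helper below maps a kernel release string to a driver choice. The version thresholds come straight from the table; the function name pick_driver is hypothetical, and the return values ebpf and kmod are assumed to mirror the Helm chart's driver.kind options alongside the modern_ebpf value used later in this article. Real hosts may have further requirements that this sketch ignores.

```python
def pick_driver(kernel_release: str) -> str:
    """Map a kernel release such as '5.15.0-91-generic' to a driver kind."""
    version = kernel_release.split("-")[0]  # drop distro suffix, e.g. '5.15.0'
    major, minor = (int(part) for part in version.split(".")[:2])
    if (major, minor) >= (5, 8):
        return "modern_ebpf"  # CO-RE, nothing to compile
    if (major, minor) >= (4, 14):
        return "ebpf"  # legacy eBPF probe (assumed chart value)
    return "kmod"  # kernel module (assumed chart value)
```

On a node, the input would typically come from `uname -r` or Python's platform.release().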

Installing Falco on Kubernetes

The recommended approach is the Helm chart, which deploys Falco as a DaemonSet so that an agent runs on every node. The following example installs it with the Modern eBPF driver:

# Add the Falco Helm repo
helm repo add falcosecurity https://falcosecurity.github.io/charts
helm repo update

# Create the namespace
kubectl create namespace falco

# Install Falco together with Falcosidekick
helm install falco falcosecurity/falco \
  --namespace falco \
  --set driver.kind=modern_ebpf \
  --set tty=true \
  --set falcosidekick.enabled=true \
  --set falcosidekick.webui.enabled=true

Verify that it is running on every node:

# Check the DaemonSet
kubectl get ds -n falco

# Follow the logs from the Falco pods
kubectl logs -n falco -l app.kubernetes.io/name=falco -f

# Check that the driver loaded (the exact log message varies by version; adjust the pattern if nothing matches)
kubectl logs -n falco -l app.kubernetes.io/name=falco | grep -i "driver loaded"

Important Values for Production

# values.yaml
driver:
  kind: modern_ebpf

collectors:
  kubernetes:
    enabled: true  # Pull metadata from the K8s API (pod name, namespace, labels)

falco:
  json_output: true          # JSON format, easier to parse
  log_level: info
  priority: notice           # Minimum priority to log
  buffered_outputs: false
  http_output:
    enabled: true
    url: http://falcosidekick:2801

falcosidekick:
  enabled: true
  config:
    slack:
      webhookurl: https://hooks.slack.com/services/XXX/YYY/ZZZ
      minimumpriority: warning

resources:
  requests:
    cpu: 100m
    memory: 512Mi
  limits:
    cpu: 1000m
    memory: 1024Mi
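
With json_output enabled, each alert is a single JSON object, which makes downstream parsing straightforward. The sketch below parses a hand-written sample; the sample values are invented, the summarize helper is hypothetical, and the field set shown (time, rule, priority, output, output_fields) is assumed to match the JSON alert format rather than copied from a real log.

```python
import json

# Hand-written sample in the general shape of a JSON alert; all values are invented.
sample_line = json.dumps({
    "time": "2024-01-01T12:00:00.000000000Z",
    "rule": "Terminal shell in container",
    "priority": "Notice",
    "output": "A shell was spawned in a container with an attached terminal",
    "output_fields": {
        "proc.name": "bash",
        "k8s.pod.name": "web-7d4b9",
        "k8s.ns.name": "shop",
    },
})


def summarize(line: str) -> str:
    """Condense one JSON alert line into a short, human-readable summary."""
    alert = json.loads(line)
    fields = alert.get("output_fields", {})
    pod = fields.get("k8s.pod.name", "<unknown>")
    namespace = fields.get("k8s.ns.name", "<unknown>")
    return f'[{alert["priority"]}] {alert["rule"]} pod={namespace}/{pod}'


summary = summarize(sample_line)
```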

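Falco Rules Basics

Rules are written in YAML. The three core elements are rule (the name), condition (a boolean filter expression that the event must satisfy), and output (the message emitted on a match); each rule also carries a desc and a priority. Conditions and outputs reference fields of the syscall event such as proc.name, container.id, and fd.name.

Example Built-in Rule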

- rule: Terminal shell in container
  desc: A shell was used as the entrypoint/exec point into a container
  condition: >
    spawned_process and container
    and shell_procs and proc.tty != 0
    and container_entrypoint
    and not user_expected_terminal_shell_in_container_conditions
  output: >
    A shell was spawned in a container with an attached terminal
    (user=%user.name user_loginuid=%user.loginuid %container.info
    shell=%proc.name parent=%proc.pname cmdline=%proc.cmdline
    terminal=%proc.tty container_id=%container.id image=%container.image.repository)
  priority: NOTICE
  tags: [container, shell, mitre_execution]

Macros and Lists

The rule engine supports macros (reusable condition fragments) and lists (named collections of values) to keep rules readable:

- list: sensitive_file_names
  items: [/etc/shadow, /etc/sudoers, /root/.ssh/id_rsa, /var/run/secrets/kubernetes.io/serviceaccount/token]

- macro: sensitive_file_read
  condition: (open_read and fd.name in (sensitive_file_names))

- rule: Read sensitive file untrusted
  desc: An attempt to read sensitive file by a non-trusted program
  condition: >
    sensitive_file_read
    and not proc.name in (trusted_programs)
  output: >
    Sensitive file opened for reading by non-trusted program
    (file=%fd.name proc=%proc.name container=%container.name)
  priority: WARNING
  tags: [filesystem, mitre_credential_access]
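
One way to picture macros and lists is as textual substitution into the condition. The Python sketch below does a single substitution pass over a shortened copy of the example above; the expand function is hypothetical, and the real engine parses conditions into an expression tree and resolves nested macros such as open_read, which this toy leaves untouched.

```python
import re
from typing import Dict, List


def expand(condition: str, macros: Dict[str, str], lists: Dict[str, List[str]]) -> str:
    """Single-pass textual expansion: macro names first, then list names."""
    for name, body in macros.items():
        condition = re.sub(rf"\b{re.escape(name)}\b", lambda _m, b=body: b, condition)
    for name, items in lists.items():
        joined = ", ".join(items)
        condition = re.sub(rf"\b{re.escape(name)}\b", lambda _m, j=joined: j, condition)
    return condition


lists = {"sensitive_file_names": ["/etc/shadow", "/etc/sudoers"]}
macros = {"sensitive_file_read": "(open_read and fd.name in (sensitive_file_names))"}
rule_condition = "sensitive_file_read and not proc.name in (trusted_programs)"

expanded = expand(rule_condition, macros, lists)
```

Note that trusted_programs stays unexpanded because no list with that name is defined in the snippet; in a real rules file it would have to be declared as a list before the rule that uses it.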

Writing Custom Rules for Specific Workloads

The built-in default rules are generic. In production, write custom rules tailored to your own workloads to reduce false positives and improve accuracy. The following are practical examples:

Detecting Cryptojacking

- list: crypto_miners
  items: [xmrig, minerd, cgminer, bfgminer, cryptonight, ethminer, t-rex, nbminer]

- rule: Detect crypto miner in container
  desc: Detect known crypto mining binary execution
  condition: >
    spawned_process and container
    and (proc.name in (crypto_miners)
         or proc.cmdline contains "stratum+tcp"
         or proc.cmdline contains "cryptonight")
  output: >
    Crypto miner detected in container
    (image=%container.image.repository pod=%k8s.pod.name
    ns=%k8s.ns.name proc=%proc.name cmd=%proc.cmdline)
  priority: CRITICAL
  tags: [cryptojacking, container]
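
To reason about what this condition does and does not catch, it helps to restate the process-matching part as a plain predicate. The function below is a hypothetical mirror of the rule's logic (it omits the spawned_process and container checks) and is not how the engine evaluates anything.

```python
CRYPTO_MINERS = {
    "xmrig", "minerd", "cgminer", "bfgminer",
    "cryptonight", "ethminer", "t-rex", "nbminer",
}


def looks_like_miner(proc_name: str, cmdline: str) -> bool:
    """Mirror of the rule: known miner name, or a mining pool hint in the command line."""
    return (
        proc_name in CRYPTO_MINERS
        or "stratum+tcp" in cmdline
        or "cryptonight" in cmdline
    )
```

A renamed binary that takes its pool address from a config file matches neither branch, so a rule of this kind is best paired with behavioral signals such as unexpected outbound connections.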

Detecting Reverse Shells

- rule: Reverse shell from container
  desc: Detect reverse shell patterns (bash/nc with network redirection)
  condition: >
    spawned_process and container
    and ((proc.name in (nc, ncat, netcat, nmap-ncat)
          and proc.args contains "-e")
         or (proc.name = "bash"
             and proc.cmdline contains "/dev/tcp/")
         or (proc.name in (python, python3)
             and proc.cmdline contains "socket.socket"
             and proc.cmdline contains "connect"))
  output: >
    Reverse shell attempt detected
    (pod=%k8s.pod.name ns=%k8s.ns.name
    proc=%proc.name cmd=%proc.cmdline user=%user.name)
  priority: CRITICAL
  tags: [network, mitre_command_and_control]
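
Grouping matters whenever a condition mixes and with or. Python binds and tighter than or, and Falco's condition language is understood to do the same (an assumption worth confirming against the rules documentation), so the effect can be demonstrated with two hypothetical predicates for the Python branch of the rule:

```python
def ungrouped(name: str, cmdline: str) -> bool:
    # Parsed as: name == "python" or (name == "python3" and ... and ...)
    return (
        name == "python"
        or name == "python3"
        and "socket.socket" in cmdline
        and "connect" in cmdline
    )


def grouped(name: str, cmdline: str) -> bool:
    # Both interpreter names share the same command-line checks.
    return (
        name in ("python", "python3")
        and "socket.socket" in cmdline
        and "connect" in cmdline
    )


benign = ("python", "python manage.py runserver")
reverse_shell = ("python", "python -c 'import socket; s=socket.socket(); s.connect((h, p))'")
```

The ungrouped form flags every process named python, including the benign one, while the grouped form fires only when the command line also contains both socket markers.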

Detecting Access to the Kubernetes Service Account Token

- macro: service_account_token_read
  condition: >
    open_read
    and fd.name startswith /var/run/secrets/kubernetes.io/serviceaccount

- list: allowed_sa_readers
  items: [kubelet, kube-proxy, kubectl]

- rule: Suspicious SA token access
  desc: Non-trusted process reading K8s service account token
  condition: >
    service_account_token_read
    and not proc.name in (allowed_sa_readers)
    and not proc.pname in (allowed_sa_readers)
  output: >
    K8s service account token accessed by unexpected process
    (pod=%k8s.pod.name proc=%proc.name file=%fd.name user=%user.name)
  priority: WARNING
  tags: [k8s, credential_access, mitre_credential_access]

Appending to Rules to Reduce False Positives

Rather than editing the original rule directly, use append to add exception conditions. The following example allowlists processes known to read sensitive files as part of normal operation:

- rule: Read sensitive file untrusted
  append: true  # newer Falco releases deprecate this key in favor of an override section; check the docs for your version
  condition: >
    and not (k8s.ns.name = "kube-system"
             and proc.name in (kubelet, kube-apiserver))
    and not (container.image.repository contains "hashicorp/vault")
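
Appending is assumed here to behave like text concatenation: the extra fragment is added to the end of the original condition. The toy helper below (append_condition is a hypothetical name) shows the result for one of the exceptions above, and why the original condition should be fully parenthesized when it contains or.

```python
def append_condition(base: str, extra: str) -> str:
    """Model append as joining the two condition fragments with a space."""
    return f"{base.strip()} {extra.strip()}"


base = "sensitive_file_read and not proc.name in (trusted_programs)"
extra = 'and not (container.image.repository contains "hashicorp/vault")'
combined = append_condition(base, extra)

# With an unparenthesized "or", the appended exception attaches only to the last term.
risky = append_condition("cond_a or cond_b", "and not exception")
```

Here risky reads as cond_a or (cond_b and not exception), so the exception would not apply to events matching cond_a.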

Alert Output and Falcosidekick

Falcosidekick receives alerts from the engine and forwards them to more than 60 destinations, including Slack, PagerDuty, Elasticsearch, Loki, Prometheus (Alertmanager), AWS SNS, and SIEM platforms. Separating detection from output logic makes it easy to adjust routing and filtering without touching the core configuration.

Example Config: Sending Alerts to Multiple Outputs

# falcosidekick config
slack:
  webhookurl: https://hooks.slack.com/services/XXX
  minimumpriority: warning
  messageformat: all

pagerduty:
  routingkey: YOUR_ROUTING_KEY
  minimumpriority: critical

elasticsearch:
  hostport: http://elasticsearch:9200
  index: falco
  minimumpriority: debug

loki:
  hostport: http://loki:3100
  minimumpriority: debug

prometheus:
  extralabels: cluster:production,env:prod

webhook:
  address: https://response-automation.example.com/webhook
  customheaders: Authorization:Bearer YOUR_TOKEN
  minimumpriority: critical
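
The effect of minimumpriority can be pictured as a threshold filter per output. The sketch below copies the thresholds from the config above; the priority ordering is the standard syslog-style ladder, and the destinations function is a hypothetical illustration rather than Falcosidekick's actual code.

```python
from typing import List

# Lowest to highest severity.
PRIORITIES = [
    "debug", "informational", "notice", "warning",
    "error", "critical", "alert", "emergency",
]
RANK = {name: index for index, name in enumerate(PRIORITIES)}

# Thresholds copied from the example config.
MINIMUM = {
    "slack": "warning",
    "pagerduty": "critical",
    "elasticsearch": "debug",
    "loki": "debug",
    "webhook": "critical",
}


def destinations(alert_priority: str) -> List[str]:
    """Return the outputs whose minimum priority is met by this alert."""
    level = RANK[alert_priority.lower()]
    return sorted(out for out, minimum in MINIMUM.items() if level >= RANK[minimum])
```

With these thresholds a Notice alert reaches only the log stores, a Warning also reaches Slack, and a Critical alert reaches every output including PagerDuty and the response webhook.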

Priority Levels and Routing

Priority	Meaning	Recommended destination
EMERGENCY	System is down	PagerDuty (page on-call immediately)
CRITICAL	Severe threat	PagerDuty + Slack
ERROR	Needs immediate attention	Slack + SIEM
WARNING	Needs investigation	Slack (secondary channel) + SIEM
NOTICE	Normal but noteworthy event	SIEM + Loki
INFO/DEBUG	General logs	Loki / Elasticsearch

Response Automation

Once a threat is detected, automated response helps reduce MTTR (Mean Time To Respond). Examples of automated responses:

Pod Isolation: apply a NetworkPolicy that blocks all traffic to and from the pod where the threat was detected
Pod Termination: delete the pod that raised a critical alert (the Deployment's ReplicaSet controller then creates a replacement pod)
Snapshot Evidence: export the process list, network connections, and file changes for forensic analysis
Ticket Creation: automatically open a ticket in Jira/PagerDuty with full context

Example Response Playbook with Falcosidekick + Argo Events

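# Argo Events Sensor that subscribes to the webhook from Falcosidekick
# Note: the {{.Input...}} placeholders below are illustrative; verify how your Argo Events version injects event data into triggers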
apiVersion: argoproj.io/v1alpha1
kind: Sensor
metadata:
  name: falco-response
spec:
  dependencies:
  - name: falco-critical
    eventSourceName: falco-webhook
    eventName: critical-alert
    filters:
      data:
      - path: body.priority
        type: string
        value:
        - "Critical"
  triggers:
  - template:
      name: isolate-pod
      k8s:
        operation: create
        source:
          resource:
            apiVersion: networking.k8s.io/v1
            kind: NetworkPolicy
            metadata:
              name: isolate-{{.Input.body.output_fields.k8s_pod_name}}
              namespace: "{{.Input.body.output_fields.k8s_ns_name}}"
            spec:
              podSelector:
                matchLabels:
                  # Assumes the pod carries an "app" label equal to its name; use a label your pods actually have
                  app: "{{.Input.body.output_fields.k8s_pod_name}}"
              policyTypes: [Ingress, Egress]
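
The isolation step boils down to generating a deny-all NetworkPolicy for the affected pod. The sketch below builds that manifest as a plain dictionary; the isolation_policy function and the quarantine label are hypothetical, and applying the manifest (for example through a Kubernetes client) is left out.

```python
from typing import Dict


def isolation_policy(pod_name: str, namespace: str, pod_labels: Dict[str, str]) -> dict:
    """Build a NetworkPolicy that denies all ingress and egress for matching pods."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": f"isolate-{pod_name}", "namespace": namespace},
        "spec": {
            # The selector must match labels that are really present on the pod.
            "podSelector": {"matchLabels": dict(pod_labels)},
            # Listing both policy types without any allow rules blocks all traffic.
            "policyTypes": ["Ingress", "Egress"],
        },
    }


policy = isolation_policy("web-7d4b9", "shop", {"quarantine": "true"})
```

Selecting on a dedicated label such as the hypothetical quarantine=true means the responder first labels the pod and then relies on a single standing policy, which avoids guessing at application labels.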

Best Practices

Start with the built-in rules: use the default rule set to learn what normal workload behavior looks like before writing additional custom rules
Tune to reduce false positives: every environment is different; append to rules rather than editing the originals
Use priority levels correctly: do not send every alert to PagerDuty, which causes alert fatigue; route by priority instead
Store alerts in a SIEM: keep every alert in long-term storage for forensics and audit (compliance)
Review rules regularly: the threat landscape changes constantly, so keep rules up to date (subscribe to the upstream rules repo)
Test rules on staging: try new rules on staging or in dry-run mode before applying them to production
Monitor the agent itself: agent pods can crash or drop events, so monitor the falco_events_dropped metric
Combine with an admission controller: use OPA/Kyverno to block non-compliant images at deploy time, while the runtime engine catches behavior after deployment

Monitoring the Performance of the System Itself

The agent exposes metrics through a Prometheus endpoint so that its own health can be monitored. Metrics worth watching (exact names differ between versions, so confirm them against your deployment's metrics endpoint):

Metric	Meaning
falco_events_total	Total number of events inspected
falco_events_rate_sec	Events per second
falco_events_dropped	Events dropped (the system could not keep up)
falco_rules_matches_total	Number of rule matches, per rule
falco_memory_rss_bytes	Memory (RSS) in use
falco_cpu_usage_ratio	CPU in use

If falco_events_dropped is greater than 0, the system cannot keep up with the event rate. Add resources or remove unnecessary rules so that no events are missed.
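
Because these metrics are counters, the useful number is the share of events dropped between two scrapes rather than the raw total. The sketch below computes that share from two samples; it assumes the total counter counts processed events separately from drops, which should be checked against the actual metric definitions, and drop_ratio is a hypothetical helper.

```python
from typing import Tuple

Sample = Tuple[int, int]  # (events_total, events_dropped) at one scrape


def drop_ratio(previous: Sample, current: Sample) -> float:
    """Fraction of events dropped between two counter samples."""
    processed = current[0] - previous[0]
    dropped = current[1] - previous[1]
    seen = processed + dropped
    if seen <= 0:
        return 0.0  # no traffic, or a counter reset after a restart
    return dropped / seen


ratio = drop_ratio((1000, 0), (1990, 10))
```

In this example 10 of 1000 events were dropped, a ratio of 0.01; an alert on a sustained non-zero ratio is usually more actionable than one on the absolute counter.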

Summary

Runtime security is an indispensable layer of defense in cloud native systems. Image scanning and admission policies can only catch known vulnerabilities and misconfigurations at deploy time; they cannot detect abnormal behavior that emerges after the workload is running. A syscall-based runtime engine is therefore a key tool for detecting threats in real time.

Installing on Kubernetes with Helm and Falcosidekick quickly yields an end-to-end detection and alerting system. In production, however, tune the rules to fit your workloads, reduce false positives, build response playbooks to lower MTTR when real incidents occur, and store alerts in a SIEM for audit and compliance.