Prometheus Exporters: Collecting Metrics from Applications and Services

A Prometheus exporter is an intermediary that converts metrics from applications or systems that do not support Prometheus natively into the text-based format Prometheus can read. Exporters are a key part of the ecosystem because they let Prometheus monitor almost anything, from MySQL, Redis and Nginx to routers and IoT devices.

This article explains how exporters work, lists popular exporters, walks through setting up the MySQL exporter, shows how to instrument your own application code in Python and Node.js, and covers writing a custom exporter in Python for specialized systems.

How Exporters Work

An exporter is a small HTTP server that exposes a /metrics endpoint. When Prometheus scrapes this endpoint, the exporter collects data from the source system (for example by sending a query to a database, reading files under /proc, or calling a hardware API), converts it to the Prometheus text format, and returns it as the HTTP response.

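Key principle: an exporter does not store data long-term. Every scrape fetches fresh data from the source. If the source is down, the exporter either returns an error or reports a health metric with value 0 (mysqld_exporter, for example, exposes mysql_up 0). If the exporter itself cannot be scraped, Prometheus sets the up metric it records for that target to 0. Prometheus can alert on either signal immediately.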
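The scrape-time behavior above can be sketched with nothing but the Python standard library. This is a minimal illustration, not how any particular exporter is implemented: `read_queue_depth` and `broken_source` are hypothetical stand-ins for a source system, and the metric names `myapp_up` and `myapp_queue_depth` are made up for the example.

```python
from typing import Callable


def scrape(read_source: Callable[[], int]) -> str:
    """Build the /metrics response body, reading the source fresh on every call."""
    lines = [
        "# HELP myapp_up Whether the last read from the source succeeded.",
        "# TYPE myapp_up gauge",
    ]
    try:
        depth = read_source()  # hypothetical source read: a DB query, a /proc file, an API call
    except Exception:
        # Source is down: still answer the scrape, but report myapp_up 0 and no data
        lines.append("myapp_up 0")
        return "\n".join(lines) + "\n"
    lines += [
        "myapp_up 1",
        "# HELP myapp_queue_depth Items currently waiting in the queue.",
        "# TYPE myapp_queue_depth gauge",
        f"myapp_queue_depth {depth}",
    ]
    return "\n".join(lines) + "\n"


def read_queue_depth() -> int:
    return 7  # hypothetical healthy source


def broken_source() -> int:
    raise ConnectionError("source unreachable")  # hypothetical failing source


print(scrape(read_queue_depth), end="")
print(scrape(broken_source), end="")
```

Answering the scrape with myapp_up 0 instead of an HTTP error keeps the target UP, so an alert can tell "the source is down" apart from "the exporter is down".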

Popular Exporters

Exporter	Monitors	Default port
node_exporter	Linux/Unix system metrics	9100
mysqld_exporter	MySQL/MariaDB	9104
postgres_exporter	PostgreSQL	9187
redis_exporter	Redis	9121
nginx-prometheus-exporter	Nginx stub_status	9113
blackbox_exporter	HTTP/ICMP/TCP probing	9115
cadvisor	Container metrics	8080
windows_exporter	Windows system metrics	9182
snmp_exporter	SNMP devices (router, switch)	9116

Prometheus Text Format

Every exporter must expose its metrics in the text format Prometheus understands. The standard structure looks like this:

# HELP http_requests_total Total HTTP requests received
# TYPE http_requests_total counter
http_requests_total{method="GET",status="200"} 1523
http_requests_total{method="POST",status="201"} 89
http_requests_total{method="GET",status="500"} 3

# HELP process_cpu_seconds_total CPU time consumed
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 42.17

# HELP memory_usage_bytes Current memory usage
# TYPE memory_usage_bytes gauge
memory_usage_bytes 524288000

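Each metric starts with a # HELP line describing it and a # TYPE line declaring its type (counter, gauge, histogram or summary), followed by sample lines made up of the metric name, optional labels in braces, and the value. Everything is plain text that you can read by eye; no special parser is needed.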
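To make the structure concrete, here is a hand-rolled Python sketch that renders one metric family in this format, including the escaping the text format requires for backslashes, double quotes and newlines in label values. `render_family` and `escape_label_value` are hypothetical helpers for illustration; in real code, client libraries such as prometheus_client generate this text for you.

```python
def escape_label_value(value: str) -> str:
    """Escape a label value as the text format requires: backslash, double quote, newline."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_family(name: str, metric_type: str, help_text: str, samples: list) -> str:
    """Render one metric family: HELP and TYPE once, then one line per (labels, value) sample."""
    safe_help = help_text.replace("\\", "\\\\").replace("\n", "\\n")
    out = [f"# HELP {name} {safe_help}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        if labels:
            pairs = ",".join(
                f'{key}="{escape_label_value(str(val))}"' for key, val in labels.items()
            )
            out.append(f"{name}{{{pairs}}} {value}")
        else:
            out.append(f"{name} {value}")
    return "\n".join(out) + "\n"


print(render_family(
    "http_requests_total", "counter", "Total HTTP requests received",
    [({"method": "GET", "status": "200"}, 1523),
     ({"method": "POST", "status": "201"}, 89)],
), end="")
```

The call at the bottom reproduces the first four lines of the http_requests_total example above.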

Installing the MySQL Exporter

An example setup of mysqld_exporter, which is widely used in production:

Creating a Monitoring User

CREATE USER 'exporter'@'localhost' IDENTIFIED BY 'StrongPassword123' WITH MAX_USER_CONNECTIONS 3;
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'localhost';
FLUSH PRIVILEGES;

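The minimum privileges the exporter needs are PROCESS, REPLICATION CLIENT and SELECT. Do not grant more than that, to keep the attack surface small. Note that 'exporter'@'localhost' only matches connections made from the MySQL host itself; if the exporter connects over the network, for example from a Docker container, create the user for the host it connects from instead.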

Running the Exporter with Docker

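docker run -d \
  --name mysqld-exporter \
  --restart unless-stopped \
  -p 9104:9104 \
  -e MYSQLD_EXPORTER_PASSWORD="StrongPassword123" \
  prom/mysqld-exporter:v0.15.1 \
  --mysqld.address=mysql-host:3306 \
  --mysqld.username=exporter \
  --collect.info_schema.tables \
  --collect.info_schema.innodb_metrics \
  --collect.slave_status

Since v0.15.0, mysqld_exporter no longer reads the DATA_SOURCE_NAME environment variable. The connection is configured with --mysqld.address and --mysqld.username, with the password in the MYSQLD_EXPORTER_PASSWORD environment variable (or everything in a my.cnf file passed with --config.my-cnf). Each --collect.* flag enables one group of metrics. Enable only the collectors you need, because some are expensive: info_schema.tables, for example, reads metadata for every table and can be slow on servers with many tables.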

Adding the Exporter to the Prometheus Scrape Config

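scrape_configs:
  - job_name: 'mysql'
    scrape_interval: 30s
    static_configs:
      # Container name from the docker run command; Prometheus must share its Docker network to resolve it
      - targets: ['mysqld-exporter:9104']
        labels:
          service: 'mysql'
          environment: 'production'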

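After reloading Prometheus, open Status → Targets in the Prometheus UI; the mysql job should show state UP. DOWN means Prometheus could not scrape the exporter itself (the exporter is not running, or the network is blocked). If the exporter is reachable but cannot connect to MySQL, the target still shows UP and the exporter reports mysql_up 0 instead.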
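The Targets page only tells you whether Prometheus could scrape the exporter. A small Python sketch, assuming the exporter is reachable at a URL such as http://mysqld-exporter:9104/metrics, can read the mysql_up value the exporter reports about its own connection to MySQL. `fetch_metrics` and `sample_value` are hypothetical helpers, and the parser only handles unlabeled samples.

```python
import urllib.request
from typing import Optional


def fetch_metrics(url: str, timeout: float = 5.0) -> str:
    """Download an exporter's /metrics page as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def sample_value(text: str, name: str) -> Optional[float]:
    """Return the value of the first unlabeled sample named `name`, or None if absent."""
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # skip HELP/TYPE comment lines
        parts = line.split()
        # A sample line is "<name> <value> [timestamp]"; labeled samples are not handled here
        if len(parts) >= 2 and parts[0] == name:
            return float(parts[1])
    return None


# In practice: body = fetch_metrics("http://mysqld-exporter:9104/metrics")
body = "# TYPE mysql_up gauge\nmysql_up 0\n"
if sample_value(body, "mysql_up") != 1.0:
    print("exporter is reachable, but it cannot connect to MySQL")
```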

Application Metrics Inside Your Code

Besides off-the-shelf exporters, applications you develop yourself should instrument their code to expose metrics directly, without an intermediate exporter. This lets you capture business metrics that reflect what the application actually does.

Python (prometheus_client)

from prometheus_client import Counter, Histogram, start_http_server
import time
import random

REQUEST_COUNT = Counter(
    'app_requests_total',
    'Total requests received',
    ['endpoint', 'status']
)

REQUEST_LATENCY = Histogram(
    'app_request_duration_seconds',
    'Request latency in seconds',
    ['endpoint']
)

def handle_request(endpoint):
    start = time.time()
    status = random.choice(['200', '200', '200', '500'])
    time.sleep(random.uniform(0.01, 0.5))
    REQUEST_COUNT.labels(endpoint=endpoint, status=status).inc()
    REQUEST_LATENCY.labels(endpoint=endpoint).observe(time.time() - start)

if __name__ == '__main__':
    start_http_server(8000)
    while True:
        handle_request('/api/users')
        time.sleep(1)

start_http_server(8000) starts a background HTTP server on port 8000 that serves the metrics at /metrics. When Prometheus scrapes that URL, it sees every counter and histogram declared in the code.

Node.js (prom-client)

const express = require('express');
const client = require('prom-client');

const register = new client.Registry();
client.collectDefaultMetrics({ register });

const httpRequestDuration = new client.Histogram({
  name: 'http_request_duration_seconds',
  help: 'HTTP request duration',
  labelNames: ['method', 'route', 'status'],
  buckets: [0.1, 0.3, 0.5, 1, 2, 5],
});
register.registerMetric(httpRequestDuration);

const app = express();
app.use((req, res, next) => {
  const end = httpRequestDuration.startTimer();
  res.on('finish', () => {
    end({ method: req.method, route: req.route?.path || req.path, status: res.statusCode });
  });
  next();
});

app.get('/metrics', async (req, res) => {
  res.set('Content-Type', register.contentType);
  res.end(await register.metrics());
});

app.listen(3000);

collectDefaultMetrics automatically adds basic Node.js runtime metrics such as event loop lag, heap size and garbage collection duration. This is a baseline every service should have.

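Writing a Custom Exporter

When there is no ready-made exporter for the system you need to monitor, you have to write a custom one. The approach is to write an HTTP server that implements a /metrics endpoint and returns text in the Prometheus format; client libraries such as prometheus_client take care of the HTTP server and the formatting for you.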
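Without a client library, the HTTP side is still small. The sketch below uses only Python's http.server; `collect()` and the `demo_exporter_info` metric are placeholders you would replace with real collection logic, and the default port 9200 simply mirrors the prometheus_client example that follows. The Content-Type is the one used for version 0.0.4 of the text exposition format.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import urlsplit


def collect() -> str:
    # Placeholder: a real exporter would query its source system here, on every scrape
    return (
        "# HELP demo_exporter_info Static info metric for this sketch.\n"
        "# TYPE demo_exporter_info gauge\n"
        'demo_exporter_info{version="0.1"} 1\n'
    )


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Serve only /metrics (ignoring any query string); everything else is 404
        if urlsplit(self.path).path != "/metrics":
            self.send_error(404)
            return
        body = collect().encode("utf-8")
        self.send_response(200)
        # Content type for the Prometheus text exposition format, version 0.0.4
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # silence the per-request access log


def make_server(port: int = 9200, host: str = "") -> ThreadingHTTPServer:
    """Create (but do not start) the exporter's HTTP server."""
    return ThreadingHTTPServer((host, port), MetricsHandler)
```

Call `make_server(9200).serve_forever()` to run it. Collecting inside do_GET means every scrape reads fresh data, at the cost of making scrape latency depend on how fast the source answers.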

Python Custom Exporter

from prometheus_client import start_http_server, Gauge
import subprocess
import time

queue_length = Gauge('app_queue_length', 'Current queue length', ['queue_name'])

def collect():
    for queue in ['high', 'normal', 'low']:
        try:
            # LLEN returns the number of items in the Redis list jobs:<queue>
            result = subprocess.run(
                ['redis-cli', 'LLEN', f'jobs:{queue}'],
                capture_output=True, text=True, timeout=5, check=True
            )
            queue_length.labels(queue_name=queue).set(int(result.stdout.strip() or 0))
        except (OSError, subprocess.SubprocessError, ValueError):
            # redis-cli missing, timed out, failed, or printed non-numeric output:
            # keep the last value instead of crashing the exporter loop
            continue

if __name__ == '__main__':
    start_http_server(9200)
    while True:
        collect()
        time.sleep(15)

This pattern works whenever you need metrics from a command-line tool or an internal API, such as a job queue, a license server or a legacy system.

Pull vs Push

By default Prometheus uses a pull model: it scrapes exporters at regular intervals. Some workloads, such as a batch job that runs once and exits, do not live long enough to be scraped. For those, use the Pushgateway instead: the job pushes its metrics to the Pushgateway, and Prometheus scrapes them from the Pushgateway.
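A batch job can push with a plain HTTP request. This Python sketch assumes a Pushgateway at a URL like http://pushgateway:9091 (9091 is its default port) and uses the grouping-key path /metrics/job/<job>[/instance/<instance>] from the Pushgateway's HTTP API; the helper functions and the metric names batch_last_duration_seconds and batch_last_success_timestamp_seconds are hypothetical.

```python
import time
import urllib.parse
import urllib.request


def build_push_request(gateway: str, job: str, body: str, instance: str = "") -> urllib.request.Request:
    """Build a PUT to the Pushgateway grouping key /metrics/job/<job>[/instance/<instance>]."""
    path = "/metrics/job/" + urllib.parse.quote(job, safe="")
    if instance:
        path += "/instance/" + urllib.parse.quote(instance, safe="")
    if not body.endswith("\n"):
        body += "\n"  # the text format must end with a newline
    return urllib.request.Request(
        gateway.rstrip("/") + path,
        data=body.encode("utf-8"),
        method="PUT",  # PUT replaces all metrics previously pushed under this grouping key
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )


def push_success(gateway: str, job: str, duration_s: float) -> None:
    """Push the duration and completion time of a successful run (hypothetical metric names)."""
    body = (
        "# TYPE batch_last_duration_seconds gauge\n"
        f"batch_last_duration_seconds {duration_s}\n"
        "# TYPE batch_last_success_timestamp_seconds gauge\n"
        f"batch_last_success_timestamp_seconds {time.time()}\n"
    )
    with urllib.request.urlopen(build_push_request(gateway, job, body), timeout=10) as resp:
        resp.read()
```

A job would time its work and call `push_success("http://pushgateway:9091", "nightly_backup", elapsed)` only when it finishes without error, so an old batch_last_success_timestamp_seconds is what signals a failed or missing run. Sending POST instead of PUT would replace only the metrics with the same names under that grouping key.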

Metric Types in Exporters

Counter: only increases, e.g. requests_total, errors_total. It resets to 0 when the service restarts, and functions such as rate() account for these resets.
Gauge: can go up and down, e.g. memory_usage_bytes, queue_length.
Histogram: a distribution of observations counted into buckets, e.g. request latency.
Summary: similar to a histogram, but quantiles are computed inside the application, so they cannot be changed at query time or meaningfully aggregated across instances.

Rule of thumb: use a histogram almost every time you measure a duration or a size, because you choose the quantile in PromQL at query time, e.g. histogram_quantile(0.95, ...), and can sum buckets across instances first. Use a summary only when you really need one.
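To see why buckets make query-time quantiles possible, here is a simplified Python version of the linear interpolation that histogram_quantile() performs over cumulative _bucket counts. It is a sketch for intuition, assuming positive bucket bounds and ignoring edge cases the real PromQL function handles; the function names are hypothetical.

```python
import math
from typing import List, Tuple


def cumulative_buckets(observations: List[float], bounds: List[float]) -> List[Tuple[float, int]]:
    """Return (le, cumulative count) pairs like the _bucket series, ending with +Inf."""
    les = sorted(bounds) + [math.inf]
    return [(le, sum(1 for o in observations if o <= le)) for le in les]


def estimate_quantile(q: float, buckets: List[Tuple[float, int]]) -> float:
    """Estimate quantile q (0 < q <= 1) by interpolating linearly inside the matching bucket."""
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower, count_below = 0.0, 0  # assumes the first bucket starts at 0
    for le, cum in buckets:
        if cum >= rank:
            if math.isinf(le):
                # Quantile lands in the +Inf bucket: return the highest finite bound
                return lower
            in_bucket = cum - count_below
            return lower + (le - lower) * (rank - count_below) / in_bucket
        lower, count_below = le, cum
    return lower


buckets = cumulative_buckets([0.05, 0.2, 0.2, 0.4, 0.8, 2.0], [0.1, 0.3, 0.5, 1.0])
print(buckets)
print(estimate_quantile(0.5, buckets), estimate_quantile(0.95, buckets))
```

The estimate is only as precise as the bucket boundaries around the requested quantile, which is why bucket choice matters.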

Best Practices

Follow the naming convention <subsystem>_<measurement>_<unit>, e.g. http_request_duration_seconds.
Avoid high-cardinality labels such as user_id or session_id: every unique combination of label values is a separate time series, and too many of them drive Prometheus memory usage sharply up.
Counters should always carry the _total suffix, per Prometheus best practice.
Choose histogram buckets that fit the workload; the default buckets may not match real usage.
Alert on the up metric that Prometheus records automatically for every target: up == 0 means the exporter could not be scraped and should be treated as a critical alert.
Never expose secrets such as tokens or credentials in metrics; expose operational data only.
Put authentication on the /metrics endpoint if it is reachable over a public network.
Monitor cardinality: prometheus_tsdb_head_series shows how many active series Prometheus holds, and topk(10, count by (__name__)({__name__=~".+"})) shows which metrics contribute the most. If the count reaches the millions, review your label usage.
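The effect of a high-cardinality label is multiplicative. This Python sketch computes the worst-case number of series for one metric as the product of distinct values per label; the label names and counts are illustrative assumptions, and the real number is the count of label combinations actually observed.

```python
def worst_case_series(label_values: dict) -> int:
    """Upper bound on series for one metric: the product of distinct values per label."""
    total = 1
    for values in label_values.values():
        total *= len(set(values))
    return total


labels = {
    "method": ["GET", "POST", "PUT", "DELETE"],
    "status": ["200", "201", "400", "404", "500"],
    "endpoint": [f"/api/v1/resource{i}" for i in range(20)],
}
print(worst_case_series(labels))  # → 400 (4 * 5 * 20)

# Adding a per-user label multiplies everything by the number of users
with_user = dict(labels, user_id=[str(i) for i in range(10_000)])
print(worst_case_series(with_user))  # → 4000000
```

For a histogram, multiply again by the number of buckets (including +Inf) plus two, for the _sum and _count series of each label combination.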

Troubleshooting Exporters

Symptom	Cause	Fix
Target state DOWN	Exporter crashed or network blocked	Run `curl http://exporter:port/metrics` from inside the Prometheus container
Scrape timeout	Exporter is slow to collect data	Increase `scrape_timeout` (it must not exceed `scrape_interval`) or disable heavy collectors
Missing metrics	Collector not enabled	Check the `--collect.*` flags
Unusually high series count	High-cardinality labels	Remove unnecessary labels, or drop them with `metric_relabel_configs` before ingestion
High Prometheus RAM usage	Too many active series, or very frequent scrapes	Reduce cardinality first; lengthen `scrape_interval` to 30s-60s where finer resolution is not needed
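For the first two rows, a quick check is to fetch the endpoint yourself and time it. This Python sketch, assuming the exporter URL is reachable from where you run it, reports the HTTP status, elapsed time and number of sample lines; `check_exporter` is a hypothetical helper, and its timing only approximates what Prometheus sees.

```python
import time
import urllib.request


def check_exporter(url: str, timeout: float = 10.0) -> dict:
    """Fetch an exporter's /metrics once and report status, elapsed seconds and sample count.

    urlopen raises URLError/HTTPError (or a timeout error) if the fetch fails,
    which corresponds to the target showing DOWN.
    """
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        status = resp.status
        body = resp.read().decode("utf-8")
    elapsed = time.monotonic() - start
    # Every non-empty line that is not a # comment is one sample
    samples = sum(1 for line in body.splitlines() if line and not line.startswith("#"))
    return {"status": status, "seconds": elapsed, "samples": samples}
```

For example, compare `check_exporter("http://mysqld-exporter:9104/metrics")["seconds"]` with the job's scrape_timeout; a response time close to the timeout explains intermittent scrape failures, and a large sample count points at the cardinality rows.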

Summary

Exporters are the heart of the Prometheus ecosystem: they let systems that do not speak Prometheus be monitored, from MySQL and Redis to custom applications that need specialized metrics. Choosing the right exporters and instrumenting your code to expose metrics correctly are the foundation of comprehensive monitoring.

Once you know your exporters, the next steps are writing alert rules that use these metrics and building dashboards that visualize them effectively.