프로젝트: 시스템 모니터링 도구(System Monitoring Tool)
프로젝트: 시스템 모니터링 도구(System Monitoring Tool)¶
난이도: ⭐⭐⭐⭐
1. 개요¶
우리가 만들 것¶
순수 bash로 작성된 포괄적인 터미널 기반 시스템 모니터링 대시보드입니다. 이 도구는 다음에 대한 실시간 가시성을 제공합니다:
- CPU 사용량과 부하 평균
- 메모리와 스왑 활용도
- 파일 시스템 전체의 디스크 공간
- 네트워크 I/O 통계
- 활성 프로세스와 리소스 소비
- 에러 탐지를 통한 시스템 로그
- 임계값 초과 시 경고 알림
htop, glances, Datadog 에이전트와 같은 도구의 경량 커스터마이징 가능한 대안으로 생각하되, bash로 완전히 자체 포함됩니다.
주요 기능¶
- 실시간 대시보드: 컬러와 레이아웃이 있는 실시간 업데이트 터미널 UI
- 설정 가능한 경고: Slack/Discord 웹훅으로 전송되는 임계값 기반 경고
- 로그 집계: 에러/경고에 대한 시스템 로그 파싱 및 요약
- Cron 통합: 주기적 확인 실행 및 보고서 생성
- 크로스 플랫폼: Linux와 macOS에서 작동 (OS 특화 적응 포함)
- 제로 의존성: 표준 유닉스 도구를 사용하는 순수 bash
왜 순수 Bash인가?¶
- 보편성: bash가 설치된 모든 서버에서 실행
- 투명성: 블랙박스 모니터링 에이전트 없음
- 커스터마이징: 메트릭, 임계값, 출력 형식을 쉽게 수정
- 학습: 고급 bash 기술 시연 (tput, /proc 파싱, 시그널 처리)
2. 설계¶
아키텍처¶
Monitor System
├── Data Collection
│ ├── CPU metrics (/proc/stat, top)
│ ├── Memory metrics (/proc/meminfo, free)
│ ├── Disk metrics (df)
│ ├── Network metrics (/proc/net/dev)
│ └── Process metrics (ps, top)
│
├── Display Engine
│ ├── Terminal UI (tput)
│ ├── Color formatting
│ ├── Layout management
│ └── Responsive design (SIGWINCH)
│
├── Alerting System
│ ├── Threshold configuration
│ ├── Alert cooldown (prevent spam)
│ ├── Webhook notifications (Slack, Discord)
│ └── Email alerts (mail command)
│
└── Reporting
├── Log parsing
├── Historical data collection
├── HTML report generation
└── CSV export
데이터 흐름¶
1. Collect metrics from system APIs (/proc, df, ps)
2. Parse and normalize data
3. Compare against thresholds
4. If threshold exceeded:
a. Check cooldown period
b. Send alert via webhook/email
c. Record alert event
5. Format data for terminal display
6. Render to screen using tput
7. Wait refresh interval
8. Repeat
설정¶
임계값과 설정은 설정 파일에 저장:
# monitor.conf
REFRESH_INTERVAL=2 # Seconds between updates
CPU_ALERT_THRESHOLD=80 # Percent
MEM_ALERT_THRESHOLD=85 # Percent
DISK_ALERT_THRESHOLD=90 # Percent
ALERT_COOLDOWN=300 # Seconds (5 minutes)
SLACK_WEBHOOK_URL="https://hooks.slack.com/services/..."
3. tput을 사용한 터미널 UI¶
tput이란?¶
tput은 terminfo 데이터베이스를 사용하여 커서 위치, 색상, 화면 지우기를 조작하는 터미널 제어 유틸리티입니다.
필수 tput 명령어¶
# Cursor movement
tput cup Y X # Move cursor to row Y, column X (0-indexed)
tput cuu N # Move cursor up N lines
tput cud N # Move cursor down N lines
# Screen control
tput clear # Clear entire screen
tput el # Clear to end of line
tput ed # Clear to end of screen
# Colors
tput setaf N # Set foreground color (0-7)
tput setab N # Set background color (0-7)
tput bold # Enable bold
tput sgr0 # Reset all attributes
# Terminal info
tput cols # Get number of columns
tput lines # Get number of rows
색상 참조¶
# ANSI color codes (tput setaf)
0 = Black
1 = Red
2 = Green
3 = Yellow
4 = Blue
5 = Magenta
6 = Cyan
7 = White
대시보드 레이아웃 구축¶
#!/usr/bin/env bash
# Clear screen and hide cursor
tput clear
tput civis # Hide cursor
# Draw header
tput cup 0 0
tput bold; tput setaf 6 # Bold cyan
echo "╔════════════════════════════════════════════════════════════════╗"
tput cup 1 0
echo "║ SYSTEM MONITOR - $(hostname) ║"
tput cup 2 0
echo "╚════════════════════════════════════════════════════════════════╝"
tput sgr0
# Draw CPU section
tput cup 4 0
tput setaf 3; echo "CPU Usage:"
tput sgr0
tput cup 5 2
echo "Load Average: 1.23, 0.98, 0.76"
tput cup 6 2
echo "CPU: [████████░░] 78%"
# Draw Memory section
tput cup 8 0
tput setaf 3; echo "Memory:"
tput sgr0
tput cup 9 2
echo "Used: 12.4 GB / 16.0 GB (77%)"
# Show cursor again before exiting
tput cnorm
진행 막대¶
백분율의 시각적 표현:
draw_bar() {
local percent="$1"
local width="${2:-50}"
local filled=$(( percent * width / 100 ))
local empty=$(( width - filled ))
# Color based on threshold
if [ "$percent" -ge 90 ]; then
tput setaf 1 # Red
elif [ "$percent" -ge 70 ]; then
tput setaf 3 # Yellow
else
tput setaf 2 # Green
fi
# Draw bar
printf "["
printf "%${filled}s" | tr ' ' '█'
printf "%${empty}s" | tr ' ' '░'
printf "] %3d%%\n" "$percent"
tput sgr0
}
# Usage
draw_bar 78 40
터미널 크기 조정 처리 (SIGWINCH)¶
터미널 크기 변경 시 다시 그리기:
resize_handler() {
TERM_COLS=$(tput cols)
TERM_LINES=$(tput lines)
tput clear
# Redraw entire dashboard
}
trap resize_handler SIGWINCH
# Get initial terminal size
TERM_COLS=$(tput cols)
TERM_LINES=$(tput lines)
4. 시스템 메트릭 수집¶
CPU 사용량¶
/proc/stat에서 CPU 사용량 계산:
get_cpu_usage() {
# Read /proc/stat twice with a delay to calculate usage
local prev_idle prev_total idle total
# First reading
read -r cpu prev_idle prev_total <<< $(awk '/^cpu / {
idle = $5
total = 0
for (i=2; i<=NF; i++) total += $i
print idle, total
}' /proc/stat)
sleep 0.5
# Second reading
read -r cpu idle total <<< $(awk '/^cpu / {
idle = $5
total = 0
for (i=2; i<=NF; i++) total += $i
print idle, total
}' /proc/stat)
# Calculate percentage
local idle_delta=$((idle - prev_idle))
local total_delta=$((total - prev_total))
local usage=$((100 * (total_delta - idle_delta) / total_delta))
echo "$usage"
}
macOS 대안 (top 사용):
get_cpu_usage_macos() {
top -l 2 -n 0 -F -s 1 | \
awk '/^CPU usage:/ {print 100 - $7}' | \
tail -1 | \
cut -d. -f1
}
부하 평균¶
get_load_average() {
awk '{print $1, $2, $3}' /proc/loadavg
}
# macOS compatible version
get_load_average_cross() {
uptime | awk -F'load averages?: ' '{print $2}'
}
메모리 사용량¶
get_memory_usage() {
# Linux: /proc/meminfo
if [ -f /proc/meminfo ]; then
awk '/^MemTotal:/ {total=$2}
/^MemAvailable:/ {avail=$2}
END {
used = total - avail
percent = int(100 * used / total)
printf "%.1f,%.1f,%d\n", used/1024/1024, total/1024/1024, percent
}' /proc/meminfo
else
# macOS: use vm_stat
vm_stat | awk '
/Pages free/ {free=$3}
/Pages active/ {active=$3}
/Pages inactive/ {inactive=$3}
/Pages speculative/ {spec=$3}
/Pages wired/ {wired=$3}
END {
# Page size is typically 4096 bytes
total_gb = (free + active + inactive + spec + wired) * 4096 / 1024 / 1024 / 1024
used_gb = (active + wired) * 4096 / 1024 / 1024 / 1024
percent = int(100 * used_gb / total_gb)
printf "%.1f,%.1f,%d\n", used_gb, total_gb, percent
}'
fi
}
디스크 사용량¶
get_disk_usage() {
df -h / | awk 'NR==2 {
gsub(/%/, "", $5)
print $3, $2, $5
}'
}
# All mounted filesystems
get_all_disks() {
df -h | awk 'NR>1 && $1 ~ /^\// {
gsub(/%/, "", $5)
printf "%-20s %8s / %8s (%3d%%)\n", $6, $3, $2, $5
}'
}
네트워크 I/O¶
get_network_io() {
local interface="${1:-eth0}"
if [ -f /proc/net/dev ]; then
# Linux
awk -v iface="$interface" '
$1 == iface":" {
printf "RX: %d bytes, TX: %d bytes\n", $2, $10
}' /proc/net/dev
else
# macOS: use netstat
netstat -ib | awk -v iface="$interface" '
$1 == iface && $3 ~ /Link/ {
printf "RX: %d bytes, TX: %d bytes\n", $7, $10
}'
fi
}
프로세스 메트릭¶
get_top_processes() {
local count="${1:-5}"
# Linux
ps aux --sort=-%mem | head -n $((count + 1)) | awk 'NR>1 {
printf "%-10s %5.1f%% %5.1f%% %s\n", $1, $3, $4, $11
}'
}
# Top CPU consumers
get_top_cpu_processes() {
ps aux --sort=-%cpu | head -6 | awk 'NR>1 {
printf "%-15s %6.1f%% %s\n", $1, $3, $11
}'
}
5. 경고 시스템¶
임계값 설정¶
# Default thresholds
CPU_ALERT_THRESHOLD="${CPU_ALERT_THRESHOLD:-80}"
MEM_ALERT_THRESHOLD="${MEM_ALERT_THRESHOLD:-85}"
DISK_ALERT_THRESHOLD="${DISK_ALERT_THRESHOLD:-90}"
LOAD_ALERT_THRESHOLD="${LOAD_ALERT_THRESHOLD:-4.0}"
# Alert cooldown (don't spam)
ALERT_COOLDOWN="${ALERT_COOLDOWN:-300}" # 5 minutes
ALERT_HISTORY="/tmp/monitor_alerts.log"
경고 확인 함수¶
check_cpu_threshold() {
local cpu_usage="$1"
if [ "$cpu_usage" -ge "$CPU_ALERT_THRESHOLD" ]; then
if should_send_alert "cpu"; then
send_alert "CPU" "CPU usage is ${cpu_usage}% (threshold: ${CPU_ALERT_THRESHOLD}%)"
record_alert "cpu"
fi
fi
}
check_memory_threshold() {
local mem_percent="$1"
if [ "$mem_percent" -ge "$MEM_ALERT_THRESHOLD" ]; then
if should_send_alert "memory"; then
send_alert "MEMORY" "Memory usage is ${mem_percent}% (threshold: ${MEM_ALERT_THRESHOLD}%)"
record_alert "memory"
fi
fi
}
check_disk_threshold() {
local disk_percent="$1"
if [ "$disk_percent" -ge "$DISK_ALERT_THRESHOLD" ]; then
if should_send_alert "disk"; then
send_alert "DISK" "Disk usage is ${disk_percent}% (threshold: ${DISK_ALERT_THRESHOLD}%)"
record_alert "disk"
fi
fi
}
경고 쿨다운¶
쿨다운 기간을 사용하여 경고 스팸 방지:
should_send_alert() {
local alert_type="$1"
local now=$(date +%s)
local last_alert_time=0
# Check last alert time
if [ -f "$ALERT_HISTORY" ]; then
last_alert_time=$(awk -v type="$alert_type" \
'$1 == type {print $2}' "$ALERT_HISTORY" 2>/dev/null || echo 0)
fi
local time_since_alert=$((now - last_alert_time))
if [ "$time_since_alert" -ge "$ALERT_COOLDOWN" ]; then
return 0 # OK to send
else
return 1 # Too soon
fi
}
record_alert() {
local alert_type="$1"
local now=$(date +%s)
# Update or append alert record
if [ -f "$ALERT_HISTORY" ]; then
# Update existing entry or add new one
grep -v "^${alert_type} " "$ALERT_HISTORY" > "$ALERT_HISTORY.tmp" 2>/dev/null || true
echo "${alert_type} ${now}" >> "$ALERT_HISTORY.tmp"
mv "$ALERT_HISTORY.tmp" "$ALERT_HISTORY"
else
echo "${alert_type} ${now}" > "$ALERT_HISTORY"
fi
}
웹훅 알림 (Slack/Discord)¶
send_slack_alert() {
local title="$1"
local message="$2"
local webhook_url="${SLACK_WEBHOOK_URL}"
if [ -z "$webhook_url" ]; then
return 0
fi
local payload=$(cat <<EOF
{
"username": "System Monitor",
"icon_emoji": ":warning:",
"attachments": [{
"color": "danger",
"title": "${title}",
"text": "${message}",
"footer": "$(hostname)",
"ts": $(date +%s)
}]
}
EOF
)
curl -X POST -H 'Content-type: application/json' \
--data "$payload" \
"$webhook_url" &>/dev/null
}
send_discord_alert() {
local title="$1"
local message="$2"
local webhook_url="${DISCORD_WEBHOOK_URL}"
if [ -z "$webhook_url" ]; then
return 0
fi
local payload=$(cat <<EOF
{
"username": "System Monitor",
"embeds": [{
"title": "${title}",
"description": "${message}",
"color": 15158332,
"footer": {
"text": "$(hostname)"
},
"timestamp": "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
}]
}
EOF
)
curl -X POST -H 'Content-type: application/json' \
--data "$payload" \
"$webhook_url" &>/dev/null
}
send_alert() {
local title="$1"
local message="$2"
# Try Slack
if [ -n "${SLACK_WEBHOOK_URL:-}" ]; then
send_slack_alert "$title" "$message"
fi
# Try Discord
if [ -n "${DISCORD_WEBHOOK_URL:-}" ]; then
send_discord_alert "$title" "$message"
fi
# Fallback to email
if command -v mail &>/dev/null && [ -n "${ALERT_EMAIL:-}" ]; then
echo "$message" | mail -s "[ALERT] $title" "$ALERT_EMAIL"
fi
# Log locally
echo "[$(date)] $title: $message" >> /var/log/monitor_alerts.log
}
6. 로그 집계¶
시스템 로그 파싱¶
parse_syslog_errors() {
local log_file="/var/log/syslog"
local time_range="${1:-1h}" # Last 1 hour
# Convert time range to minutes
local minutes=60
case "$time_range" in
*h) minutes=$((${time_range%h} * 60)) ;;
*m) minutes=${time_range%m} ;;
esac
# Get timestamp for filtering
local since=$(date -d "$minutes minutes ago" '+%b %d %H:%M' 2>/dev/null || \
date -v-${minutes}M '+%b %d %H:%M')
if [ -f "$log_file" ]; then
awk -v since="$since" '
$0 ~ since {found=1}
found && /error|fail|critical/i {print}
' "$log_file"
fi
}
# Count error types
count_log_errors() {
local log_file="${1:-/var/log/syslog}"
if [ ! -f "$log_file" ]; then
echo "Log file not found: $log_file"
return 1
fi
awk '
/ERROR/ {error++}
/WARNING/ {warning++}
/CRITICAL/ {critical++}
/FATAL/ {fatal++}
END {
printf "CRITICAL: %d, FATAL: %d, ERROR: %d, WARNING: %d\n",
critical, fatal, error, warning
}
' "$log_file"
}
애플리케이션 로그 파싱¶
parse_app_logs() {
local app_log="/var/log/myapp/app.log"
local lines="${1:-100}"
if [ ! -f "$app_log" ]; then
return 0
fi
# Get recent errors
tail -n "$lines" "$app_log" | \
grep -E "ERROR|Exception|Traceback" | \
tail -10
}
# Summarize log patterns
summarize_logs() {
local log_file="$1"
echo "=== Log Summary ==="
echo ""
# Error counts by hour
echo "Errors per hour (last 24h):"
awk '/ERROR/ {
match($0, /([0-9]{2}):/, hour)
if (hour[1]) hours[hour[1]]++
}
END {
for (h in hours) printf " %s:00 - %d errors\n", h, hours[h]
}' "$log_file" | sort
echo ""
# Top error messages
echo "Top 5 error patterns:"
grep "ERROR" "$log_file" | \
awk -F'ERROR' '{print $2}' | \
sort | uniq -c | sort -rn | head -5
}
7. 완전한 모니터 스크립트¶
완전한 모니터링 대시보드 구현입니다:
#!/usr/bin/env bash
set -euo pipefail
# ============================================================================
# System Monitor - Real-time Dashboard
# ============================================================================
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
CONFIG_FILE="${SCRIPT_DIR}/monitor.conf"
# Defaults
REFRESH_INTERVAL="${REFRESH_INTERVAL:-2}"
CPU_ALERT_THRESHOLD="${CPU_ALERT_THRESHOLD:-80}"
MEM_ALERT_THRESHOLD="${MEM_ALERT_THRESHOLD:-85}"
DISK_ALERT_THRESHOLD="${DISK_ALERT_THRESHOLD:-90}"
ALERT_COOLDOWN="${ALERT_COOLDOWN:-300}"
# Load config if exists
if [ -f "$CONFIG_FILE" ]; then
source "$CONFIG_FILE"
fi
# ============================================================================
# Terminal Setup
# ============================================================================
setup_terminal() {
tput clear
tput civis # Hide cursor
trap cleanup EXIT
trap resize_handler SIGWINCH
}
cleanup() {
tput clear
tput cnorm # Show cursor
tput sgr0 # Reset colors
}
resize_handler() {
TERM_COLS=$(tput cols)
TERM_LINES=$(tput lines)
tput clear
}
# ============================================================================
# Metrics Collection
# ============================================================================
get_cpu_usage() {
if [ -f /proc/stat ]; then
local prev_idle prev_total idle total
read -r cpu prev_idle prev_total <<< $(awk '/^cpu / {
idle = $5; total = 0
for (i=2; i<=NF; i++) total += $i
print idle, total
}' /proc/stat)
sleep 0.2
read -r cpu idle total <<< $(awk '/^cpu / {
idle = $5; total = 0
for (i=2; i<=NF; i++) total += $i
print idle, total
}' /proc/stat)
local idle_delta=$((idle - prev_idle))
local total_delta=$((total - prev_total))
echo $((100 * (total_delta - idle_delta) / total_delta))
else
# macOS fallback
echo 50
fi
}
get_memory_info() {
if [ -f /proc/meminfo ]; then
awk '/^MemTotal:/ {total=$2}
/^MemAvailable:/ {avail=$2}
END {
used = total - avail
percent = int(100 * used / total)
printf "%.1f %.1f %d\n", used/1024/1024, total/1024/1024, percent
}' /proc/meminfo
else
echo "8.0 16.0 50"
fi
}
get_disk_info() {
df -h / | awk 'NR==2 {
gsub(/%/, "", $5)
print $3, $2, $5
}'
}
get_load_avg() {
if [ -f /proc/loadavg ]; then
awk '{print $1, $2, $3}' /proc/loadavg
else
uptime | awk -F'load averages?: ' '{print $2}' | awk '{print $1, $2, $3}' | tr -d ','
fi
}
get_network_stats() {
local interface="${1:-eth0}"
if [ -f /proc/net/dev ]; then
awk -v iface="$interface" '
$1 == iface":" {
rx = $2 / 1024 / 1024
tx = $10 / 1024 / 1024
printf "%.2f %.2f\n", rx, tx
}' /proc/net/dev
else
echo "0.00 0.00"
fi
}
# ============================================================================
# Display Functions
# ============================================================================
draw_header() {
tput cup 0 0
tput bold; tput setaf 6
local width=$((TERM_COLS - 1))
printf "╔%${width}s╗\n" | tr ' ' '═'
printf "║ %-$((width - 2))s ║\n" "SYSTEM MONITOR - $(hostname) - $(date +'%Y-%m-%d %H:%M:%S')"
printf "╚%${width}s╝\n" | tr ' ' '═'
tput sgr0
}
draw_bar() {
local percent="$1"
local width="${2:-40}"
local filled=$(( percent * width / 100 ))
local empty=$(( width - filled ))
if [ "$percent" -ge 90 ]; then
tput setaf 1
elif [ "$percent" -ge 70 ]; then
tput setaf 3
else
tput setaf 2
fi
printf "["
printf "%${filled}s" | tr ' ' '█'
printf "%${empty}s" | tr ' ' '░'
printf "] %3d%%\n" "$percent"
tput sgr0
}
draw_metrics() {
local row=4
# CPU
tput cup $row 2
tput bold; tput setaf 3
echo "CPU Usage:"
tput sgr0
row=$((row + 1))
local cpu_usage=$(get_cpu_usage)
tput cup $row 4
draw_bar "$cpu_usage" 50
row=$((row + 1))
read -r load1 load5 load15 <<< $(get_load_avg)
tput cup $row 4
printf "Load Average: %.2f, %.2f, %.2f\n" "$load1" "$load5" "$load15"
# Memory
row=$((row + 2))
tput cup $row 2
tput bold; tput setaf 3
echo "Memory:"
tput sgr0
row=$((row + 1))
read -r mem_used mem_total mem_percent <<< $(get_memory_info)
tput cup $row 4
printf "Used: %.1f GB / %.1f GB\n" "$mem_used" "$mem_total"
row=$((row + 1))
tput cup $row 4
draw_bar "$mem_percent" 50
# Disk
row=$((row + 2))
tput cup $row 2
tput bold; tput setaf 3
echo "Disk (/):"
tput sgr0
row=$((row + 1))
read -r disk_used disk_total disk_percent <<< $(get_disk_info)
tput cup $row 4
printf "Used: %s / %s\n" "$disk_used" "$disk_total"
row=$((row + 1))
tput cup $row 4
draw_bar "$disk_percent" 50
# Network
row=$((row + 2))
tput cup $row 2
tput bold; tput setaf 3
echo "Network (eth0):"
tput sgr0
row=$((row + 1))
read -r rx_mb tx_mb <<< $(get_network_stats "eth0")
tput cup $row 4
printf "RX: %.2f MB | TX: %.2f MB\n" "$rx_mb" "$tx_mb"
# Top Processes
row=$((row + 2))
tput cup $row 2
tput bold; tput setaf 3
echo "Top CPU Processes:"
tput sgr0
row=$((row + 1))
tput cup $row 4
ps aux --sort=-%cpu 2>/dev/null | head -6 | tail -5 | \
awk '{printf "%-12s %5.1f%% %s\n", $1, $3, $11}' | \
while IFS= read -r line; do
tput cup $row 4
echo "$line"
row=$((row + 1))
done
# Footer
tput cup $((TERM_LINES - 2)) 2
tput setaf 8
echo "Press Ctrl+C to exit | Refresh: ${REFRESH_INTERVAL}s"
tput sgr0
}
# ============================================================================
# Alert Functions
# ============================================================================
check_thresholds() {
local cpu_usage="$1"
local mem_percent="$2"
local disk_percent="$3"
[ "$cpu_usage" -ge "$CPU_ALERT_THRESHOLD" ] && \
send_alert "High CPU Usage" "CPU at ${cpu_usage}% (threshold: ${CPU_ALERT_THRESHOLD}%)"
[ "$mem_percent" -ge "$MEM_ALERT_THRESHOLD" ] && \
send_alert "High Memory Usage" "Memory at ${mem_percent}% (threshold: ${MEM_ALERT_THRESHOLD}%)"
[ "$disk_percent" -ge "$DISK_ALERT_THRESHOLD" ] && \
send_alert "Low Disk Space" "Disk at ${disk_percent}% (threshold: ${DISK_ALERT_THRESHOLD}%)"
}
send_alert() {
local title="$1"
local message="$2"
if [ -n "${SLACK_WEBHOOK_URL:-}" ]; then
curl -X POST -H 'Content-type: application/json' \
--data "{\"text\":\"[$(hostname)] ${title}: ${message}\"}" \
"$SLACK_WEBHOOK_URL" &>/dev/null &
fi
echo "[$(date)] ALERT: $title - $message" >> /var/log/monitor.log
}
# ============================================================================
# Main Loop
# ============================================================================
main() {
setup_terminal
TERM_COLS=$(tput cols)
TERM_LINES=$(tput lines)
while true; do
draw_header
draw_metrics
# Collect current values for alerting
cpu_usage=$(get_cpu_usage)
read -r mem_used mem_total mem_percent <<< $(get_memory_info)
read -r disk_used disk_total disk_percent <<< $(get_disk_info)
# Check thresholds (in background to avoid blocking)
check_thresholds "$cpu_usage" "$mem_percent" "$disk_percent" &
sleep "$REFRESH_INTERVAL"
done
}
main "$@"
8. Cron 통합¶
주기적 모니터링¶
5분마다 헬스 체크 실행:
# Crontab entry
*/5 * * * * /opt/monitor/monitor.sh --check-only >> /var/log/monitor_cron.log 2>&1
--check-only 모드가 있는 모니터 스크립트:
if [ "${1:-}" = "--check-only" ]; then
# Collect metrics
cpu=$(get_cpu_usage)
read mem_used mem_total mem_percent <<< $(get_memory_info)
read disk_used disk_total disk_percent <<< $(get_disk_info)
# Check thresholds
check_thresholds "$cpu" "$mem_percent" "$disk_percent"
# Log
echo "[$(date)] CPU: ${cpu}% | MEM: ${mem_percent}% | DISK: ${disk_percent}%"
exit 0
fi
보고서 생성¶
일일 HTML 보고서:
generate_html_report() {
local output="/var/www/html/monitor_report.html"
cat > "$output" <<EOF
<!DOCTYPE html>
<html>
<head>
<title>System Monitor Report</title>
<style>
body { font-family: monospace; margin: 20px; }
.metric { margin: 10px 0; }
.bar { background: #e0e0e0; height: 20px; width: 400px; position: relative; }
.bar-fill { background: #4CAF50; height: 100%; }
.bar-fill.warning { background: #FFC107; }
.bar-fill.danger { background: #F44336; }
</style>
</head>
<body>
<h1>System Monitor Report - $(hostname)</h1>
<p>Generated: $(date)</p>
<div class="metric">
<strong>CPU Usage:</strong> ${cpu_usage}%
<div class="bar">
<div class="bar-fill" style="width: ${cpu_usage}%"></div>
</div>
</div>
<div class="metric">
<strong>Memory:</strong> ${mem_used} GB / ${mem_total} GB (${mem_percent}%)
<div class="bar">
<div class="bar-fill" style="width: ${mem_percent}%"></div>
</div>
</div>
<div class="metric">
<strong>Disk:</strong> ${disk_used} / ${disk_total} (${disk_percent}%)
<div class="bar">
<div class="bar-fill" style="width: ${disk_percent}%"></div>
</div>
</div>
</body>
</html>
EOF
echo "Report generated: $output"
}
기록 데이터를 위한 CSV 내보내기¶
log_metrics_csv() {
local csv_file="/var/log/monitor_metrics.csv"
# Create header if file doesn't exist
if [ ! -f "$csv_file" ]; then
echo "timestamp,cpu_percent,mem_percent,disk_percent,load_1min" > "$csv_file"
fi
# Collect metrics
local cpu=$(get_cpu_usage)
read -r mem_used mem_total mem_percent <<< $(get_memory_info)
read -r disk_used disk_total disk_percent <<< $(get_disk_info)
read -r load1 load5 load15 <<< $(get_load_avg)
# Append to CSV
echo "$(date +%s),$cpu,$mem_percent,$disk_percent,$load1" >> "$csv_file"
}
9. 사용 예시¶
대시보드 실행¶
chmod +x monitor.sh
./monitor.sh
경고 설정¶
monitor.conf 편집:
# monitor.conf
REFRESH_INTERVAL=2
CPU_ALERT_THRESHOLD=80
MEM_ALERT_THRESHOLD=85
DISK_ALERT_THRESHOLD=90
ALERT_COOLDOWN=300
SLACK_WEBHOOK_URL="https://hooks.slack.com/services/YOUR/WEBHOOK/URL"
ALERT_EMAIL="ops@example.com"
주기적 확인을 위한 Cron 설정¶
# Install cron job
(crontab -l 2>/dev/null; echo "*/5 * * * * /opt/monitor/monitor.sh --check-only") | crontab -
# Daily HTML report at 8 AM
(crontab -l 2>/dev/null; echo "0 8 * * * /opt/monitor/generate_report.sh") | crontab -
10. 확장¶
1. 원격 모니터링¶
SSH를 통해 여러 서버 모니터링:
monitor_remote() {
local hosts=("web01" "web02" "db01")
for host in "${hosts[@]}"; do
echo "=== ${host} ==="
ssh "$host" 'bash -s' < monitor.sh --check-only
echo ""
done
}
2. 웹 대시보드¶
HTTP를 통해 실시간 메트릭 제공:
# Simple HTTP server that serves JSON metrics
while true; do
echo -e "HTTP/1.1 200 OK\r\nContent-Type: application/json\r\n\r\n{
\"cpu\": $(get_cpu_usage),
\"memory\": $(get_memory_info | awk '{print $3}'),
\"disk\": $(get_disk_info | awk '{print $3}')
}" | nc -l -p 8080 -q 1
done
3. 데이터베이스 저장¶
기록 분석을 위해 SQLite에 메트릭 저장:
store_metrics() {
local db="/var/lib/monitor/metrics.db"
sqlite3 "$db" <<EOF
CREATE TABLE IF NOT EXISTS metrics (
timestamp INTEGER,
cpu_percent INTEGER,
mem_percent INTEGER,
disk_percent INTEGER
);
INSERT INTO metrics VALUES (
$(date +%s),
$(get_cpu_usage),
$(get_memory_info | awk '{print $3}'),
$(get_disk_info | awk '{print $3}')
);
EOF
}
4. Grafana 호환 메트릭¶
Prometheus 형식으로 메트릭 내보내기:
# Expose metrics at http://localhost:9090/metrics
serve_prometheus_metrics() {
while true; do
cat <<EOF | nc -l -p 9090 -q 1
HTTP/1.1 200 OK
Content-Type: text/plain
# HELP system_cpu_usage CPU usage percentage
# TYPE system_cpu_usage gauge
system_cpu_usage $(get_cpu_usage)
# HELP system_memory_usage Memory usage percentage
# TYPE system_memory_usage gauge
system_memory_usage $(get_memory_info | awk '{print $3}')
# HELP system_disk_usage Disk usage percentage
# TYPE system_disk_usage gauge
system_disk_usage $(get_disk_info | awk '{print $3}')
EOF
done
}