14. Linux Performance Tuning

Learning Objectives

  • System performance monitoring and analysis
  • Kernel parameter optimization via sysctl
  • CPU, memory, and I/O performance tuning
  • Profiling with perf and flamegraphs

Table of Contents

  1. Performance Analysis Fundamentals
  2. CPU Tuning
  3. Memory Tuning
  4. I/O Tuning
  5. Network Tuning
  6. Profiling Tools
  7. Practice Exercises

1. Performance Analysis Fundamentals

1.1 USE Methodology

┌─────────────────────────────────────────────────────────────┐
                USE Methodology (Brendan Gregg)               
├─────────────────────────────────────────────────────────────┤
                                                             
  Check for each resource:                                   
                                                             
  U - Utilization                                            
      How much is the resource being used?                   
      Example: CPU at 80% usage                              
                                                             
  S - Saturation                                             
      Are tasks waiting?                                     
      Example: 10 processes in run queue                     
                                                             
  E - Errors                                                 
      Are errors occurring?                                  
      Example: Network packet drops                          
                                                             
  Key resources:                                             
   • CPU: mpstat, vmstat, top                                
   • Memory: free, vmstat, /proc/meminfo                     
   • Disk I/O: iostat, iotop                                 
   • Network: netstat, ss, sar                               
                                                             
└─────────────────────────────────────────────────────────────┘
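The checklist above can be turned into a first-pass script. This is a sketch, not a full USE audit; it assumes the standard procps/sysstat/iproute2 tools and guards the ones that may be missing:

```shell
#!/bin/sh
# use-check.sh - a quick first pass over the USE checklist (sketch)

echo "== CPU: utilization and saturation =="
uptime                                   # load average...
nproc                                    # ...versus the number of CPUs

echo "== Memory =="
free -h                                  # utilization
grep -E '^(MemAvailable|SwapFree)' /proc/meminfo   # remaining headroom

echo "== Disk I/O =="
command -v iostat >/dev/null && iostat -x 1 2 | tail -n 6 \
    || echo "iostat not installed (sysstat package)"

echo "== Network: errors =="
ip -s link 2>/dev/null | grep -E 'errors|dropped' \
    || echo "ip not available"
```

Each section maps to one letter of USE: uptime/nproc for CPU saturation, MemAvailable for memory utilization, iostat for disk, and the interface error counters for network errors.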

1.2 Basic Monitoring Tools

# top - real-time process monitoring
top
# Shortcuts: 1=per CPU, M=sort by memory, P=sort by CPU, k=kill

# htop - enhanced top
htop

# vmstat - virtual memory statistics
vmstat 1 5  # 1 second interval, 5 times
#  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
#  2  0      0 1234567 12345 234567    0    0     1     2  100  200  5  2 93  0  0
# r: processes waiting to run
# b: processes waiting for I/O
# si/so: swap in/out
# bi/bo: block in/out
# us/sy/id/wa: user/system/idle/wait

# mpstat - CPU statistics
mpstat -P ALL 1  # All CPUs, 1 second interval

# iostat - I/O statistics
iostat -x 1      # Extended info, 1 second interval

# sar - system activity report
sar -u 1 5       # CPU
sar -r 1 5       # Memory
sar -d 1 5       # Disk
sar -n DEV 1 5   # Network

# free - memory usage
free -h

# uptime - load average
uptime
# load average: 1.50, 1.20, 0.80  (1min, 5min, 15min)
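A load average is only meaningful relative to the number of CPUs: 1.50 saturates a single-core box but is light load on a 16-core server. A minimal comparison sketch:

```shell
# Compare the 1-minute load average against the CPU count
load=$(cut -d' ' -f1 /proc/loadavg)
cpus=$(nproc)

# awk handles the floating-point comparison; "+ 0" forces numeric context
awk -v l="$load" -v c="$cpus" 'BEGIN {
    if (l + 0 > c + 0)
        print "CPU saturated: load " l " > " c " CPUs"
    else
        print "CPU ok: load " l " <= " c " CPUs"
}'
```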

1.3 sysctl Basics

# View current settings
sysctl -a                    # All settings
sysctl vm.swappiness         # Specific setting
cat /proc/sys/vm/swappiness  # Direct read

# Temporary change
sysctl -w vm.swappiness=10
# Or
echo 10 > /proc/sys/vm/swappiness

# Persistent configuration
# /etc/sysctl.conf or /etc/sysctl.d/*.conf
echo "vm.swappiness = 10" >> /etc/sysctl.d/99-custom.conf
sysctl -p /etc/sysctl.d/99-custom.conf  # Apply
sysctl --system  # Load all configuration files

2. CPU Tuning

2.1 CPU Information

# CPU information
lscpu
cat /proc/cpuinfo

# CPU frequency
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq
cpupower frequency-info

# NUMA information
numactl --hardware
lscpu | grep NUMA

2.2 CPU Governor

# Check current governor
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

# Available governors
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors
# performance, powersave, userspace, ondemand, conservative, schedutil

# Change governor
echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

# Using cpupower
cpupower frequency-set -g performance

# Persistent configuration (Ubuntu)
# /etc/default/cpufrequtils
GOVERNOR="performance"

2.3 Process Priority

# nice value (-20 to 19, lower is higher priority)
nice -n -10 ./high-priority-task
renice -n -10 -p <PID>

# Real-time scheduling
chrt -f 50 ./realtime-task  # FIFO, priority 50
chrt -r 50 ./realtime-task  # Round Robin

# CPU affinity
taskset -c 0,1 ./my-program  # Run on CPU 0, 1 only
taskset -cp 0-3 <PID>        # Change running process

# CPU limit with cgroups (v1 interface shown; cgroup v2 uses a single cpu.max file)
# /sys/fs/cgroup/cpu/mygroup/
mkdir /sys/fs/cgroup/cpu/mygroup
echo 50000 > /sys/fs/cgroup/cpu/mygroup/cpu.cfs_quota_us  # 50% limit
echo <PID> > /sys/fs/cgroup/cpu/mygroup/cgroup.procs

2.4 Scheduler Tuning

# /etc/sysctl.d/99-cpu.conf
# Note: on kernels 5.13+ these knobs moved from sysctl to /sys/kernel/debug/sched/

# Scheduler tuning
kernel.sched_min_granularity_ns = 10000000
kernel.sched_wakeup_granularity_ns = 15000000
kernel.sched_migration_cost_ns = 5000000

# Workload-specific optimization
# Server workload (throughput-oriented)
kernel.sched_autogroup_enabled = 0

# Desktop workload (responsiveness-oriented)
kernel.sched_autogroup_enabled = 1

3. Memory Tuning

3.1 Memory Information

# Memory usage
free -h
cat /proc/meminfo

# Per-process memory
ps aux --sort=-%mem | head
pmap -x <PID>

# Page cache status
cat /proc/meminfo | grep -E "Cached|Buffers|Dirty"

# NUMA memory
numastat

3.2 Swap Tuning

# swappiness (0-100, lower uses less swap)
sysctl -w vm.swappiness=10  # Server: 10, Desktop: 60

# Create swap file
dd if=/dev/zero of=/swapfile bs=1G count=4
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile

# Add to /etc/fstab
# /swapfile none swap sw 0 0

# Swap status
swapon --show
cat /proc/swaps

3.3 Kernel Memory Parameters

# /etc/sysctl.d/99-memory.conf

# Reduce swap usage
vm.swappiness = 10

# Dirty page ratio (write delay)
vm.dirty_ratio = 20              # Allow up to 20% of total memory dirty
vm.dirty_background_ratio = 5    # Start background flush at 5%

# Or absolute values
vm.dirty_bytes = 1073741824      # 1GB
vm.dirty_background_bytes = 268435456  # 256MB

# Cache pressure
vm.vfs_cache_pressure = 50       # Default 100, lower keeps cache

# OOM Killer tuning
vm.overcommit_memory = 0         # 0=heuristic, 1=always allow, 2=limit
vm.overcommit_ratio = 50         # Used when overcommit_memory=2

# Memory compaction
vm.compaction_proactiveness = 20

# Transparent Huge Pages
# /sys/kernel/mm/transparent_hugepage/enabled
# [always] madvise never
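Switching the THP mode at runtime can be sketched as follows; the write needs root and does not persist across reboots (a tmpfiles.d entry or boot parameter is needed for that):

```shell
# Switch THP to madvise: huge pages only where apps opt in via madvise(2).
# Databases often recommend madvise or never to avoid latency spikes.
THP=/sys/kernel/mm/transparent_hugepage/enabled
if [ -w "$THP" ]; then
    echo madvise > "$THP"        # needs root
fi

# Verify - the active mode is shown in brackets
cat "$THP" 2>/dev/null || echo "THP not available on this kernel"
```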

3.4 Cache Management

# Clear page cache (use with caution in production!)
sync
echo 1 > /proc/sys/vm/drop_caches  # Page cache
echo 2 > /proc/sys/vm/drop_caches  # dentries, inodes
echo 3 > /proc/sys/vm/drop_caches  # All

# Check file cache
vmtouch -v /path/to/file
fincore /path/to/file

# Per-process cache usage
cat /proc/<PID>/smaps | grep -E "^(Rss|Shared|Private)"
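The effect of the page cache is easy to demonstrate by reading the same file twice; dd's status line reports the throughput. A minimal sketch, using /bin/sh only as an arbitrary existing file:

```shell
# Read the same file twice; the second pass is served from the page cache
# and should report much higher throughput on the dd status line
f=/bin/sh
dd if="$f" of=/dev/null bs=1M 2>&1 | tail -n 1   # first read (may hit disk)
dd if="$f" of=/dev/null bs=1M 2>&1 | tail -n 1   # second read (cached)
```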

4. I/O Tuning

4.1 I/O Scheduler

# Check current scheduler
cat /sys/block/sda/queue/scheduler
# [mq-deadline] kyber bfq none

# Scheduler types
# - none: no reordering (successor to noop); best for NVMe SSDs
# - mq-deadline: deadline-based; common default for SATA/SAS disks
# - bfq: Budget Fair Queueing; good interactivity for desktops
# - kyber: low-overhead, latency-target based; for fast devices

# Change scheduler
echo mq-deadline > /sys/block/sda/queue/scheduler

# Persistent configuration (GRUB - legacy; blk-mq kernels ignore elevator=,
# so prefer the udev rules below)
# /etc/default/grub
# GRUB_CMDLINE_LINUX="elevator=mq-deadline"
# update-grub

# Set via udev rules
# /etc/udev/rules.d/60-scheduler.rules
# ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/scheduler}="mq-deadline"
# ACTION=="add|change", KERNEL=="nvme[0-9]*", ATTR{queue/scheduler}="none"
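A quick way to list the active scheduler (the bracketed entry) for every block device at once (sketch):

```shell
# Print the scheduler line for each block device; the active scheduler
# is shown in brackets, e.g. "[mq-deadline] kyber bfq none"
for q in /sys/block/*/queue/scheduler; do
    [ -r "$q" ] || continue
    dev=${q#/sys/block/}; dev=${dev%%/*}
    printf '%-12s %s\n' "$dev" "$(cat "$q")"
done
```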

4.2 Disk I/O Tuning

# Readahead
cat /sys/block/sda/queue/read_ahead_kb  # Default 128
echo 256 > /sys/block/sda/queue/read_ahead_kb

# Queue depth
cat /sys/block/sda/queue/nr_requests
echo 256 > /sys/block/sda/queue/nr_requests

# Maximum sectors
cat /sys/block/sda/queue/max_sectors_kb

# Enable SSD TRIM
fstrim -v /
# Or automatic TRIM (mount option: discard)
# /dev/sda1 / ext4 defaults,discard 0 1

# Periodic TRIM (recommended)
systemctl enable fstrim.timer

4.3 Filesystem Tuning

# ext4 mount options
# /etc/fstab
# noatime    - Don't update access time (performance gain)
# nodiratime - Don't update directory access time
# data=writeback - Journaling mode (risky but fast)
# barrier=0  - Disable write barrier (risky)
# commit=60  - Commit interval (seconds)
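Putting the safer options together, an illustrative /etc/fstab entry (the UUID is a placeholder; the risky data=writeback and barrier=0 options are deliberately left out):

```shell
# /etc/fstab - performance-oriented but still safe ext4 options
# <device>                                  <mount>  <type>  <options>                   <dump> <pass>
UUID=0a1b2c3d-xxxx-xxxx-xxxx-xxxxxxxxxxxx  /data    ext4    defaults,noatime,commit=60  0      2
```

noatime already implies nodiratime, so listing both is redundant.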

# XFS tuning
# logbufs=8 - Number of log buffers
# logbsize=256k - Log buffer size

# Filesystem information
tune2fs -l /dev/sda1  # ext4
xfs_info /dev/sda1    # XFS

4.4 I/O Priority

# ionice - I/O priority
ionice -c 3 command        # Idle
ionice -c 2 -n 0 command   # Best-effort, high priority
ionice -c 1 command        # Realtime (root only)

# Change running process
ionice -c 2 -n 7 -p <PID>  # Lower priority

# Check current I/O priority
ionice -p <PID>
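For batch work such as backups, CPU and I/O priority are usually lowered together. A sketch with a placeholder command standing in for the real job:

```shell
# Lowest CPU niceness plus I/O idle class: the job only gets the disk
# when nothing else wants it
if command -v ionice >/dev/null; then
    nice -n 19 ionice -c 3 sh -c 'echo "running at idle CPU+I/O priority"'
else
    nice -n 19 sh -c 'echo "ionice not installed; lowered CPU priority only"'
fi
```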

5. Network Tuning

5.1 Network Information

# Interface information
ip link show
ethtool eth0

# Network statistics
ss -s
netstat -s
cat /proc/net/netstat

# Connection status
ss -tuln   # Listening ports
ss -tupn   # All connections
conntrack -L  # Connection tracking table

5.2 TCP Tuning

# /etc/sysctl.d/99-network.conf

# TCP buffer sizes
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.core.rmem_default = 1048576
net.core.wmem_default = 1048576

# TCP socket buffer (min, default, max)
net.ipv4.tcp_rmem = 4096 1048576 16777216
net.ipv4.tcp_wmem = 4096 1048576 16777216

# TCP backlog
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535
net.core.netdev_max_backlog = 65535

# TIME_WAIT optimization
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_tw_reuse = 1   # Reuses TIME_WAIT sockets for outgoing connections only

# TCP Keepalive
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_intvl = 60
net.ipv4.tcp_keepalive_probes = 3

# TCP congestion control
net.ipv4.tcp_congestion_control = bbr  # Or cubic
net.core.default_qdisc = fq

# Port range
net.ipv4.ip_local_port_range = 1024 65535

# SYN cookies (SYN flood defense)
net.ipv4.tcp_syncookies = 1

5.3 High-Performance Web Server Configuration

# /etc/sysctl.d/99-webserver.conf

# File handle limits
fs.file-max = 2097152
fs.nr_open = 2097152

# Network stack
net.core.somaxconn = 65535
net.ipv4.tcp_max_tw_buckets = 2000000
net.ipv4.tcp_max_syn_backlog = 65535
net.core.netdev_max_backlog = 65535

# Buffers
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 12582912 16777216
net.ipv4.tcp_wmem = 4096 12582912 16777216

# TCP optimization
net.ipv4.tcp_slow_start_after_idle = 0
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_mtu_probing = 1

# BBR
net.ipv4.tcp_congestion_control = bbr
net.core.default_qdisc = fq

5.4 Connection Limits

# System limits
ulimit -n        # Current limit
ulimit -n 65535  # Change

# /etc/security/limits.conf
# * soft nofile 65535
# * hard nofile 65535

# systemd service limits
# [Service]
# LimitNOFILE=65535
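ulimit only reports the current shell's limits; to verify what a running process actually received (e.g. after setting LimitNOFILE), read procfs. A sketch using the current shell as the example PID:

```shell
# Effective limits of a running process live in /proc/<PID>/limits;
# here we inspect this shell itself - substitute a service PID as needed
pid=$$
grep "Max open files" /proc/$pid/limits
```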

6. Profiling Tools

6.1 perf Basics

# Install perf
apt install linux-tools-common linux-tools-$(uname -r)

# CPU profiling
perf stat ./my-program
perf stat -d ./my-program  # Detailed

# Sampling
perf record -g ./my-program
perf record -g -p <PID> -- sleep 30

# Analyze results
perf report
perf report --stdio

# Real-time monitoring
perf top
perf top -p <PID>

# System-wide
perf record -a -g -- sleep 10

6.2 Flamegraph

# Install FlameGraph tools
git clone https://github.com/brendangregg/FlameGraph

# Collect data with perf
perf record -g -p <PID> -- sleep 60

# Generate flamegraph
perf script | ./FlameGraph/stackcollapse-perf.pl | ./FlameGraph/flamegraph.pl > flame.svg

# Or all at once
perf record -F 99 -a -g -- sleep 60
perf script | \
  ./FlameGraph/stackcollapse-perf.pl | \
  ./FlameGraph/flamegraph.pl > flame.svg
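The three-step pipeline is convenient to wrap in a script. A sketch: the ./FlameGraph checkout path, the flame.svg output name, and the default duration are assumptions.

```shell
#!/bin/sh
# flame.sh - record a system-wide profile and render flame.svg (sketch)
# Usage: ./flame.sh [seconds]   (needs root or perf_event_paranoid <= 1)
DUR=${1:-60}
FG=./FlameGraph   # path to the cloned FlameGraph repo (assumption)

if command -v perf >/dev/null && [ -x "$FG/flamegraph.pl" ]; then
    # 99 Hz sampling avoids lockstep with timer interrupts;
    # -a = all CPUs, -g = capture call graphs
    perf record -F 99 -a -g -- sleep "$DUR"
    perf script | "$FG/stackcollapse-perf.pl" | "$FG/flamegraph.pl" > flame.svg
    echo "wrote flame.svg"
else
    echo "need perf and a FlameGraph checkout at $FG"
fi
```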

6.3 strace/ltrace

# System call tracing
strace ./my-program
strace -p <PID>

# Specific system calls only (modern glibc opens files via openat, not open)
strace -e trace=openat,read,write ./my-program

# Time measurement
strace -T ./my-program    # Time per syscall
strace -c ./my-program    # Summary statistics

# Library call tracing
ltrace ./my-program

6.4 Other Tools

# bpftrace - eBPF-based tracing
bpftrace -e 'tracepoint:syscalls:sys_enter_openat { printf("%s %s\n", comm, str(args->filename)); }'

# Memory profiling (Valgrind)
valgrind --tool=massif ./my-program
ms_print massif.out.*

# CPU profiling (Valgrind)
valgrind --tool=callgrind ./my-program
kcachegrind callgrind.out.*

# Benchmarking
stress-ng --cpu 4 --timeout 60s
fio --name=random-write --ioengine=libaio --iodepth=32 --rw=randwrite --bs=4k --direct=1 --size=1G --numjobs=4 --runtime=60

6.5 Performance Checklist

#!/bin/bash
# performance-check.sh

echo "=== System Information ==="
uname -a
uptime

echo -e "\n=== CPU ==="
lscpu | grep -E "^(CPU\(s\)|Thread|Core|Model name)"
mpstat 1 1

echo -e "\n=== Memory ==="
free -h
cat /proc/meminfo | grep -E "^(MemTotal|MemFree|Buffers|Cached|SwapTotal|SwapFree)"

echo -e "\n=== Disk I/O ==="
iostat -x 1 1

echo -e "\n=== Network ==="
ss -s
cat /proc/net/snmp | grep -E "^(Tcp|Udp):"

echo -e "\n=== Load Average ==="
cat /proc/loadavg

echo -e "\n=== Top Processes (CPU) ==="
ps aux --sort=-%cpu | head -5

echo -e "\n=== Top Processes (Memory) ==="
ps aux --sort=-%mem | head -5

echo -e "\n=== Open Files ==="
cat /proc/sys/fs/file-nr


7. Practice Exercises

Exercise 1: Web Server Tuning

# Requirements:
# 1. Support 100,000 concurrent connections
# 2. TCP optimization (BBR, keepalive)
# 3. Increase file handle limits
# 4. Choose appropriate I/O scheduler

# Write sysctl configuration:

Exercise 2: Database Server Tuning

# Requirements:
# 1. Memory optimization (low swappiness)
# 2. Disk I/O optimization
# 3. Dirty page management
# 4. CPU affinity configuration

# Write configuration and commands:

Exercise 3: Performance Problem Diagnosis

# Scenario:
# List items to check sequentially when server becomes slow

# Diagnostic command list:

Exercise 4: Flamegraph Analysis

# Requirements:
# 1. Write or select CPU-intensive program
# 2. Profile with perf
# 3. Generate flamegraph
# 4. Analyze bottlenecks

# Commands and analysis approach:
