06. Optimization Fundamentals

06. Optimization Fundamentals

Learning Objectives

  • Understand the mathematical formulation of optimization problems and types of constraints
  • Learn the definitions and properties of convex sets and convex functions, and understand how convexity affects optimization
  • Express necessary and sufficient conditions for extrema using gradients and Hessians, and apply them
  • Solve equality-constrained optimization problems using the Lagrange multiplier method
  • Understand KKT conditions and apply them to inequality-constrained optimization problems
  • Learn how optimization theory is utilized in machine learning through practical examples

1. Formulation of Optimization Problems

1.1 General Form of Optimization Problems

Optimization is the process of finding variable values that minimize (or maximize) an objective function under given constraints.

Standard form:

$$ \begin{align} \min_{x \in \mathbb{R}^n} \quad & f(x) \\ \text{subject to} \quad & h_i(x) = 0, \quad i = 1, \ldots, m \\ & g_j(x) \leq 0, \quad j = 1, \ldots, p \end{align} $$

  • $f(x)$: objective function
  • $h_i(x) = 0$: equality constraints
  • $g_j(x) \leq 0$: inequality constraints

1.2 Feasible Region

The feasible region $\mathcal{F}$ is the set of all points that satisfy all constraints:

$$ \mathcal{F} = \{x \in \mathbb{R}^n : h_i(x) = 0, \; g_j(x) \leq 0, \; \forall i, j\} $$

Example: 2D Constrained Optimization

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Circle
from scipy.optimize import minimize

# ๋ชฉ์  ํ•จ์ˆ˜: f(x,y) = (x-1)^2 + (y-2)^2
def objective(X):
    x, y = X
    return (x - 1)**2 + (y - 2)**2

# ๋“ฑ์‹ ์ œ์•ฝ: h(x,y) = x + y - 1 = 0
def equality_constraint(X):
    x, y = X
    return x + y - 1

# ๋ถ€๋“ฑ์‹ ์ œ์•ฝ: g(x,y) = x^2 + y^2 - 4 <= 0 (์› ์•ˆ์ชฝ)
def inequality_constraint(X):
    x, y = X
    return x**2 + y**2 - 4

# ์‹œ๊ฐํ™”
fig, ax = plt.subplots(figsize=(10, 8))

# ๋ชฉ์  ํ•จ์ˆ˜ ๋“ฑ๊ณ ์„ 
x = np.linspace(-2.5, 3, 300)
y = np.linspace(-2.5, 3, 300)
X, Y = np.meshgrid(x, y)
Z = (X - 1)**2 + (Y - 2)**2
contour = ax.contour(X, Y, Z, levels=20, alpha=0.6, cmap='viridis')
ax.clabel(contour, inline=True, fontsize=8)

# ๋“ฑ์‹ ์ œ์•ฝ (์ง์„ )
x_line = np.linspace(-2.5, 3, 100)
y_line = 1 - x_line
ax.plot(x_line, y_line, 'r-', linewidth=2, label='$x + y = 1$ (๋“ฑ์‹ ์ œ์•ฝ)')

# ๋ถ€๋“ฑ์‹ ์ œ์•ฝ (์›)
circle = Circle((0, 0), 2, fill=False, edgecolor='blue', linewidth=2,
                label='$x^2 + y^2 \leq 4$ (๋ถ€๋“ฑ์‹ ์ œ์•ฝ)')
ax.add_patch(circle)

# ์‹คํ˜„ ๊ฐ€๋Šฅ ์˜์—ญ ๊ฐ•์กฐ
theta = np.linspace(0, 2*np.pi, 100)
x_circle = 2 * np.cos(theta)
y_circle = 2 * np.sin(theta)
mask = (x_circle + y_circle >= 0.9) & (x_circle + y_circle <= 1.1)
ax.fill_between(x_circle[mask], -2.5, y_circle[mask], alpha=0.2, color='green')

ax.set_xlim(-2.5, 3)
ax.set_ylim(-2.5, 3)
ax.set_xlabel('x', fontsize=12)
ax.set_ylabel('y', fontsize=12)
ax.set_title('์ œ์•ฝ ์ตœ์ ํ™” ๋ฌธ์ œ: ์‹คํ˜„ ๊ฐ€๋Šฅ ์˜์—ญ๊ณผ ๋ชฉ์  ํ•จ์ˆ˜', fontsize=14)
ax.legend(fontsize=10)
ax.grid(True, alpha=0.3)
ax.axhline(y=0, color='k', linewidth=0.5)
ax.axvline(x=0, color='k', linewidth=0.5)
plt.tight_layout()
plt.savefig('optimization_feasible_region.png', dpi=150)
plt.show()

print("์‹คํ˜„ ๊ฐ€๋Šฅ ์˜์—ญ: ์ง์„ ๊ณผ ์›์˜ ๊ต์„  ์ค‘ ์ œ์•ฝ์„ ๋งŒ์กฑํ•˜๋Š” ๋ถ€๋ถ„")

1.3 Classification of Optimization Problems

  • Unconstrained Optimization: $\min f(x)$
  • Constrained Optimization: includes equality/inequality constraints
  • Linear Programming: $f$, $h$, $g$ are all linear
  • Convex Optimization: $f$, $g$ convex, $h$ affine
  • Nonlinear Optimization: general nonlinear functions

2. Convex Sets and Convex Functions

2.1 Convex Set

A set $C \subseteq \mathbb{R}^n$ is convex if the line segment connecting any two points in the set is entirely contained within the set:

$$ x, y \in C, \; \lambda \in [0, 1] \implies \lambda x + (1-\lambda)y \in C $$

Example: Convex vs Non-convex Sets

import numpy as np
import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 3, figsize=(15, 10))
axes = axes.flatten()

# 1. ์› (๋ณผ๋ก)
theta = np.linspace(0, 2*np.pi, 100)
x1 = np.cos(theta)
y1 = np.sin(theta)
axes[0].fill(x1, y1, alpha=0.3, color='green')
axes[0].plot([0.5, -0.5], [0.3, -0.7], 'ro-', linewidth=2, markersize=8)
axes[0].set_title('์› (๋ณผ๋ก)', fontsize=12)
axes[0].set_aspect('equal')
axes[0].grid(True, alpha=0.3)

# 2. ๋ฐ˜ํ‰๋ฉด (๋ณผ๋ก)
x2 = np.array([-2, 2, 2, -2])
y2 = np.array([0, 0, 2, 2])
axes[1].fill(x2, y2, alpha=0.3, color='green')
axes[1].plot([0.5, -0.5], [0.5, 1.5], 'ro-', linewidth=2, markersize=8)
axes[1].set_title('๋ฐ˜ํ‰๋ฉด $y \geq 0$ (๋ณผ๋ก)', fontsize=12)
axes[1].set_xlim(-2, 2)
axes[1].set_ylim(-2, 2)
axes[1].grid(True, alpha=0.3)

# 3. ๋‹ค๊ฐํ˜• (๋ณผ๋ก)
x3 = np.array([0, 1, 0.5, -0.5, -1])
y3 = np.array([1, 0, -1, -1, 0])
axes[2].fill(x3, y3, alpha=0.3, color='green')
axes[2].plot([0.5, -0.5], [0.3, -0.3], 'ro-', linewidth=2, markersize=8)
axes[2].set_title('๋ณผ๋ก ๋‹ค๊ฐํ˜• (๋ณผ๋ก)', fontsize=12)
axes[2].set_aspect('equal')
axes[2].grid(True, alpha=0.3)

# 4. ์ดˆ์Šน๋‹ฌ (๋น„๋ณผ๋ก)
theta = np.linspace(0, 2*np.pi, 100)
x4_outer = 1.5 * np.cos(theta)
y4_outer = 1.5 * np.sin(theta)
x4_inner = 1.0 * np.cos(theta) + 0.5
y4_inner = 1.0 * np.sin(theta)
axes[3].fill(x4_outer, y4_outer, alpha=0.3, color='red')
axes[3].fill(x4_inner, y4_inner, alpha=1.0, color='white')
axes[3].plot([-0.5, 1.0], [0.5, -0.5], 'bo--', linewidth=2, markersize=8)
axes[3].set_title('์ดˆ์Šน๋‹ฌ (๋น„๋ณผ๋ก)', fontsize=12)
axes[3].set_aspect('equal')
axes[3].grid(True, alpha=0.3)

# 5. ๋ณ„ ๋ชจ์–‘ (๋น„๋ณผ๋ก)
theta = np.linspace(0, 2*np.pi, 11)
r = np.where(np.arange(11) % 2 == 0, 1.5, 0.7)
x5 = r * np.cos(theta)
y5 = r * np.sin(theta)
axes[4].fill(x5, y5, alpha=0.3, color='red')
axes[4].plot([1.0, -1.0], [0.0, 0.0], 'bo--', linewidth=2, markersize=8)
axes[4].set_title('๋ณ„ (๋น„๋ณผ๋ก)', fontsize=12)
axes[4].set_aspect('equal')
axes[4].grid(True, alpha=0.3)

# 6. ๊ณ ๋ฆฌ (๋น„๋ณผ๋ก)
theta = np.linspace(0, 2*np.pi, 100)
x6_outer = 1.5 * np.cos(theta)
y6_outer = 1.5 * np.sin(theta)
x6_inner = 0.7 * np.cos(theta)
y6_inner = 0.7 * np.sin(theta)
axes[5].fill(x6_outer, y6_outer, alpha=0.3, color='red')
axes[5].fill(x6_inner, y6_inner, alpha=1.0, color='white')
axes[5].plot([-1.0, 1.0], [0.0, 0.0], 'bo--', linewidth=2, markersize=8)
axes[5].set_title('๊ณ ๋ฆฌ (๋น„๋ณผ๋ก)', fontsize=12)
axes[5].set_aspect('equal')
axes[5].grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('convex_vs_nonconvex_sets.png', dpi=150)
plt.show()

print("๋ณผ๋ก ์ง‘ํ•ฉ(์ดˆ๋ก): ๋‘ ์ ์„ ์ž‡๋Š” ์„ ๋ถ„์ด ์ง‘ํ•ฉ ๋‚ด๋ถ€์— ์žˆ์Œ")
print("๋น„๋ณผ๋ก ์ง‘ํ•ฉ(๋นจ๊ฐ•): ์„ ๋ถ„์ด ์ง‘ํ•ฉ ๋ฐ–์œผ๋กœ ๋‚˜๊ฐ")

2.2 Convex Function

A function $f: \mathbb{R}^n \to \mathbb{R}$ is convex if its domain is a convex set and it satisfies:

$$ f(\lambda x + (1-\lambda)y) \leq \lambda f(x) + (1-\lambda)f(y), \quad \forall x, y \in \text{dom}(f), \; \lambda \in [0,1] $$

Geometric interpretation: The line segment connecting any two points on the function graph always lies above the graph.

Strictly Convex: inequality becomes $<$

Strongly Convex: $\exists m > 0$ such that

$$ f(x) - \frac{m}{2}\|x\|^2 \text{ is convex} $$

2.3 Criteria for Convexity

First-order condition (if differentiable):

$$ f(y) \geq f(x) + \nabla f(x)^T (y - x), \quad \forall x, y $$

Second-order condition (if twice differentiable):

$$ \nabla^2 f(x) \succeq 0 \quad \text{(Hessian is positive semidefinite)} $$

Strongly convex: $\nabla^2 f(x) \succ 0$ (Hessian is positive definite)

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

# ๋ณผ๋ก ํ•จ์ˆ˜ ์˜ˆ์ œ
def convex_func(x, y):
    """๋ณผ๋ก: f(x,y) = x^2 + y^2"""
    return x**2 + y**2

def nonconvex_func(x, y):
    """๋น„๋ณผ๋ก: f(x,y) = x^2 - y^2 (์•ˆ์žฅ์ )"""
    return x**2 - y**2

def visualize_function(func, title, ax):
    x = np.linspace(-2, 2, 100)
    y = np.linspace(-2, 2, 100)
    X, Y = np.meshgrid(x, y)
    Z = func(X, Y)

    ax.plot_surface(X, Y, Z, cmap='viridis', alpha=0.8, edgecolor='none')
    ax.set_xlabel('x')
    ax.set_ylabel('y')
    ax.set_zlabel('f(x,y)')
    ax.set_title(title)

fig = plt.figure(figsize=(15, 6))

# ๋ณผ๋ก ํ•จ์ˆ˜
ax1 = fig.add_subplot(121, projection='3d')
visualize_function(convex_func, '๋ณผ๋ก ํ•จ์ˆ˜: $f(x,y) = x^2 + y^2$', ax1)

# ๋น„๋ณผ๋ก ํ•จ์ˆ˜
ax2 = fig.add_subplot(122, projection='3d')
visualize_function(nonconvex_func, '๋น„๋ณผ๋ก ํ•จ์ˆ˜: $f(x,y) = x^2 - y^2$', ax2)

plt.tight_layout()
plt.savefig('convex_vs_nonconvex_functions.png', dpi=150)
plt.show()

# ํ—ค์‹œ์•ˆ ํ™•์ธ
def check_convexity(x, y, name):
    if name == "convex":
        # f(x,y) = x^2 + y^2
        # โˆ‡ยฒf = [[2, 0], [0, 2]]
        hessian = np.array([[2, 0], [0, 2]])
    else:
        # f(x,y) = x^2 - y^2
        # โˆ‡ยฒf = [[2, 0], [0, -2]]
        hessian = np.array([[2, 0], [0, -2]])

    eigenvalues = np.linalg.eigvals(hessian)
    print(f"\n{name} ํ•จ์ˆ˜์˜ ํ—ค์‹œ์•ˆ ๊ณ ์œ ๊ฐ’: {eigenvalues}")

    if np.all(eigenvalues >= 0):
        print("  โ†’ ๋ชจ๋“  ๊ณ ์œ ๊ฐ’ โ‰ฅ 0: ๋ณผ๋ก ํ•จ์ˆ˜")
    elif np.all(eigenvalues > 0):
        print("  โ†’ ๋ชจ๋“  ๊ณ ์œ ๊ฐ’ > 0: ๊ฐ•๋ณผ๋ก ํ•จ์ˆ˜")
    else:
        print("  โ†’ ์Œ์˜ ๊ณ ์œ ๊ฐ’ ์กด์žฌ: ๋น„๋ณผ๋ก ํ•จ์ˆ˜")

check_convexity(0, 0, "convex")
check_convexity(0, 0, "nonconvex")

2.4 Important Properties of Convex Functions

Theorem 1: A local minimum of a convex function is a global minimum.

Theorem 2: A strongly convex function has a unique global minimum.

Theorem 3: The sum, positive scalar multiple, and maximum of convex functions are convex.

Example: Convex Functions in Machine Learning

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-3, 3, 300)

fig, axes = plt.subplots(2, 3, figsize=(15, 10))

# 1. ์ด์ฐจ ํ•จ์ˆ˜ (Quadratic)
axes[0, 0].plot(x, x**2, linewidth=2)
axes[0, 0].set_title('$f(x) = x^2$ (Ridge ์ •๊ทœํ™”)', fontsize=12)
axes[0, 0].grid(True, alpha=0.3)

# 2. ์ ˆ๋Œ“๊ฐ’ (L1 norm)
axes[0, 1].plot(x, np.abs(x), linewidth=2, color='orange')
axes[0, 1].set_title('$f(x) = |x|$ (Lasso ์ •๊ทœํ™”)', fontsize=12)
axes[0, 1].grid(True, alpha=0.3)

# 3. ์ง€์ˆ˜ ํ•จ์ˆ˜
axes[0, 2].plot(x, np.exp(x), linewidth=2, color='green')
axes[0, 2].set_title('$f(x) = e^x$', fontsize=12)
axes[0, 2].set_ylim(0, 10)
axes[0, 2].grid(True, alpha=0.3)

# 4. ๋กœ์ง€์Šคํ‹ฑ ์†์‹ค
axes[1, 0].plot(x, np.log(1 + np.exp(x)), linewidth=2, color='red')
axes[1, 0].set_title('$f(x) = \log(1 + e^x)$ (๋กœ์ง€์Šคํ‹ฑ ์†์‹ค)', fontsize=12)
axes[1, 0].grid(True, alpha=0.3)

# 5. Huber ์†์‹ค
delta = 1.0
huber = np.where(np.abs(x) <= delta, 0.5 * x**2, delta * (np.abs(x) - 0.5 * delta))
axes[1, 1].plot(x, huber, linewidth=2, color='purple')
axes[1, 1].set_title('Huber ์†์‹ค ($\delta=1$)', fontsize=12)
axes[1, 1].grid(True, alpha=0.3)

# 6. -log (์Œ์˜ ๋กœ๊ทธ, ๋ณผ๋ก)
x_pos = np.linspace(0.01, 3, 300)
axes[1, 2].plot(x_pos, -np.log(x_pos), linewidth=2, color='brown')
axes[1, 2].set_title('$f(x) = -\log(x)$ (x > 0)', fontsize=12)
axes[1, 2].grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('ml_convex_functions.png', dpi=150)
plt.show()

print("๋จธ์‹ ๋Ÿฌ๋‹์—์„œ ์ž์ฃผ ์‚ฌ์šฉํ•˜๋Š” ๋ณผ๋ก ํ•จ์ˆ˜๋“ค:")
print("- ์ด์ฐจ ํ•จ์ˆ˜: Ridge ์ •๊ทœํ™”, MSE ์†์‹ค")
print("- L1 ๋…ธ๋ฆ„: Lasso ์ •๊ทœํ™”")
print("- ๋กœ์ง€์Šคํ‹ฑ ์†์‹ค: ๋กœ์ง€์Šคํ‹ฑ ํšŒ๊ท€")
print("- Huber ์†์‹ค: ๋กœ๋ฒ„์ŠคํŠธ ํšŒ๊ท€")

3. Necessary and Sufficient Conditions for Extrema

3.1 First-Order Necessary Condition

For unconstrained optimization, if $x^*$ is a local minimum:

$$ \nabla f(x^*) = 0 $$

Such a point is called a stationary point or critical point.

3.2 Second-Order Sufficient Condition

When $x^*$ is a stationary point: - Local minimum: $\nabla^2 f(x^*) \succ 0$ (Hessian is positive definite) - Local maximum: $\nabla^2 f(x^*) \prec 0$ (Hessian is negative definite) - Saddle Point: Hessian is indefinite (has both positive and negative eigenvalues)

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize
from mpl_toolkits.mplot3d import Axes3D

def rosenbrock(X):
    """Rosenbrock ํ•จ์ˆ˜: f(x,y) = (1-x)^2 + 100(y-x^2)^2"""
    x, y = X
    return (1 - x)**2 + 100 * (y - x**2)**2

def rosenbrock_gradient(X):
    x, y = X
    df_dx = -2*(1 - x) - 400*x*(y - x**2)
    df_dy = 200*(y - x**2)
    return np.array([df_dx, df_dy])

def rosenbrock_hessian(X):
    x, y = X
    d2f_dx2 = 2 - 400*y + 1200*x**2
    d2f_dxdy = -400*x
    d2f_dy2 = 200
    return np.array([[d2f_dx2, d2f_dxdy],
                     [d2f_dxdy, d2f_dy2]])

# ์ตœ์†Ÿ๊ฐ’ ์ฐพ๊ธฐ
x0 = np.array([0.0, 0.0])
result = minimize(rosenbrock, x0, method='BFGS', jac=rosenbrock_gradient)
x_min = result.x

print("Rosenbrock ํ•จ์ˆ˜ ์ตœ์ ํ™”")
print(f"์ตœ์†Ÿ๊ฐ’ ์œ„์น˜: x* = {x_min}")
print(f"์ตœ์†Ÿ๊ฐ’: f(x*) = {rosenbrock(x_min):.6f}")
print(f"\n๊ทธ๋ž˜๋””์–ธํŠธ: โˆ‡f(x*) = {rosenbrock_gradient(x_min)}")

hessian = rosenbrock_hessian(x_min)
eigenvalues = np.linalg.eigvals(hessian)
print(f"\nํ—ค์‹œ์•ˆ:\n{hessian}")
print(f"๊ณ ์œ ๊ฐ’: {eigenvalues}")
print(f"  โ†’ ๋ชจ๋“  ๊ณ ์œ ๊ฐ’ > 0: ์ง€์—ญ ์ตœ์†Ÿ๊ฐ’ ํ™•์ธ")

# ์‹œ๊ฐํ™”
fig = plt.figure(figsize=(16, 6))

# 3D ํ‘œ๋ฉด
ax1 = fig.add_subplot(131, projection='3d')
x = np.linspace(-2, 2, 100)
y = np.linspace(-1, 3, 100)
X, Y = np.meshgrid(x, y)
Z = (1 - X)**2 + 100*(Y - X**2)**2
ax1.plot_surface(X, Y, Z, cmap='viridis', alpha=0.6, edgecolor='none')
ax1.scatter([x_min[0]], [x_min[1]], [rosenbrock(x_min)],
            color='red', s=100, label='์ตœ์†Ÿ๊ฐ’')
ax1.set_xlabel('x')
ax1.set_ylabel('y')
ax1.set_zlabel('f(x,y)')
ax1.set_title('Rosenbrock ํ•จ์ˆ˜', fontsize=12)
ax1.legend()

# ๋“ฑ๊ณ ์„ 
ax2 = fig.add_subplot(132)
levels = np.logspace(-1, 3, 20)
contour = ax2.contour(X, Y, Z, levels=levels, cmap='viridis')
ax2.clabel(contour, inline=True, fontsize=8)
ax2.plot(x_min[0], x_min[1], 'r*', markersize=20, label='์ตœ์†Ÿ๊ฐ’ (1, 1)')
ax2.set_xlabel('x')
ax2.set_ylabel('y')
ax2.set_title('๋“ฑ๊ณ ์„  ํ”Œ๋กฏ', fontsize=12)
ax2.legend()
ax2.grid(True, alpha=0.3)

# ์•ˆ์žฅ์  ์˜ˆ์ œ: f(x,y) = x^2 - y^2
ax3 = fig.add_subplot(133, projection='3d')
Z_saddle = X**2 - Y**2
ax3.plot_surface(X, Y, Z_saddle, cmap='coolwarm', alpha=0.8, edgecolor='none')
ax3.scatter([0], [0], [0], color='red', s=100, label='์•ˆ์žฅ์  (0, 0)')
ax3.set_xlabel('x')
ax3.set_ylabel('y')
ax3.set_zlabel('f(x,y)')
ax3.set_title('์•ˆ์žฅ์ : $f(x,y) = x^2 - y^2$', fontsize=12)
ax3.legend()

plt.tight_layout()
plt.savefig('critical_points_analysis.png', dpi=150)
plt.show()

# ์•ˆ์žฅ์ ์—์„œ์˜ ํ—ค์‹œ์•ˆ
print("\n\n์•ˆ์žฅ์  ์˜ˆ์ œ: f(x,y) = x^2 - y^2")
saddle_hessian = np.array([[2, 0], [0, -2]])
saddle_eigenvalues = np.linalg.eigvals(saddle_hessian)
print(f"์›์  (0, 0)์—์„œ์˜ ํ—ค์‹œ์•ˆ:\n{saddle_hessian}")
print(f"๊ณ ์œ ๊ฐ’: {saddle_eigenvalues}")
print(f"  โ†’ ์–‘์˜ ๊ณ ์œ ๊ฐ’๊ณผ ์Œ์˜ ๊ณ ์œ ๊ฐ’ ํ˜ผ์žฌ: ์•ˆ์žฅ์ ")

4. Lagrange Multipliers

4.1 Equality-Constrained Optimization

Consider the following problem:

$$ \begin{align} \min_{x} \quad & f(x) \\ \text{subject to} \quad & h(x) = 0 \end{align} $$

Lagrangian:

$$ \mathcal{L}(x, \lambda) = f(x) + \lambda^T h(x) $$

where $\lambda$ is the Lagrange multiplier.

Necessary condition for optimality:

$$ \begin{align} \nabla_x \mathcal{L} &= \nabla f(x^*) + \lambda^* \nabla h(x^*) = 0 \\ \nabla_\lambda \mathcal{L} &= h(x^*) = 0 \end{align} $$

Geometric interpretation: At the optimal point, $\nabla f$ and $\nabla h$ are parallel (same or opposite direction).

4.2 Example: Equality-Constrained Optimization

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize

# ๋ชฉ์  ํ•จ์ˆ˜: f(x,y) = x^2 + y^2
def objective(X):
    return X[0]**2 + X[1]**2

# ๋“ฑ์‹ ์ œ์•ฝ: h(x,y) = x + 2y - 1 = 0
def constraint_eq(X):
    return X[0] + 2*X[1] - 1

# ๋ผ๊ทธ๋ž‘์ง€์•ˆ: L(x,y,ฮป) = x^2 + y^2 + ฮป(x + 2y - 1)
def lagrangian(X, lam):
    return objective(X) + lam * constraint_eq(X)

# scipy๋กœ ์ตœ์ ํ™”
constraints = {'type': 'eq', 'fun': constraint_eq}
x0 = np.array([0.0, 0.0])
result = minimize(objective, x0, method='SLSQP', constraints=constraints)

x_opt = result.x
print("๋ผ๊ทธ๋ž‘์ฃผ ์Šน์ˆ˜๋ฒ• ์˜ˆ์ œ")
print(f"์ตœ์  ํ•ด: x* = {x_opt}")
print(f"๋ชฉ์  ํ•จ์ˆ˜ ๊ฐ’: f(x*) = {objective(x_opt):.6f}")
print(f"์ œ์•ฝ ์กฐ๊ฑด ํ™•์ธ: h(x*) = {constraint_eq(x_opt):.6e}")

# ํ•ด์„์  ํ•ด (์†์œผ๋กœ ๊ณ„์‚ฐ)
# โˆ‡L = 0: 2x + ฮป = 0, 2y + 2ฮป = 0, x + 2y - 1 = 0
# โ†’ x = -ฮป/2, y = -ฮป, x + 2y = 1 โ†’ -ฮป/2 - 2ฮป = 1 โ†’ ฮป = -2/5
# โ†’ x = 1/5, y = 2/5
x_analytical = np.array([1/5, 2/5])
lambda_analytical = -2/5

print(f"\nํ•ด์„์  ํ•ด: x* = {x_analytical}")
print(f"๋ผ๊ทธ๋ž‘์ฃผ ์Šน์ˆ˜: ฮป* = {lambda_analytical}")

# ์‹œ๊ฐํ™”
fig, ax = plt.subplots(figsize=(10, 8))

# ๋ชฉ์  ํ•จ์ˆ˜ ๋“ฑ๊ณ ์„ 
x = np.linspace(-1, 2, 300)
y = np.linspace(-1, 2, 300)
X, Y = np.meshgrid(x, y)
Z = X**2 + Y**2
contour = ax.contour(X, Y, Z, levels=15, alpha=0.6, cmap='viridis')
ax.clabel(contour, inline=True, fontsize=8)

# ์ œ์•ฝ ์กฐ๊ฑด (์ง์„ )
x_line = np.linspace(-1, 2, 100)
y_line = (1 - x_line) / 2
ax.plot(x_line, y_line, 'r-', linewidth=3, label='์ œ์•ฝ: $x + 2y = 1$')

# ์ตœ์ ์ 
ax.plot(x_opt[0], x_opt[1], 'r*', markersize=20, label=f'์ตœ์ ์  ({x_opt[0]:.2f}, {x_opt[1]:.2f})')

# ๊ทธ๋ž˜๋””์–ธํŠธ ๋ฒกํ„ฐ
grad_f = 2 * x_opt  # โˆ‡f = [2x, 2y]
grad_h = np.array([1, 2])  # โˆ‡h = [1, 2]
ax.quiver(x_opt[0], x_opt[1], grad_f[0], grad_f[1],
          angles='xy', scale_units='xy', scale=5, color='blue', width=0.006,
          label='$\\nabla f$ (๋ชฉ์ ํ•จ์ˆ˜ ๊ทธ๋ž˜๋””์–ธํŠธ)')
ax.quiver(x_opt[0], x_opt[1], grad_h[0], grad_h[1],
          angles='xy', scale_units='xy', scale=5, color='green', width=0.006,
          label='$\\nabla h$ (์ œ์•ฝ ๊ทธ๋ž˜๋””์–ธํŠธ)')

ax.set_xlim(-0.5, 1.5)
ax.set_ylim(-0.5, 1.5)
ax.set_xlabel('x', fontsize=12)
ax.set_ylabel('y', fontsize=12)
ax.set_title('๋ผ๊ทธ๋ž‘์ฃผ ์Šน์ˆ˜๋ฒ•: $\\nabla f$์™€ $\\nabla h$๊ฐ€ ํ‰ํ–‰', fontsize=14)
ax.legend(fontsize=10)
ax.grid(True, alpha=0.3)
ax.set_aspect('equal')
plt.tight_layout()
plt.savefig('lagrange_multipliers.png', dpi=150)
plt.show()

print(f"\n์ตœ์ ์ ์—์„œ โˆ‡f = {grad_f}, โˆ‡h = {grad_h}")
print(f"โˆ‡f = ฮปโˆ‡h ํ™•์ธ: ฮป = {grad_f[0] / grad_h[0]:.3f} (x ์„ฑ๋ถ„)")

5. KKT Conditions (Karush-Kuhn-Tucker Conditions)

5.1 Inequality-Constrained Optimization

Consider the general optimization problem:

$$ \begin{align} \min_{x} \quad & f(x) \\ \text{subject to} \quad & h_i(x) = 0, \quad i = 1, \ldots, m \\ & g_j(x) \leq 0, \quad j = 1, \ldots, p \end{align} $$

Generalized Lagrangian:

$$ \mathcal{L}(x, \lambda, \mu) = f(x) + \sum_{i=1}^m \lambda_i h_i(x) + \sum_{j=1}^p \mu_j g_j(x) $$

5.2 KKT Conditions

If $x^*$ is optimal and certain regularity conditions are satisfied, there exist $\lambda^*, \mu^*$ satisfying the KKT conditions:

  1. Stationarity: $$\nabla_x \mathcal{L}(x^*, \lambda^*, \mu^*) = 0$$

  2. Primal Feasibility: $$h_i(x^*) = 0, \quad g_j(x^*) \leq 0$$

  3. Dual Feasibility: $$\mu_j^* \geq 0$$

  4. Complementary Slackness: $$\mu_j^* g_j(x^*) = 0, \quad \forall j$$

Meaning of complementary slackness: For each inequality constraint - $g_j(x^*) < 0$ (constraint inactive) โ†’ $\mu_j^* = 0$ - $\mu_j^* > 0$ (Lagrange multiplier positive) โ†’ $g_j(x^*) = 0$ (constraint active)

5.3 Example: Applying KKT Conditions

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize

# ๋ชฉ์  ํ•จ์ˆ˜: f(x,y) = (x-2)^2 + (y-1)^2
def objective(X):
    x, y = X
    return (x - 2)**2 + (y - 1)**2

def objective_grad(X):
    x, y = X
    return np.array([2*(x - 2), 2*(y - 1)])

# ๋ถ€๋“ฑ์‹ ์ œ์•ฝ: g1(x,y) = x + y - 1 <= 0
def constraint_ineq1(X):
    return X[0] + X[1] - 1

# ๋ถ€๋“ฑ์‹ ์ œ์•ฝ: g2(x,y) = -x <= 0 (์ฆ‰, x >= 0)
def constraint_ineq2(X):
    return -X[0]

# ๋ถ€๋“ฑ์‹ ์ œ์•ฝ: g3(x,y) = -y <= 0 (์ฆ‰, y >= 0)
def constraint_ineq3(X):
    return -X[1]

# scipy๋กœ ์ตœ์ ํ™”
constraints = [
    {'type': 'ineq', 'fun': lambda X: -constraint_ineq1(X)},  # scipy๋Š” g(x) >= 0 ํ˜•์‹
    {'type': 'ineq', 'fun': lambda X: -constraint_ineq2(X)},
    {'type': 'ineq', 'fun': lambda X: -constraint_ineq3(X)}
]

x0 = np.array([0.5, 0.5])
result = minimize(objective, x0, method='SLSQP', constraints=constraints,
                  options={'disp': True})

x_opt = result.x
print("KKT ์กฐ๊ฑด ์˜ˆ์ œ")
print(f"์ตœ์  ํ•ด: x* = {x_opt}")
print(f"๋ชฉ์  ํ•จ์ˆ˜ ๊ฐ’: f(x*) = {objective(x_opt):.6f}")

# ์ œ์•ฝ ์กฐ๊ฑด ํ™•์ธ
g1 = constraint_ineq1(x_opt)
g2 = constraint_ineq2(x_opt)
g3 = constraint_ineq3(x_opt)
print(f"\n์ œ์•ฝ ์กฐ๊ฑด ๊ฐ’:")
print(f"  g1(x*) = x + y - 1 = {g1:.6f} {'(ํ™œ์„ฑ)' if abs(g1) < 1e-6 else '(๋น„ํ™œ์„ฑ)'}")
print(f"  g2(x*) = -x = {g2:.6f} {'(ํ™œ์„ฑ)' if abs(g2) < 1e-6 else '(๋น„ํ™œ์„ฑ)'}")
print(f"  g3(x*) = -y = {g3:.6f} {'(ํ™œ์„ฑ)' if abs(g3) < 1e-6 else '(๋น„ํ™œ์„ฑ)'}")

# ์‹œ๊ฐํ™”
fig, ax = plt.subplots(figsize=(10, 8))

# ๋ชฉ์  ํ•จ์ˆ˜ ๋“ฑ๊ณ ์„ 
x = np.linspace(-0.5, 2.5, 300)
y = np.linspace(-0.5, 2.5, 300)
X, Y = np.meshgrid(x, y)
Z = (X - 2)**2 + (Y - 1)**2
contour = ax.contour(X, Y, Z, levels=15, alpha=0.6, cmap='viridis')
ax.clabel(contour, inline=True, fontsize=8)

# ์‹คํ˜„ ๊ฐ€๋Šฅ ์˜์—ญ (์‚ผ๊ฐํ˜•)
feasible_x = [0, 1, 0, 0]
feasible_y = [0, 0, 1, 0]
ax.fill(feasible_x, feasible_y, alpha=0.2, color='green', label='์‹คํ˜„ ๊ฐ€๋Šฅ ์˜์—ญ')

# ์ œ์•ฝ ์กฐ๊ฑด ๊ฒฝ๊ณ„
ax.plot([0, 1], [1, 0], 'r-', linewidth=2, label='$x + y = 1$')
ax.axvline(x=0, color='blue', linewidth=2, linestyle='--', label='$x = 0$')
ax.axhline(y=0, color='purple', linewidth=2, linestyle='--', label='$y = 0$')

# ์ œ์•ฝ ์—†๋Š” ์ตœ์†Ÿ๊ฐ’
ax.plot(2, 1, 'go', markersize=12, label='์ œ์•ฝ ์—†๋Š” ์ตœ์†Ÿ๊ฐ’ (2, 1)')

# KKT ์ตœ์ ์ 
ax.plot(x_opt[0], x_opt[1], 'r*', markersize=20,
        label=f'KKT ์ตœ์ ์  ({x_opt[0]:.2f}, {x_opt[1]:.2f})')

ax.set_xlim(-0.5, 2.5)
ax.set_ylim(-0.5, 2.0)
ax.set_xlabel('x', fontsize=12)
ax.set_ylabel('y', fontsize=12)
ax.set_title('KKT ์กฐ๊ฑด: ๋ถ€๋“ฑ์‹ ์ œ์•ฝ ์ตœ์ ํ™”', fontsize=14)
ax.legend(fontsize=10)
ax.grid(True, alpha=0.3)
ax.set_aspect('equal')
plt.tight_layout()
plt.savefig('kkt_conditions.png', dpi=150)
plt.show()

# KKT ์กฐ๊ฑด ์ˆ˜๋™ ๊ฒ€์ฆ (ํ™œ์„ฑ ์ œ์•ฝ์ด g1์ธ ๊ฒฝ์šฐ)
print("\n\nKKT ์กฐ๊ฑด ์ˆ˜๋™ ๊ฒ€์ฆ:")
print("ํ™œ์„ฑ ์ œ์•ฝ: g1(x,y) = x + y - 1 = 0")
print("โˆ‡f(x*) + ฮผ1โˆ‡g1(x*) = 0")
print(f"โˆ‡f(x*) = {objective_grad(x_opt)}")
print("โˆ‡g1 = [1, 1]")
mu1 = -objective_grad(x_opt)[0]  # ๋˜๋Š” -objective_grad(x_opt)[1]
print(f"โ†’ ฮผ1 = {mu1:.4f}")
print(f"์ƒ๋ณด ์ด์™„: ฮผ1 * g1(x*) = {mu1 * g1:.6e} โ‰ˆ 0 โœ“")

6. Optimization Applications in Machine Learning

6.1 SVM Dual Problem

Primal Problem:

$$ \begin{align} \min_{w, b, \xi} \quad & \frac{1}{2}\|w\|^2 + C\sum_{i=1}^n \xi_i \\ \text{subject to} \quad & y_i(w^T x_i + b) \geq 1 - \xi_i \\ & \xi_i \geq 0 \end{align} $$

Constructing the Lagrangian and applying KKT conditions yields the Dual Problem:

$$ \begin{align} \max_{\alpha} \quad & \sum_{i=1}^n \alpha_i - \frac{1}{2}\sum_{i,j}\alpha_i \alpha_j y_i y_j x_i^T x_j \\ \text{subject to} \quad & 0 \leq \alpha_i \leq C \\ & \sum_{i=1}^n \alpha_i y_i = 0 \end{align} $$

Advantages of dual problem: - Kernel trick applicable - Dimensionality depends on number of data points (favorable for high-dimensional data)

from sklearn import datasets
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
import numpy as np
import matplotlib.pyplot as plt

# ๋ฐ์ดํ„ฐ ์ƒ์„ฑ
X, y = datasets.make_classification(n_samples=100, n_features=2, n_informative=2,
                                     n_redundant=0, n_clusters_per_class=1,
                                     random_state=42)
y = 2*y - 1  # {0,1} โ†’ {-1,1}

# ์ •๊ทœํ™”
scaler = StandardScaler()
X = scaler.fit_transform(X)

# SVM ํ•™์Šต
svm = SVC(kernel='linear', C=1.0)
svm.fit(X, y)

# ๊ฒฐ์ • ๊ฒฝ๊ณ„ ์‹œ๊ฐํ™”
fig, ax = plt.subplots(figsize=(10, 8))

# ๋ฐ์ดํ„ฐ ํฌ์ธํŠธ
ax.scatter(X[y==1, 0], X[y==1, 1], c='blue', marker='o', s=100,
           edgecolors='k', label='Class +1')
ax.scatter(X[y==-1, 0], X[y==-1, 1], c='red', marker='s', s=100,
           edgecolors='k', label='Class -1')

# ๊ฒฐ์ • ๊ฒฝ๊ณ„
xlim = ax.get_xlim()
ylim = ax.get_ylim()
xx = np.linspace(xlim[0], xlim[1], 300)
yy = np.linspace(ylim[0], ylim[1], 300)
XX, YY = np.meshgrid(xx, yy)
Z = svm.decision_function(np.c_[XX.ravel(), YY.ravel()])
Z = Z.reshape(XX.shape)

# ๊ฒฐ์ • ๊ฒฝ๊ณ„์™€ ๋งˆ์ง„
ax.contour(XX, YY, Z, levels=[-1, 0, 1], colors=['red', 'black', 'blue'],
           linestyles=['--', '-', '--'], linewidths=[2, 3, 2])

# ์„œํฌํŠธ ๋ฒกํ„ฐ ๊ฐ•์กฐ
support_vectors = svm.support_vectors_
ax.scatter(support_vectors[:, 0], support_vectors[:, 1], s=300,
           linewidths=2, facecolors='none', edgecolors='green',
           label='Support Vectors')

ax.set_xlabel('Feature 1', fontsize=12)
ax.set_ylabel('Feature 2', fontsize=12)
ax.set_title('SVM: KKT ์กฐ๊ฑด์œผ๋กœ ์œ ๋„๋œ ์Œ๋Œ€ ๋ฌธ์ œ', fontsize=14)
ax.legend(fontsize=10)
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('svm_kkt_dual.png', dpi=150)
plt.show()

print(f"์„œํฌํŠธ ๋ฒกํ„ฐ ๊ฐœ์ˆ˜: {len(support_vectors)}")
print(f"์Œ๋Œ€ ๊ณ„์ˆ˜ (ฮฑ): {svm.dual_coef_}")
print("\nKKT ์ƒ๋ณด ์ด์™„:")
print("  ฮฑ = 0: ๋งˆ์ง„ ๋ฐ”๊นฅ (๋น„ํ™œ์„ฑ ์ œ์•ฝ)")
print("  0 < ฮฑ < C: ๋งˆ์ง„ ์œ„ (์„œํฌํŠธ ๋ฒกํ„ฐ)")
print("  ฮฑ = C: ๋งˆ์ง„ ์•ˆ์ชฝ (์˜ค๋ถ„๋ฅ˜ ๊ฐ€๋Šฅ)")

6.2 Regularization = Constrained Optimization

Two Formulations of Ridge Regression:

Regularization form: $$\min_w \|Xw - y\|^2 + \lambda\|w\|^2$$

Constrained form: $$ \begin{align} \min_w \quad & \|Xw - y\|^2 \\ \text{subject to} \quad & \|w\|^2 \leq t \end{align} $$

These two problems are equivalent with appropriate correspondence between $\lambda$ and $t$ (Lagrangian duality).

from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler
import numpy as np
import matplotlib.pyplot as plt

# ๋ฐ์ดํ„ฐ ์ƒ์„ฑ
np.random.seed(42)
n_samples, n_features = 100, 50
X = np.random.randn(n_samples, n_features)
true_w = np.random.randn(n_features)
true_w[20:] = 0  # ํฌ์†Œ์„ฑ
y = X @ true_w + 0.1 * np.random.randn(n_samples)

# Ridge ํšŒ๊ท€ (์—ฌ๋Ÿฌ ฮป ๊ฐ’)
lambdas = np.logspace(-3, 3, 100)
coefs = []

for lam in lambdas:
    ridge = Ridge(alpha=lam, fit_intercept=False)
    ridge.fit(X, y)
    coefs.append(ridge.coef_)

coefs = np.array(coefs)

# ์‹œ๊ฐํ™”: ์ •๊ทœํ™” ๊ฒฝ๋กœ (Regularization Path)
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# ์™ผ์ชฝ: ๊ณ„์ˆ˜ ๊ฒฝ๋กœ
for i in range(min(10, n_features)):
    axes[0].plot(np.log10(lambdas), coefs[:, i], linewidth=2)
axes[0].set_xlabel('$\log_{10}(\lambda)$', fontsize=12)
axes[0].set_ylabel('๊ณ„์ˆ˜ ๊ฐ’', fontsize=12)
axes[0].set_title('Ridge ์ •๊ทœํ™” ๊ฒฝ๋กœ', fontsize=14)
axes[0].grid(True, alpha=0.3)
axes[0].axvline(x=0, color='r', linestyle='--', label='$\lambda=1$')
axes[0].legend()

# ์˜ค๋ฅธ์ชฝ: L2 ๋…ธ๋ฆ„
l2_norms = np.linalg.norm(coefs, axis=1)
axes[1].plot(np.log10(lambdas), l2_norms, linewidth=3, color='purple')
axes[1].set_xlabel('$\log_{10}(\lambda)$', fontsize=12)
axes[1].set_ylabel('$\|w\|_2$', fontsize=12)
axes[1].set_title('๊ณ„์ˆ˜์˜ L2 ๋…ธ๋ฆ„ vs ์ •๊ทœํ™” ๊ฐ•๋„', fontsize=14)
axes[1].grid(True, alpha=0.3)
axes[1].axvline(x=0, color='r', linestyle='--', label='$\lambda=1$')
axes[1].legend()

plt.tight_layout()
plt.savefig('ridge_regularization_path.png', dpi=150)
plt.show()

print("์ •๊ทœํ™”์™€ ์ œ์•ฝ ์ตœ์ ํ™”์˜ ๊ด€๊ณ„:")
print(f"  ฮป โ†‘ โ†’ ์ œ์•ฝ t โ†“ โ†’ ||w|| โ†“")
print(f"  ฮป=0: ์ œ์•ฝ ์—†์Œ (OLS)")
print(f"  ฮปโ†’โˆž: w โ†’ 0")

6.3 Convex vs Non-convex Optimization

Convex Optimization: - Linear/logistic regression, SVM, Lasso - Global minimum guaranteed - Efficient algorithms exist

Non-convex Optimization: - Neural networks (MLP, CNN, Transformer) - Local minima and saddle points exist - Heuristics: SGD + momentum, adaptive learning rates - Over-parameterization โ†’ many good local minima

import torch
import torch.nn as nn
import matplotlib.pyplot as plt

# ๊ฐ„๋‹จํ•œ ์‹ ๊ฒฝ๋ง: ๋น„๋ณผ๋ก ์†์‹ค ์ง€ํ˜•
class SimpleNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.w1 = nn.Parameter(torch.tensor([1.0]))
        self.w2 = nn.Parameter(torch.tensor([1.0]))

    def forward(self, x):
        return self.w1 * x + self.w2 * x**2

# ์†์‹ค ํ•จ์ˆ˜ (๋น„๋ณผ๋ก ์ง€ํ˜• ์‹œ๋ฎฌ๋ ˆ์ด์…˜)
def loss_landscape(w1, w2):
    """์ธ์œ„์ ์ธ ๋น„๋ณผ๋ก ์†์‹ค ์ง€ํ˜•"""
    return (w1**2 + w2**2) * (1 + 0.3*np.sin(5*w1) * np.sin(5*w2))

# ์‹œ๊ฐํ™”
w1_vals = np.linspace(-2, 2, 200)
w2_vals = np.linspace(-2, 2, 200)
W1, W2 = np.meshgrid(w1_vals, w2_vals)
Loss = loss_landscape(W1, W2)

fig = plt.figure(figsize=(16, 6))

# 3D ํ‘œ๋ฉด
ax1 = fig.add_subplot(121, projection='3d')
ax1.plot_surface(W1, W2, Loss, cmap='viridis', alpha=0.8, edgecolor='none')
ax1.set_xlabel('$w_1$', fontsize=12)
ax1.set_ylabel('$w_2$', fontsize=12)
ax1.set_zlabel('Loss', fontsize=12)
ax1.set_title('์‹ ๊ฒฝ๋ง ์†์‹ค ์ง€ํ˜• (๋น„๋ณผ๋ก)', fontsize=14)

# ๋“ฑ๊ณ ์„ 
ax2 = fig.add_subplot(122)
contour = ax2.contourf(W1, W2, Loss, levels=30, cmap='viridis')
plt.colorbar(contour, ax=ax2)
ax2.set_xlabel('$w_1$', fontsize=12)
ax2.set_ylabel('$w_2$', fontsize=12)
ax2.set_title('์†์‹ค ๋“ฑ๊ณ ์„ : ์—ฌ๋Ÿฌ ์ง€์—ญ ์ตœ์†Ÿ๊ฐ’', fontsize=14)

# ์ง€์—ญ ์ตœ์†Ÿ๊ฐ’ ์œ„์น˜ (๊ทผ์‚ฌ)
local_minima = [(-1.2, -1.2), (0, 0), (1.2, 1.2)]
for x, y in local_minima:
    ax2.plot(x, y, 'r*', markersize=15)

plt.tight_layout()
plt.savefig('nonconvex_loss_landscape.png', dpi=150)
plt.show()

print("์‹ ๊ฒฝ๋ง ์ตœ์ ํ™”์˜ ํŠน์ง•:")
print("  - ๋น„๋ณผ๋ก ์†์‹ค ์ง€ํ˜•")
print("  - ์—ฌ๋Ÿฌ ์ง€์—ญ ์ตœ์†Ÿ๊ฐ’๊ณผ ์•ˆ์žฅ์ ")
print("  - SGD + ๋ชจ๋ฉ˜ํ…€์œผ๋กœ ์•ˆ์žฅ์  ํƒˆ์ถœ")
print("  - ์˜ค๋ฒ„ํŒŒ๋ผ๋ฏธํ„ฐํ™”: ์ข‹์€ ์ตœ์†Ÿ๊ฐ’์ด ๋งŽ์Œ")

7. Implementing Optimization Algorithms

7.1 Using scipy.optimize

from scipy.optimize import minimize, LinearConstraint, NonlinearConstraint
import numpy as np

# ์˜ˆ์ œ: Portfolio Optimization
# ๋ชฉ์ : ์œ„ํ—˜ ์ตœ์†Œํ™” (๋ถ„์‚ฐ)
# ์ œ์•ฝ: ๊ธฐ๋Œ€ ์ˆ˜์ต๋ฅ  ๋‹ฌ์„ฑ, ํˆฌ์ž ๋น„์œจ ํ•ฉ = 1

np.random.seed(42)
n_assets = 5
returns = np.random.randn(n_assets) * 0.1 + 0.05  # ๊ธฐ๋Œ€ ์ˆ˜์ต๋ฅ 
cov_matrix = np.random.randn(n_assets, n_assets)
cov_matrix = cov_matrix @ cov_matrix.T  # ์–‘์ •์น˜ ๊ณต๋ถ„์‚ฐ

def portfolio_variance(weights):
    """ํฌํŠธํด๋ฆฌ์˜ค ๋ถ„์‚ฐ (๋ชฉ์  ํ•จ์ˆ˜)"""
    return weights @ cov_matrix @ weights

def portfolio_return(weights):
    """ํฌํŠธํด๋ฆฌ์˜ค ๊ธฐ๋Œ€ ์ˆ˜์ต๋ฅ """
    return weights @ returns

# ์ œ์•ฝ ์กฐ๊ฑด
target_return = 0.06
constraints = [
    {'type': 'eq', 'fun': lambda w: np.sum(w) - 1},  # ํ•ฉ = 1
    {'type': 'eq', 'fun': lambda w: portfolio_return(w) - target_return}  # ์ˆ˜์ต๋ฅ 
]

# ๊ฒฝ๊ณ„: 0 <= w_i <= 1 (๊ณต๋งค๋„ ๊ธˆ์ง€)
bounds = [(0, 1) for _ in range(n_assets)]

# ์ดˆ๊ธฐ๊ฐ’
w0 = np.ones(n_assets) / n_assets

# ์ตœ์ ํ™”
result = minimize(portfolio_variance, w0, method='SLSQP',
                  bounds=bounds, constraints=constraints)

optimal_weights = result.x

print("Portfolio Optimization (KKT ์กฐ๊ฑด)")
print(f"์ตœ์  ํˆฌ์ž ๋น„์œจ: {optimal_weights}")
print(f"ํฌํŠธํด๋ฆฌ์˜ค ๋ถ„์‚ฐ: {portfolio_variance(optimal_weights):.6f}")
print(f"ํฌํŠธํด๋ฆฌ์˜ค ์ˆ˜์ต๋ฅ : {portfolio_return(optimal_weights):.6f}")
print(f"๋ชฉํ‘œ ์ˆ˜์ต๋ฅ : {target_return}")

# ์‹œ๊ฐํ™”
fig, ax = plt.subplots(figsize=(10, 6))
assets = [f'Asset {i+1}' for i in range(n_assets)]
bars = ax.bar(assets, optimal_weights, color='skyblue', edgecolor='black', linewidth=1.5)

# ๊ฐ’ ํ‘œ์‹œ
for bar, weight in zip(bars, optimal_weights):
    height = bar.get_height()
    ax.text(bar.get_x() + bar.get_width()/2., height,
            f'{weight:.3f}', ha='center', va='bottom', fontsize=11)

ax.set_ylabel('ํˆฌ์ž ๋น„์œจ', fontsize=12)
ax.set_title(f'์ตœ์  ํฌํŠธํด๋ฆฌ์˜ค (๋ชฉํ‘œ ์ˆ˜์ต๋ฅ  {target_return:.1%})', fontsize=14)
ax.set_ylim(0, max(optimal_weights) * 1.2)
ax.grid(True, alpha=0.3, axis='y')
plt.tight_layout()
plt.savefig('portfolio_optimization.png', dpi=150)
plt.show()

Practice Problems

  1. Convexity Determination: Determine whether the following functions are convex, and prove using the Hessian.
  2. (a) $f(x, y) = e^x + e^y$
  3. (b) $f(x, y) = x^2 - xy + y^2$
  4. (c) $f(x, y) = \log(e^x + e^y)$

  5. Lagrange Multipliers: Solve the following optimization problem using Lagrange multipliers. $$ \begin{align} \min_{x, y} \quad & x^2 + 2y^2 \\ \text{subject to} \quad & x + y = 1 \end{align} $$ Find the analytical solution and verify with Python.

  6. KKT Conditions Application: Write the KKT conditions for the following problem and find the optimal solution. $$ \begin{align} \min_{x, y} \quad & (x-3)^2 + (y-2)^2 \\ \text{subject to} \quad & x + 2y \leq 4 \\ & x \geq 0, \; y \geq 0 \end{align} $$ Analyze which constraints are active.

  7. SVM Dual Problem: Generate a linearly separable 2D dataset and solve the SVM dual problem directly. Use the cvxopt library to solve the quadratic programming (QP) problem and identify support vectors.

  8. Non-convex Optimization: Run gradient descent on the Rosenbrock function from multiple initial points and compare the local minima they converge to. Analyze how convergence speed changes when momentum is added.

References

  • Boyd, S., & Vandenberghe, L. (2004). Convex Optimization. Cambridge University Press.
  • The bible of convex optimization, freely available online
  • Nocedal, J., & Wright, S. (2006). Numerical Optimization. Springer.
  • Theory and implementation of numerical optimization algorithms
  • Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.
  • Chapter 4: Numerical Computation (optimization fundamentals)
  • Bottou, L., Curtis, F. E., & Nocedal, J. (2018). "Optimization Methods for Large-Scale Machine Learning". SIAM Review.
  • Modern perspective on ML optimization
  • SciPy Optimize Documentation: https://docs.scipy.org/doc/scipy/reference/optimize.html
  • CVX Forum: https://ask.cvxr.com/
  • Convex optimization Q&A
to navigate between lessons