15. Graph Theory and Spectral Methods

Learning Objectives

  • Understand and implement mathematical representations of graphs (adjacency matrix, degree matrix, Laplacian)
  • Explain eigenvalue decomposition and spectral properties of graph Laplacians
  • Understand and implement the mathematical principles of spectral clustering algorithms
  • Understand the mathematical foundations of random walks and the PageRank algorithm
  • Understand the concepts of graph signal processing and graph Fourier transform
  • Understand the mathematical foundations of Graph Neural Networks (GNNs) and the message passing mechanism

1. Mathematical Representation of Graphs

1.1 Graph Basics

A graph $G = (V, E)$ consists of a vertex set $V$ and an edge set $E \subseteq V \times V$.

Graph types:

  • Undirected graph: $(i,j) \in E \Rightarrow (j,i) \in E$
  • Directed graph: edges have direction
  • Weighted graph: each edge is assigned a weight $w_{ij}$

1.2 Adjacency Matrix

For a graph with $n$ vertices, the adjacency matrix $A \in \mathbb{R}^{n \times n}$:

$$A_{ij} = \begin{cases} w_{ij} & \text{if } (i,j) \in E \\ 0 & \text{otherwise} \end{cases}$$

Properties:

  • Undirected graph: $A = A^T$ (symmetric)
  • Binary graph: $A_{ij} \in \{0, 1\}$
  • The $(i,j)$ entry of $A^k$ counts walks of length $k$ from $i$ to $j$

import numpy as np
import matplotlib.pyplot as plt
from scipy.linalg import eigh

# ๊ฐ„๋‹จํ•œ ๊ทธ๋ž˜ํ”„ ์ƒ์„ฑ
def create_sample_graph():
    """
    5๊ฐœ ์ •์ ์œผ๋กœ ๊ตฌ์„ฑ๋œ ๋ฌด๋ฐฉํ–ฅ ๊ทธ๋ž˜ํ”„
    """
    n = 5
    A = np.array([
        [0, 1, 1, 0, 0],
        [1, 0, 1, 1, 0],
        [1, 1, 0, 1, 1],
        [0, 1, 1, 0, 1],
        [0, 0, 1, 1, 0]
    ])
    return A

A = create_sample_graph()
print("์ธ์ ‘ ํ–‰๋ ฌ A:")
print(A)
print(f"\n๋Œ€์นญ์„ฑ ํ™•์ธ: {np.allclose(A, A.T)}")

# ๊ฒฝ๋กœ ์ˆ˜ ๊ณ„์‚ฐ
A2 = np.linalg.matrix_power(A, 2)
print(f"\n์ •์  0์—์„œ ์ •์  4๋กœ ๊ฐ€๋Š” ๊ธธ์ด 2์ธ ๊ฒฝ๋กœ์˜ ์ˆ˜: {A2[0, 4]}")

1.3 Degree Matrix

The degree of vertex $i$, $d_i = \sum_{j} A_{ij}$, counts the edges incident to $i$ (or sums their weights in a weighted graph).

The degree matrix $D \in \mathbb{R}^{n \times n}$ is a diagonal matrix:

$$D = \text{diag}(d_1, d_2, \ldots, d_n)$$

def compute_degree_matrix(A):
    """Compute the degree matrix and degree vector."""
    degrees = np.sum(A, axis=1)
    D = np.diag(degrees)
    return D, degrees

D, degrees = compute_degree_matrix(A)
print("์ฐจ์ˆ˜ ๋ฒกํ„ฐ:", degrees)
print("\n์ฐจ์ˆ˜ ํ–‰๋ ฌ D:")
print(D)

2. Graph Laplacian

2.1 Definition of Laplacian

Unnormalized Laplacian:

$$L = D - A$$

Properties:

  • Symmetric: $L = L^T$
  • Positive semidefinite: $\mathbf{x}^T L \mathbf{x} \geq 0$ for all $\mathbf{x}$
  • $L \mathbf{1} = \mathbf{0}$ (the all-ones vector is an eigenvector with eigenvalue 0)

2.2 Quadratic Form of Laplacian

$$\mathbf{x}^T L \mathbf{x} = \mathbf{x}^T(D - A)\mathbf{x} = \sum_{i} d_i x_i^2 - \sum_{i,j} A_{ij} x_i x_j$$

For undirected graphs:

$$\mathbf{x}^T L \mathbf{x} = \frac{1}{2} \sum_{i,j} A_{ij}(x_i - x_j)^2$$

This is a smoothness measure that quantifies differences between adjacent vertices.

def compute_laplacian(A):
    """Compute the unnormalized graph Laplacian L = D - A."""
    D, _ = compute_degree_matrix(A)
    L = D - A
    return L

L = compute_laplacian(A)
print("๋ผํ”Œ๋ผ์‹œ์•ˆ ํ–‰๋ ฌ L:")
print(L)

# ์–‘๋ฐ˜์ •์น˜ ํ™•์ธ (๋ชจ๋“  ๊ณ ์œ ๊ฐ’ >= 0)
eigenvalues = np.linalg.eigvalsh(L)
print(f"\n๋ผํ”Œ๋ผ์‹œ์•ˆ ๊ณ ์œ ๊ฐ’: {eigenvalues}")
print(f"์ตœ์†Œ ๊ณ ์œ ๊ฐ’: {eigenvalues[0]:.10f}")

2.3 Normalized Laplacian

Symmetric normalized Laplacian:

$$L_{\text{sym}} = D^{-1/2} L D^{-1/2} = I - D^{-1/2} A D^{-1/2}$$

Random walk normalized Laplacian:

$$L_{\text{rw}} = D^{-1} L = I - D^{-1} A$$

Eigenvalues of the normalized Laplacian lie in the interval $[0, 2]$; the value 2 is attained exactly when the graph has a bipartite connected component.

def compute_normalized_laplacian(A):
    """Compute the symmetric and random-walk normalized Laplacians."""
    D, degrees = compute_degree_matrix(A)

    # D^{-1/2} (assumes no isolated vertices, i.e., all degrees > 0)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(degrees))

    # L_sym = D^{-1/2} L D^{-1/2}
    L = compute_laplacian(A)
    L_sym = D_inv_sqrt @ L @ D_inv_sqrt

    # L_rw = D^{-1} L
    D_inv = np.diag(1.0 / degrees)
    L_rw = D_inv @ L

    return L_sym, L_rw

L_sym, L_rw = compute_normalized_laplacian(A)
print("์ •๊ทœํ™” ๋ผํ”Œ๋ผ์‹œ์•ˆ L_sym:")
print(L_sym)

eig_sym = np.linalg.eigvalsh(L_sym)
print(f"\nL_sym ๊ณ ์œ ๊ฐ’: {eig_sym}")

2.4 Connected Components and Eigenvalues

Theorem: If a graph has $k$ connected components, the multiplicity of eigenvalue 0 of the Laplacian is $k$.

def create_disconnected_graph():
    """Graph with two connected components."""
    # Component 1: vertices 0, 1, 2
    # Component 2: vertices 3, 4
    A = np.array([
        [0, 1, 1, 0, 0],
        [1, 0, 1, 0, 0],
        [1, 1, 0, 0, 0],
        [0, 0, 0, 0, 1],
        [0, 0, 0, 1, 0]
    ])
    return A

A_disconnected = create_disconnected_graph()
L_disconnected = compute_laplacian(A_disconnected)
eigenvalues_disc = np.linalg.eigvalsh(L_disconnected)

print("๋น„์—ฐ๊ฒฐ ๊ทธ๋ž˜ํ”„์˜ ๋ผํ”Œ๋ผ์‹œ์•ˆ ๊ณ ์œ ๊ฐ’:")
print(eigenvalues_disc)
print(f"๊ณ ์œ ๊ฐ’ 0์˜ ๊ฐœ์ˆ˜ (์—ฐ๊ฒฐ ์„ฑ๋ถ„ ์ˆ˜): {np.sum(np.abs(eigenvalues_disc) < 1e-10)}")

3. Spectral Clustering

3.1 Graph Cut Problem

When partitioning the vertex set into two parts $S$ and $\bar{S} = V \setminus S$, the cut cost is:

$$\text{cut}(S, \bar{S}) = \sum_{i \in S, j \in \bar{S}} A_{ij}$$

Normalized Cut:

$$\text{Ncut}(S, \bar{S}) = \frac{\text{cut}(S, \bar{S})}{\text{vol}(S)} + \frac{\text{cut}(S, \bar{S})}{\text{vol}(\bar{S})}$$

where $\text{vol}(S) = \sum_{i \in S} d_i$ is the volume of subset $S$.
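
To make these definitions concrete, here is a minimal sketch computing the cut and Ncut of one hypothetical partition of the 5-vertex sample graph (the index sets S and S_bar are illustrative choices, not from the text):

# Cut cost and Ncut for the partition S = {0, 1}, S̄ = {2, 3, 4}
S = np.array([0, 1])
S_bar = np.array([2, 3, 4])
cut = A[np.ix_(S, S_bar)].sum()   # number of edges crossing the partition
vol_S = degrees[S].sum()          # vol(S) = sum of degrees in S
vol_S_bar = degrees[S_bar].sum()
ncut = cut / vol_S + cut / vol_S_bar
print(f"cut = {cut}, vol(S) = {vol_S}, vol(S̄) = {vol_S_bar}, Ncut = {ncut:.4f}")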

3.2 Fiedler Vector and Spectral Methods

The Ncut problem is NP-hard, but a continuous relaxation can be solved exactly: its solution is the eigenvector for the second-smallest eigenvalue of the Laplacian, known as the Fiedler vector.

The relaxed problem minimizes a generalized Rayleigh quotient:

$$\min_{\mathbf{y}} \frac{\mathbf{y}^T L \mathbf{y}}{\mathbf{y}^T D \mathbf{y}} \quad \text{s.t. } \mathbf{y}^T D \mathbf{1} = 0$$

The solution is the eigenvector corresponding to the second-smallest eigenvalue of the generalized eigenvalue problem $L \mathbf{y} = \lambda D \mathbf{y}$.
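
Before the full algorithm below, here is a minimal sketch solving this generalized problem directly (scipy.linalg.eigh accepts a second positive-definite matrix as its `b` argument; D qualifies here because every vertex has positive degree):

# Solve L y = λ D y and read off the Fiedler vector
gen_vals, gen_vecs = eigh(compute_laplacian(A).astype(float), D.astype(float))
fiedler = gen_vecs[:, 1]  # eigenvector for the second-smallest eigenvalue
print("Fiedler vector:", fiedler)
print("Sign-based 2-way partition:", (fiedler > 0).astype(int))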

def spectral_clustering(A, n_clusters=2):
    """
    Spectral clustering algorithm.

    Parameters:
    -----------
    A : ndarray
        Adjacency matrix
    n_clusters : int
        Number of clusters

    Returns:
    --------
    labels : ndarray
        Cluster label for each vertex
    U : ndarray
        Eigenvectors used for the spectral embedding
    """
    # Compute the normalized Laplacian
    L_sym, _ = compute_normalized_laplacian(A)

    # Eigendecomposition (eigh returns eigenvalues in ascending order)
    eigenvalues, eigenvectors = eigh(L_sym)

    # Take the eigenvectors for the n_clusters smallest eigenvalues
    U = eigenvectors[:, :n_clusters]

    # Normalize each row to unit length
    U_normalized = U / np.linalg.norm(U, axis=1, keepdims=True)

    # k-means clustering in the spectral embedding
    from sklearn.cluster import KMeans
    kmeans = KMeans(n_clusters=n_clusters, random_state=42, n_init=10)
    labels = kmeans.fit_predict(U_normalized)

    return labels, U

# ์ŠคํŽ™ํŠธ๋Ÿผ ๊ตฐ์ง‘ํ™” ์ ์šฉ
labels, U = spectral_clustering(A, n_clusters=2)
print("๊ตฐ์ง‘ ๋ ˆ์ด๋ธ”:", labels)
print("\n์ฒซ 2๊ฐœ ๊ณ ์œ ๋ฒกํ„ฐ:")
print(U)

3.3 Intuition Behind Spectral Clustering

  • Sign of the Fiedler vector: a good indicator for partitioning the graph into two parts
  • Magnitude of the eigenvalues: indicates how well separated the clusters are
  • Eigengap: if the gap between $\lambda_k$ and $\lambda_{k+1}$ is large, $k$ clusters is an appropriate choice

def visualize_spectral_clustering():
    """์ŠคํŽ™ํŠธ๋Ÿผ ๊ตฐ์ง‘ํ™” ์‹œ๊ฐํ™”"""
    # ๋” ํฐ ๊ทธ๋ž˜ํ”„ ์ƒ์„ฑ (๋‘ ๊ฐœ์˜ ๋ฐ€์ง‘๋œ ์ปค๋ฎค๋‹ˆํ‹ฐ)
    np.random.seed(42)
    n1, n2 = 15, 15
    n = n1 + n2

    # ๋ธ”๋ก ํ–‰๋ ฌ ๊ตฌ์กฐ
    A_block = np.zeros((n, n))

    # ์ปค๋ฎค๋‹ˆํ‹ฐ 1 ๋‚ด๋ถ€ ์—ฐ๊ฒฐ (๋ฐ€์ง‘)
    for i in range(n1):
        for j in range(i+1, n1):
            if np.random.rand() < 0.6:
                A_block[i, j] = A_block[j, i] = 1

    # ์ปค๋ฎค๋‹ˆํ‹ฐ 2 ๋‚ด๋ถ€ ์—ฐ๊ฒฐ (๋ฐ€์ง‘)
    for i in range(n1, n):
        for j in range(i+1, n):
            if np.random.rand() < 0.6:
                A_block[i, j] = A_block[j, i] = 1

    # ์ปค๋ฎค๋‹ˆํ‹ฐ ๊ฐ„ ์—ฐ๊ฒฐ (ํฌ์†Œ)
    for i in range(n1):
        for j in range(n1, n):
            if np.random.rand() < 0.05:
                A_block[i, j] = A_block[j, i] = 1

    # ์ŠคํŽ™ํŠธ๋Ÿผ ๊ตฐ์ง‘ํ™”
    labels, U = spectral_clustering(A_block, n_clusters=2)

    # ๋ผํ”Œ๋ผ์‹œ์•ˆ ๊ณ ์œ ๊ฐ’
    L_sym, _ = compute_normalized_laplacian(A_block)
    eigenvalues = np.linalg.eigvalsh(L_sym)

    # ์‹œ๊ฐํ™”
    fig, axes = plt.subplots(1, 3, figsize=(15, 4))

    # ์ธ์ ‘ ํ–‰๋ ฌ
    axes[0].imshow(A_block, cmap='binary')
    axes[0].set_title('Adjacency Matrix')
    axes[0].set_xlabel('Node')
    axes[0].set_ylabel('Node')

    # ๊ณ ์œ ๊ฐ’ ์ŠคํŽ™ํŠธ๋Ÿผ
    axes[1].plot(eigenvalues, 'o-')
    axes[1].axvline(x=2, color='r', linestyle='--', label='Eigengap')
    axes[1].set_xlabel('Index')
    axes[1].set_ylabel('Eigenvalue')
    axes[1].set_title('Laplacian Spectrum')
    axes[1].legend()
    axes[1].grid(True)

    # Fiedler vector
    axes[2].scatter(range(n), U[:, 1], c=labels, cmap='viridis', s=50)
    axes[2].axhline(y=0, color='r', linestyle='--')
    axes[2].set_xlabel('Node')
    axes[2].set_ylabel('Fiedler Vector Value')
    axes[2].set_title('Fiedler Vector (2nd eigenvector)')
    axes[2].grid(True)

    plt.tight_layout()
    plt.savefig('spectral_clustering.png', dpi=150, bbox_inches='tight')
    print("์ŠคํŽ™ํŠธ๋Ÿผ ๊ตฐ์ง‘ํ™” ์‹œ๊ฐํ™” ์ €์žฅ ์™„๋ฃŒ")

visualize_spectral_clustering()

4. Random Walk on Graphs

4.1 Transition Probability Matrix

At each step, a random walk moves from the current vertex to a neighbor chosen uniformly at random (or with probability proportional to edge weight in a weighted graph).

Transition probability matrix:

$$P = D^{-1} A$$

$P_{ij}$ is the probability of moving from vertex $i$ to vertex $j$ in one step.

def compute_transition_matrix(A):
    """Compute the transition probability matrix P = D^{-1} A."""
    D, degrees = compute_degree_matrix(A)
    D_inv = np.diag(1.0 / degrees)
    P = D_inv @ A
    return P

P = compute_transition_matrix(A)
print("์ „์ด ํ™•๋ฅ  ํ–‰๋ ฌ P:")
print(P)
print(f"\n๊ฐ ํ–‰์˜ ํ•ฉ (ํ™•๋ฅ ์˜ ํ•ฉ): {np.sum(P, axis=1)}")

4.2 Stationary Distribution

The stationary distribution $\pi$ satisfies:

$$\pi^T P = \pi^T$$

For a connected undirected graph, the stationary distribution is:

$$\pi_i = \frac{d_i}{\sum_j d_j}$$

def compute_stationary_distribution(A):
    """Compute the stationary distribution of the random walk."""
    _, degrees = compute_degree_matrix(A)
    pi = degrees / np.sum(degrees)
    return pi

pi = compute_stationary_distribution(A)
print("์ •์ƒ ๋ถ„ํฌ ฯ€:")
print(pi)

# ๊ฒ€์ฆ: ฯ€^T P = ฯ€^T
P = compute_transition_matrix(A)
pi_next = pi @ P
print(f"\n์ •์ƒ์„ฑ ํ™•์ธ: {np.allclose(pi, pi_next)}")

4.3 PageRank Algorithm

PageRank adds teleportation to the random walk:

$$\mathbf{r} = (1 - d) \mathbf{e} + d P^T \mathbf{r}$$

where $d \in [0, 1]$ is the damping factor (typically 0.85), and $\mathbf{e} = \frac{1}{n}\mathbf{1}$ is the uniform distribution.

def pagerank(A, d=0.85, max_iter=100, tol=1e-6):
    """
    PageRank via power iteration.

    Parameters:
    -----------
    A : ndarray
        Adjacency matrix
    d : float
        Damping factor
    max_iter : int
        Maximum number of iterations
    tol : float
        Convergence tolerance

    Returns:
    --------
    r : ndarray
        PageRank scores
    """
    n = A.shape[0]
    P = compute_transition_matrix(A)

    # Initialize with the uniform distribution
    r = np.ones(n) / n

    for iteration in range(max_iter):
        r_new = (1 - d) / n + d * (P.T @ r)

        # Check convergence (L1 norm)
        if np.linalg.norm(r_new - r, 1) < tol:
            print(f"Converged after {iteration + 1} iterations")
            break

        r = r_new

    return r

# ๋ฐฉํ–ฅ ๊ทธ๋ž˜ํ”„ ์ƒ์„ฑ (์›นํŽ˜์ด์ง€ ๋งํฌ ๊ตฌ์กฐ)
A_directed = np.array([
    [0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [1, 0, 0, 1, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 1, 0, 0]
])

pagerank_scores = pagerank(A_directed)
print("\nPageRank ์ ์ˆ˜:")
for i, score in enumerate(pagerank_scores):
    print(f"ํŽ˜์ด์ง€ {i}: {score:.4f}")

5. Graph Signal Processing

5.1 Graph Signals

A graph signal $\mathbf{f} \in \mathbb{R}^n$ assigns a scalar value $f_i$ to each vertex $i$.

Examples: the activity level of each user in a social network, or the measurement from each sensor in a sensor network.

5.2 Graph Fourier Transform (GFT)

The eigenvectors $\mathbf{u}_\ell$ of the Laplacian serve as the graph's frequency basis: eigenvectors with small eigenvalues vary smoothly across edges (low frequency), while those with large eigenvalues oscillate (high frequency).

$$L \mathbf{u}_\ell = \lambda_\ell \mathbf{u}_\ell$$

Graph Fourier Transform:

$$\hat{f}(\ell) = \langle \mathbf{f}, \mathbf{u}_\ell \rangle = \mathbf{u}_\ell^T \mathbf{f}$$

Inverse transform:

$$\mathbf{f} = \sum_{\ell=0}^{n-1} \hat{f}(\ell) \mathbf{u}_\ell$$

def graph_fourier_transform(A, signal):
    """
    Graph Fourier transform.

    Parameters:
    -----------
    A : ndarray
        Adjacency matrix
    signal : ndarray
        Graph signal

    Returns:
    --------
    f_hat : ndarray
        Signal in the frequency domain
    eigenvalues : ndarray
        Laplacian eigenvalues
    eigenvectors : ndarray
        Laplacian eigenvectors
    """
    L_sym, _ = compute_normalized_laplacian(A)
    eigenvalues, eigenvectors = eigh(L_sym)

    # Graph Fourier transform: project the signal onto the eigenbasis
    f_hat = eigenvectors.T @ signal

    return f_hat, eigenvalues, eigenvectors

# ์˜ˆ์ œ: ์ €์ฃผํŒŒ ์‹ ํ˜ธ ์ƒ์„ฑ
n = A.shape[0]
signal_smooth = np.array([1.0, 1.1, 0.9, 0.8, 1.0])

f_hat, eigenvalues, eigenvectors = graph_fourier_transform(A, signal_smooth)

print("์› ์‹ ํ˜ธ:", signal_smooth)
print("์ฃผํŒŒ์ˆ˜ ์˜์—ญ ์‹ ํ˜ธ:", f_hat)
print("\n๋ผํ”Œ๋ผ์‹œ์•ˆ ๊ณ ์œ ๊ฐ’ (์ฃผํŒŒ์ˆ˜):", eigenvalues)

5.3 Graph Filtering

Filtering signals in the frequency domain:

$$\mathbf{f}_{\text{filtered}} = \sum_{\ell=0}^{n-1} h(\lambda_\ell) \hat{f}(\ell) \mathbf{u}_\ell$$

where $h(\lambda)$ is the filter function.

  • Low-pass filter (smoothing): keep only components with small $\lambda$
  • High-pass filter (edge detection): keep only components with large $\lambda$

def graph_filter(A, signal, filter_func):
    """Filter a graph signal in the spectral domain."""
    f_hat, eigenvalues, eigenvectors = graph_fourier_transform(A, signal)

    # ์ฃผํŒŒ์ˆ˜ ์˜์—ญ์—์„œ ํ•„ํ„ฐ ์ ์šฉ
    f_hat_filtered = f_hat * filter_func(eigenvalues)

    # ์—ญ๋ณ€ํ™˜
    signal_filtered = eigenvectors @ f_hat_filtered

    return signal_filtered

# ์ €์—ญ ํ†ต๊ณผ ํ•„ํ„ฐ
def lowpass_filter(eigenvalues, cutoff=0.5):
    return (eigenvalues < cutoff).astype(float)

# ๊ณ ์—ญ ํ†ต๊ณผ ํ•„ํ„ฐ
def highpass_filter(eigenvalues, cutoff=0.5):
    return (eigenvalues >= cutoff).astype(float)

# ๋…ธ์ด์ฆˆ๊ฐ€ ์žˆ๋Š” ์‹ ํ˜ธ ์ƒ์„ฑ
np.random.seed(42)
signal_noisy = signal_smooth + 0.3 * np.random.randn(n)

signal_lowpass = graph_filter(A, signal_noisy, lambda eig: lowpass_filter(eig, cutoff=1.0))
signal_highpass = graph_filter(A, signal_noisy, lambda eig: highpass_filter(eig, cutoff=1.0))

print("์› ์‹ ํ˜ธ:", signal_smooth)
print("๋…ธ์ด์ฆˆ ์‹ ํ˜ธ:", signal_noisy)
print("์ €์—ญ ํ†ต๊ณผ ํ•„ํ„ฐ ๊ฒฐ๊ณผ:", signal_lowpass)
print("๊ณ ์—ญ ํ†ต๊ณผ ํ•„ํ„ฐ ๊ฒฐ๊ณผ:", signal_highpass)

6. Mathematical Foundations of GNNs

6.1 Message Passing Framework

The core of GNNs is message passing:

$$\mathbf{h}_v^{(\ell+1)} = \sigma\left( \mathbf{W}^{(\ell)} \sum_{u \in \mathcal{N}(v)} \frac{\mathbf{h}_u^{(\ell)}}{c_{vu}} \right)$$

where:

  • $\mathbf{h}_v^{(\ell)}$: feature of vertex $v$ at layer $\ell$
  • $\mathcal{N}(v)$: neighbors of vertex $v$
  • $c_{vu}$: normalization constant
  • $\sigma$: activation function
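
A minimal NumPy sketch of one message-passing step under these definitions, using mean aggregation ($c_{vu} = d_v$); the names and the tanh activation are illustrative choices, not from the text:

def message_passing_step(A, H, W, sigma=np.tanh):
    """One message-passing layer: aggregate neighbors, transform, activate."""
    degrees = A.sum(axis=1, keepdims=True)  # d_v (assumes no isolated vertices)
    messages = (A @ H) / degrees            # mean of neighbor features
    return sigma(messages @ W)

H0 = np.random.randn(5, 3)                  # illustrative features for the 5-vertex graph
W0 = np.random.randn(3, 4) * 0.1
print("Message-passing output shape:", message_passing_step(A, H0, W0).shape)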

6.2 Spectral Perspective: Graph Convolution

Spectral graph convolution:

$$\mathbf{g}_\theta \star \mathbf{f} = U \left( \text{diag}(\theta) U^T \mathbf{f} \right)$$

where $U$ is the eigenvector matrix of the Laplacian, and $\theta$ is a learnable filter parameter.

Problem: every forward pass requires multiplications with the dense matrix $U$ ($O(n^2)$), and the eigendecomposition itself must be computed up front ($O(n^3)$).
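
A deliberately naive sketch of this operation, to make its structure (and cost) explicit; theta and the signal f are random illustrative values:

# Naive spectral convolution: transform, scale each frequency, transform back
L_sym, _ = compute_normalized_laplacian(A)
eigvals, U_mat = eigh(L_sym)
theta = np.random.randn(A.shape[0])  # one free parameter per eigenvalue
f = np.random.randn(A.shape[0])      # an arbitrary graph signal
conv = U_mat @ (np.diag(theta) @ (U_mat.T @ f))
print("Spectrally filtered signal:", conv)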

6.3 ChebNet: Chebyshev Polynomial Approximation

Approximation using Chebyshev polynomials:

$$\mathbf{g}_\theta \star \mathbf{f} \approx \sum_{k=0}^{K-1} \theta_k T_k(\tilde{L}) \mathbf{f}$$

where:

  • $\tilde{L} = \frac{2}{\lambda_{\max}} L - I$ is the rescaled Laplacian
  • $T_k$ is the $k$-th Chebyshev polynomial: $T_0(x) = 1$, $T_1(x) = x$, $T_k(x) = 2x T_{k-1}(x) - T_{k-2}(x)$
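
A sketch of this filter using only matrix-vector products via the recurrence (no eigendecomposition needed); the coefficients theta are illustrative, and lam_max = 2 is a safe bound for the normalized Laplacian:

def chebyshev_filter(L, f, theta, lam_max=2.0):
    """Apply sum_k theta_k T_k(L_tilde) f via the Chebyshev recurrence."""
    L_tilde = (2.0 / lam_max) * L - np.eye(L.shape[0])  # rescale spectrum into [-1, 1]
    T_prev, T_curr = f, L_tilde @ f                     # T_0(L_tilde) f and T_1(L_tilde) f
    out = theta[0] * T_prev + theta[1] * T_curr
    for k in range(2, len(theta)):
        T_next = 2 * (L_tilde @ T_curr) - T_prev        # T_k = 2x T_{k-1} - T_{k-2}
        out = out + theta[k] * T_next
        T_prev, T_curr = T_curr, T_next
    return out

L_sym, _ = compute_normalized_laplacian(A)
f = np.random.randn(A.shape[0])
theta = np.array([0.5, 0.3, 0.1])  # illustrative K=3 coefficients
print("ChebNet-filtered signal:", chebyshev_filter(L_sym, f, theta))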

6.4 GCN: First-order Approximation

Graph Convolutional Networks (Kipf & Welling, 2017) simplify ChebNet to a first-order approximation ($K=1$):

$$\mathbf{H}^{(\ell+1)} = \sigma\left( \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} \mathbf{H}^{(\ell)} \mathbf{W}^{(\ell)} \right)$$

where $\tilde{A} = A + I$ (adding self-loops), and $\tilde{D}$ is the degree matrix of $\tilde{A}$.

def gcn_layer(A, H, W, activation=lambda x: np.maximum(0, x)):
    """
    A single GCN layer.

    Parameters:
    -----------
    A : ndarray, shape (n, n)
        Adjacency matrix
    H : ndarray, shape (n, d_in)
        Input feature matrix
    W : ndarray, shape (d_in, d_out)
        Weight matrix
    activation : function
        Activation function (default: ReLU)

    Returns:
    --------
    H_out : ndarray, shape (n, d_out)
        Output feature matrix
    """
    n = A.shape[0]

    # Add self-loops
    A_tilde = A + np.eye(n)

    # Degree matrix of A_tilde
    D_tilde = np.diag(np.sum(A_tilde, axis=1))

    # Symmetric normalization: D_tilde^{-1/2} A_tilde D_tilde^{-1/2}
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(D_tilde)))
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt

    # Message passing + linear transform + activation
    H_out = activation(A_hat @ H @ W)

    return H_out

# ์˜ˆ์ œ: ๊ฐ„๋‹จํ•œ 2์ธต GCN
np.random.seed(42)
n = 5
d_in = 3
d_hidden = 4
d_out = 2

# ์ดˆ๊ธฐ ํŠน์ง• ํ–‰๋ ฌ
X = np.random.randn(n, d_in)

# ๊ฐ€์ค‘์น˜ ํ–‰๋ ฌ
W1 = np.random.randn(d_in, d_hidden) * 0.1
W2 = np.random.randn(d_hidden, d_out) * 0.1

# ์ˆœ์ „ํŒŒ
H1 = gcn_layer(A, X, W1)
print("๋ ˆ์ด์–ด 1 ์ถœ๋ ฅ shape:", H1.shape)

H2 = gcn_layer(A, H1, W2)
print("๋ ˆ์ด์–ด 2 ์ถœ๋ ฅ shape:", H2.shape)
print("\n์ตœ์ข… ์ž„๋ฒ ๋”ฉ:")
print(H2)

6.5 Graph Attention Networks (GAT)

GAT learns attention weights for neighbor vertices:

$$\alpha_{vu} = \frac{\exp(\text{LeakyReLU}(\mathbf{a}^T [\mathbf{W}\mathbf{h}_v \| \mathbf{W}\mathbf{h}_u]))}{\sum_{u' \in \mathcal{N}(v)} \exp(\text{LeakyReLU}(\mathbf{a}^T [\mathbf{W}\mathbf{h}_v \| \mathbf{W}\mathbf{h}_{u'}]))}$$

$$\mathbf{h}_v^{(\ell+1)} = \sigma\left( \sum_{u \in \mathcal{N}(v)} \alpha_{vu} \mathbf{W}^{(\ell)} \mathbf{h}_u^{(\ell)} \right)$$

def graph_attention_layer(A, H, W, a, alpha=0.2):
    """
    A simple (single-head) graph attention layer.

    Parameters:
    -----------
    A : ndarray, shape (n, n)
        Adjacency matrix
    H : ndarray, shape (n, d_in)
        Input features
    W : ndarray, shape (d_in, d_out)
        Feature transformation weights
    a : ndarray, shape (2 * d_out,)
        Attention parameters
    alpha : float
        LeakyReLU negative slope

    Returns:
    --------
    H_out : ndarray, shape (n, d_out)
        Output features
    attention_weights : ndarray, shape (n, n)
        Normalized attention weights
    """
    n = A.shape[0]

    # Feature transformation
    H_transformed = H @ W

    # Compute raw attention scores over edges (and self-loops)
    attention_scores = np.zeros((n, n))

    for i in range(n):
        for j in range(n):
            if A[i, j] > 0 or i == j:  # connected, or the vertex itself
                # Concatenated features [Wh_i || Wh_j]
                concat = np.concatenate([H_transformed[i], H_transformed[j]])

                # Attention score with LeakyReLU
                score = a @ concat
                attention_scores[i, j] = np.maximum(alpha * score, score)  # LeakyReLU
            else:
                attention_scores[i, j] = -np.inf

    # Row-wise softmax (masked -inf entries get weight 0)
    attention_weights = np.exp(attention_scores - np.max(attention_scores, axis=1, keepdims=True))
    attention_weights = attention_weights / np.sum(attention_weights, axis=1, keepdims=True)

    # Attention-weighted aggregation
    H_out = attention_weights @ H_transformed

    return H_out, attention_weights

# ์˜ˆ์ œ
W_att = np.random.randn(d_in, d_hidden) * 0.1
a_att = np.random.randn(2 * d_hidden) * 0.1

H_gat, att_weights = graph_attention_layer(A, X, W_att, a_att)
print("GAT ์ถœ๋ ฅ shape:", H_gat.shape)
print("\n์–ดํ…์…˜ ๊ฐ€์ค‘์น˜:")
print(att_weights)

Practice Problems

Problem 1: Proof of Quadratic Form of Laplacian

For the Laplacian $L = D - A$ of an undirected graph, prove:

$$\mathbf{x}^T L \mathbf{x} = \frac{1}{2} \sum_{i,j} A_{ij}(x_i - x_j)^2$$

Use this result to show that the Laplacian is positive semidefinite.

Problem 2: Spectral Clustering Implementation

Implement a complete spectral clustering algorithm using only NumPy (no sklearn), including the k-means step. Your implementation should include:

  1. Normalized Laplacian computation
  2. Eigenvalue decomposition
  3. k-means clustering (Lloyd's algorithm)
  4. Cluster quality evaluation using silhouette score

Problem 3: Eigenvalue Interpretation of PageRank

Transform the PageRank equation $\mathbf{r} = (1 - d) \mathbf{e} + d P^T \mathbf{r}$ into an eigenvector problem. Show that the dominant eigenvector of the matrix $M = (1-d)\mathbf{e}\mathbf{1}^T + dP^T$ is the PageRank vector, and write code to compute it using power iteration.

Problem 4: Graph Fourier Transform Application

Generate a ring graph with 20 vertices and perform the following:

  1. Compute and visualize eigenvalues and eigenvectors of the Laplacian
  2. Compute the graph Fourier transform of low-frequency signals (e.g., $f_i = \cos(2\pi i / 20)$) and high-frequency signals (e.g., $f_i = (-1)^i$)
  3. Apply a Gaussian low-pass filter $h(\lambda) = \exp(-\lambda^2 / (2\sigma^2))$ and visualize the results

Problem 5: GCN vs GAT Comparison

Design an experiment to compare the behavior of GCN and GAT layers on a small graph (10-20 vertices):

  1. Compute output embeddings for both methods
  2. Visualize attention weights to analyze the neighbor importance learned by GAT
  3. Theoretically analyze computational complexity and measure actual execution time

References

Books and Papers

  • Chung, F. R. K. (1997). Spectral Graph Theory. American Mathematical Society.
  • Von Luxburg, U. (2007). "A tutorial on spectral clustering." Statistics and Computing, 17(4), 395-416.
  • Kipf, T. N., & Welling, M. (2017). "Semi-supervised classification with graph convolutional networks." ICLR.
  • Veliฤkoviฤ‡, P., et al. (2018). "Graph Attention Networks." ICLR.
  • Bronstein, M. M., et al. (2021). "Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges." arXiv:2104.13478.

Libraries

  • networkx: graph creation and analysis
  • scipy.sparse: sparse matrix operations
  • torch_geometric: GNN implementation
  • spektral: Keras-based GNN