19. ์ „๋ฌธ ๊ฒ€์ƒ‰(Full-Text Search)

19. ์ „๋ฌธ ๊ฒ€์ƒ‰(Full-Text Search)

ํ•™์Šต ๋ชฉํ‘œ

  • PostgreSQL ์ „๋ฌธ ๊ฒ€์ƒ‰(Full-Text Search) ์•„ํ‚คํ…์ฒ˜ ์ดํ•ดํ•˜๊ธฐ
  • tsvector์™€ tsquery ๋ฐ์ดํ„ฐ ํƒ€์ž… ํšจ๊ณผ์ ์œผ๋กœ ์‚ฌ์šฉํ•˜๊ธฐ
  • ๋น ๋ฅธ ํ…์ŠคํŠธ ๊ฒ€์ƒ‰์„ ์œ„ํ•œ GIN ์ธ๋ฑ์Šค ์ƒ์„ฑํ•˜๊ธฐ
  • ๊ฐ€์ค‘์น˜๋ฅผ ํ™œ์šฉํ•œ ๋žญํ‚น ๊ฒ€์ƒ‰ ๊ตฌํ˜„ํ•˜๊ธฐ
  • ๋‹ค๊ตญ์–ด ๊ฒ€์ƒ‰ ์ง€์› ์„ค์ •ํ•˜๊ธฐ
  • ํผ์ง€ ๋งค์นญ(Fuzzy Matching)์„ ์œ„ํ•œ pg_trgm ํ™œ์šฉํ•˜๊ธฐ

๋ชฉ์ฐจ

  1. ์ „๋ฌธ ๊ฒ€์ƒ‰ ๊ฐœ์š”
  2. tsvector์™€ tsquery
  3. ๊ฒ€์ƒ‰ ๊ตฌ์„ฑ
  4. GIN๊ณผ GiST ์ธ๋ฑ์Šค
  5. ๋žญํ‚น๊ณผ ๊ฐ€์ค‘์น˜
  6. ๊ณ ๊ธ‰ ๊ฒ€์ƒ‰ ๊ธฐ๋ฒ•
  7. ํผ์ง€ ๋งค์นญ์„ ์œ„ํ•œ pg_trgm
  8. ์—ฐ์Šต ๋ฌธ์ œ

1. ์ „๋ฌธ ๊ฒ€์ƒ‰ ๊ฐœ์š”

1.1 ์™œ ์ „๋ฌธ ๊ฒ€์ƒ‰์ธ๊ฐ€?

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚              LIKE vs ์ „๋ฌธ ๊ฒ€์ƒ‰(Full-Text Search)                 โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚                                                                 โ”‚
โ”‚  LIKE / ILIKE:                                                  โ”‚
โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”                   โ”‚
โ”‚  โ”‚ SELECT * FROM articles                   โ”‚                   โ”‚
โ”‚  โ”‚ WHERE body ILIKE '%database%';           โ”‚                   โ”‚
โ”‚  โ”‚                                          โ”‚                   โ”‚
โ”‚  โ”‚ โœ— ์ธ๋ฑ์Šค ์‚ฌ์šฉ ๋ถˆ๊ฐ€ (์ˆœ์ฐจ ์Šค์บ”)            โ”‚                   โ”‚
โ”‚  โ”‚ โœ— ์–ธ์–ด ์ธ์‹ ๊ธฐ๋Šฅ ์—†์Œ                     โ”‚                   โ”‚
โ”‚  โ”‚ โœ— ๋žญํ‚น ๊ธฐ๋Šฅ ์—†์Œ                          โ”‚                   โ”‚
โ”‚  โ”‚ โœ— "databases"๊ฐ€ "database"์™€ ๋งค์น˜ ์•ˆ ๋จ   โ”‚                   โ”‚
โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜                   โ”‚
โ”‚                                                                 โ”‚
โ”‚  ์ „๋ฌธ ๊ฒ€์ƒ‰(Full-Text Search):                                    โ”‚
โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”                   โ”‚
โ”‚  โ”‚ SELECT * FROM articles                   โ”‚                   โ”‚
โ”‚  โ”‚ WHERE to_tsvector(body)                  โ”‚                   โ”‚
โ”‚  โ”‚       @@ to_tsquery('database');          โ”‚                   โ”‚
โ”‚  โ”‚                                          โ”‚                   โ”‚
โ”‚  โ”‚ โœ“ GIN ์ธ๋ฑ์Šค ์ง€์› (๋น ๋ฆ„)                  โ”‚                   โ”‚
โ”‚  โ”‚ โœ“ ํ˜•ํƒœ์†Œ ๋ถ„์„ (databases โ†’ database)      โ”‚                   โ”‚
โ”‚  โ”‚ โœ“ ๊ด€๋ จ์„ฑ์— ๋”ฐ๋ฅธ ๋žญํ‚น                      โ”‚                   โ”‚
โ”‚  โ”‚ โœ“ ๋ถˆ๋ฆฌ์–ธ ์—ฐ์‚ฐ์ž (AND, OR, NOT)            โ”‚                   โ”‚
โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜                   โ”‚
โ”‚                                                                 โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

1.2 ์•„ํ‚คํ…์ฒ˜ ๊ฐœ์š”

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚              ์ „๋ฌธ ๊ฒ€์ƒ‰ ํŒŒ์ดํ”„๋ผ์ธ                                โ”‚
โ”‚                                                                 โ”‚
โ”‚  ๋ฌธ์„œ ํ…์ŠคํŠธ                                                    โ”‚
โ”‚       โ”‚                                                         โ”‚
โ”‚       โ–ผ                                                         โ”‚
โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”                                            โ”‚
โ”‚  โ”‚   Parser         โ”‚โ”€โ”€โ–ถ ๋‹จ์–ด๋กœ ํ† ํฐํ™”                          โ”‚
โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜                                            โ”‚
โ”‚       โ”‚                                                         โ”‚
โ”‚       โ–ผ                                                         โ”‚
โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”                                            โ”‚
โ”‚  โ”‚  Dictionaries    โ”‚โ”€โ”€โ–ถ ์ •๊ทœํ™” (๋ถˆ์šฉ์–ด, ์–ด๊ฐ„)                  โ”‚
โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜                                            โ”‚
โ”‚       โ”‚                                                         โ”‚
โ”‚       โ–ผ                                                         โ”‚
โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”                                            โ”‚
โ”‚  โ”‚   tsvector       โ”‚โ”€โ”€โ–ถ ์ •๋ ฌ๋œ ๋ ‰์‹ฌ ๋ฆฌ์ŠคํŠธ + ์œ„์น˜              โ”‚
โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜                                            โ”‚
โ”‚       โ”‚                                                         โ”‚
โ”‚       โ–ผ                                                         โ”‚
โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”                                            โ”‚
โ”‚  โ”‚  GIN Index       โ”‚โ”€โ”€โ–ถ ๋ ‰์‹ฌ๋ณ„ ๋น ๋ฅธ ๊ฒ€์ƒ‰                       โ”‚
โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜                                            โ”‚
โ”‚                                                                 โ”‚
โ”‚  ์ฟผ๋ฆฌ ํ…์ŠคํŠธ โ”€โ”€โ–ถ tsquery โ”€โ”€โ–ถ tsvector์™€ ๋งค์น˜                   โ”‚
โ”‚                                                                 โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

2. tsvector์™€ tsquery

2.1 tsvector โ€” ๋ฌธ์„œ ํ‘œํ˜„

-- ๊ธฐ๋ณธ ๋ณ€ํ™˜
SELECT to_tsvector('english', 'The quick brown foxes jumped over the lazy dogs');
-- ๊ฒฐ๊ณผ: 'brown':3 'dog':9 'fox':4 'jump':5 'lazi':8 'quick':2

-- ์ฃผ์˜: ๋ถˆ์šฉ์–ด ์ œ๊ฑฐ๋จ ("the", "over"), ๋‹จ์–ด ์–ด๊ฐ„ ์ถ”์ถœ๋จ ("foxes" โ†’ "fox")

-- tsvector๋Š” ๋ ‰์‹ฌ์„ ์œ„์น˜์™€ ํ•จ๊ป˜ ์ €์žฅ
SELECT to_tsvector('english', 'A fat cat sat on a mat. A fat cat ate a fat rat.');
-- ๊ฒฐ๊ณผ: 'ate':10 'cat':3,8 'fat':2,7,11 'mat':6 'rat':12 'sat':4

2.2 tsquery โ€” ์ฟผ๋ฆฌ ํ‘œํ˜„

-- ๊ธฐ๋ณธ ์ฟผ๋ฆฌ
SELECT to_tsquery('english', 'cats & dogs');
-- ๊ฒฐ๊ณผ: 'cat' & 'dog'

-- ๋ถˆ๋ฆฌ์–ธ ์—ฐ์‚ฐ์ž
SELECT to_tsquery('english', 'cat | dog');        -- OR
SELECT to_tsquery('english', 'cat & !dog');       -- AND NOT
SELECT to_tsquery('english', 'cat <-> dog');      -- FOLLOWED BY (๊ตฌ๋ฌธ)
SELECT to_tsquery('english', 'cat <2> dog');      -- 2๋‹จ์–ด ์ด๋‚ด

-- plainto_tsquery: ๋” ๊ฐ„๋‹จํ•œ ๊ตฌ๋ฌธ (์•”๋ฌต์  AND)
SELECT plainto_tsquery('english', 'fat cats');
-- ๊ฒฐ๊ณผ: 'fat' & 'cat'

-- phraseto_tsquery: ์ •ํ™•ํ•œ ๊ตฌ๋ฌธ ๋งค์นญ
SELECT phraseto_tsquery('english', 'fat cats');
-- ๊ฒฐ๊ณผ: 'fat' <-> 'cat'

-- websearch_to_tsquery: ์›น ์Šคํƒ€์ผ ๊ตฌ๋ฌธ (PostgreSQL 11+)
SELECT websearch_to_tsquery('english', '"fat cats" -dogs');
-- ๊ฒฐ๊ณผ: 'fat' <-> 'cat' & !'dog'

SELECT websearch_to_tsquery('english', 'cats or dogs');
-- ๊ฒฐ๊ณผ: 'cat' | 'dog'

2.3 ๋งค์น˜ ์—ฐ์‚ฐ์ž(@@)

-- tsvector @@ tsquery
SELECT to_tsvector('english', 'The fat cats sat on the mat')
       @@ to_tsquery('english', 'cat & mat');
-- ๊ฒฐ๊ณผ: true

SELECT to_tsvector('english', 'The fat cats sat on the mat')
       @@ to_tsquery('english', 'cat & dog');
-- ๊ฒฐ๊ณผ: false

-- ์‹ค์šฉ์ ์ธ ์˜ˆ์ œ
CREATE TABLE articles (
    id SERIAL PRIMARY KEY,
    title TEXT NOT NULL,
    body TEXT NOT NULL,
    created_at TIMESTAMP DEFAULT NOW()
);

INSERT INTO articles (title, body) VALUES
('PostgreSQL Full-Text Search', 'PostgreSQL provides powerful full-text search capabilities using tsvector and tsquery.'),
('Database Indexing Guide', 'Indexes are crucial for database performance. B-tree, GIN, and GiST are common index types.'),
('Introduction to SQL', 'SQL is a standard language for managing relational databases. SELECT, INSERT, UPDATE, DELETE.'),
('Advanced Query Optimization', 'Query optimization involves analyzing execution plans and choosing efficient access paths.');

-- ๋ฌธ์„œ ๊ฒ€์ƒ‰
SELECT title, ts_rank(to_tsvector('english', body), query) AS rank
FROM articles, to_tsquery('english', 'database & index') AS query
WHERE to_tsvector('english', body) @@ query
ORDER BY rank DESC;

2.4 ์ €์žฅ๋œ tsvector ์ปฌ๋Ÿผ

-- ์„ฑ๋Šฅ ํ–ฅ์ƒ์„ ์œ„ํ•œ ์ƒ์„ฑ๋œ tsvector ์ปฌ๋Ÿผ ์ถ”๊ฐ€
ALTER TABLE articles ADD COLUMN search_vector tsvector
    GENERATED ALWAYS AS (
        setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
        setweight(to_tsvector('english', coalesce(body, '')), 'B')
    ) STORED;

-- ์ด์ œ ์ฟผ๋ฆฌ๋Š” ๋ฏธ๋ฆฌ ๊ณ„์‚ฐ๋œ ์ปฌ๋Ÿผ์„ ์‚ฌ์šฉ
SELECT title
FROM articles
WHERE search_vector @@ to_tsquery('english', 'database');

3. ๊ฒ€์ƒ‰ ๊ตฌ์„ฑ

3.1 ํ…์ŠคํŠธ ๊ฒ€์ƒ‰ ๊ตฌ์„ฑ

-- ์‚ฌ์šฉ ๊ฐ€๋Šฅํ•œ ๊ตฌ์„ฑ ๋ชฉ๋ก
SELECT cfgname FROM pg_ts_config;

-- ์ผ๋ฐ˜์ ์ธ ๊ตฌ์„ฑ:
-- 'simple'    โ€” ํ˜•ํƒœ์†Œ ๋ถ„์„ ์—†์Œ, ๋ถˆ์šฉ์–ด ์—†์Œ
-- 'english'   โ€” ์˜์–ด ํ˜•ํƒœ์†Œ ๋ถ„์„ ๋ฐ ๋ถˆ์šฉ์–ด
-- 'german'    โ€” ๋…์ผ์–ด ํ˜•ํƒœ์†Œ ๋ถ„์„
-- 'spanish'   โ€” ์ŠคํŽ˜์ธ์–ด ํ˜•ํƒœ์†Œ ๋ถ„์„

-- ํ˜„์žฌ ๊ธฐ๋ณธ๊ฐ’ ํ‘œ์‹œ
SHOW default_text_search_config;

-- ๊ธฐ๋ณธ๊ฐ’ ์„ค์ •
SET default_text_search_config = 'english';

-- ๊ตฌ์„ฑ ๋น„๊ต
SELECT to_tsvector('simple', 'The running dogs are quickly jumping');
-- ๊ฒฐ๊ณผ: 'are':4 'dogs':3 'jumping':6 'quickly':5 'running':2 'the':1

SELECT to_tsvector('english', 'The running dogs are quickly jumping');
-- ๊ฒฐ๊ณผ: 'dog':3 'jump':6 'quick':5 'run':2

3.2 ์‚ฌ์ „๊ณผ ๋ถˆ์šฉ์–ด

-- ๊ตฌ์„ฑ์— ๋Œ€ํ•œ ์‚ฌ์ „ ํ‘œ์‹œ
SELECT * FROM ts_debug('english', 'The quick brown foxes are jumping');

-- ์ถœ๋ ฅ์€ ํ† ํฐ โ†’ ์‚ฌ์ „ โ†’ ๋ ‰์‹ฌ ๋งคํ•‘์„ ํ‘œ์‹œ
-- Token     | Dictionary   | Lexemes
-- ----------|-------------|--------
-- The       | english_stem | {stop word}
-- quick     | english_stem | {quick}
-- brown     | english_stem | {brown}
-- foxes     | english_stem | {fox}
-- are       | english_stem | {stop word}
-- jumping   | english_stem | {jump}

3.3 ์‚ฌ์šฉ์ž ์ •์˜ ๊ตฌ์„ฑ

-- ์˜์–ด ๊ธฐ๋ฐ˜ ์‚ฌ์šฉ์ž ์ •์˜ ๊ตฌ์„ฑ ์ƒ์„ฑ
CREATE TEXT SEARCH CONFIGURATION my_english (COPY = english);

-- ๋™์˜์–ด ์‚ฌ์ „ ์ถ”๊ฐ€
CREATE TEXT SEARCH DICTIONARY my_synonyms (
    TEMPLATE = synonym,
    SYNONYMS = my_synonyms  -- $SHAREDIR/tsearch_data/my_synonyms.syn ์ฐธ์กฐ
);

-- ๊ตฌ์„ฑ์— ์ถ”๊ฐ€
ALTER TEXT SEARCH CONFIGURATION my_english
    ALTER MAPPING FOR asciiword
    WITH my_synonyms, english_stem;

4. GIN๊ณผ GiST ์ธ๋ฑ์Šค

4.1 ์ „๋ฌธ ๊ฒ€์ƒ‰์„ ์œ„ํ•œ GIN ์ธ๋ฑ์Šค

-- tsvector ์ปฌ๋Ÿผ์— GIN ์ธ๋ฑ์Šค ์ƒ์„ฑ
CREATE INDEX idx_articles_search ON articles USING GIN (search_vector);

-- ๋˜๋Š” ํ‘œํ˜„์‹์— (๋นŒ๋“œ๋Š” ๋А๋ฆฌ์ง€๋งŒ, ์ €์žฅ๋œ ์ปฌ๋Ÿผ ๋ถˆํ•„์š”)
CREATE INDEX idx_articles_body_gin ON articles
    USING GIN (to_tsvector('english', body));

-- ์ฟผ๋ฆฌ๋Š” ์ž๋™์œผ๋กœ ์ธ๋ฑ์Šค ์‚ฌ์šฉ
EXPLAIN ANALYZE
SELECT title FROM articles
WHERE search_vector @@ to_tsquery('english', 'database');
-- Bitmap Index Scan on idx_articles_search

4.2 GIN vs GiST ๋น„๊ต

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚              ์ „๋ฌธ ๊ฒ€์ƒ‰์„ ์œ„ํ•œ GIN vs GiST                        โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚                โ”‚ GIN              โ”‚ GiST                         โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚ ๋นŒ๋“œ ์†๋„       โ”‚ ๋А๋ฆผ             โ”‚ ๋น ๋ฆ„                          โ”‚
โ”‚ ๊ฒ€์ƒ‰ ์†๋„       โ”‚ ๋น ๋ฆ„ (3๋ฐฐ)       โ”‚ ๋А๋ฆผ                          โ”‚
โ”‚ ์ธ๋ฑ์Šค ํฌ๊ธฐ     โ”‚ ํผ               โ”‚ ์ž‘์Œ                          โ”‚
โ”‚ ์—…๋ฐ์ดํŠธ ๋น„์šฉ   โ”‚ ๋†’์Œ             โ”‚ ๋‚ฎ์Œ                          โ”‚
โ”‚ ์ •ํ™•ํ•œ ๊ฒฐ๊ณผ     โ”‚ ์˜ˆ               โ”‚ ๊ฑฐ์ง“ ๊ธ์ • ๊ฐ€๋Šฅ                 โ”‚
โ”‚ ์ตœ์ ํ™”         โ”‚ ์ •์  ๋ฐ์ดํ„ฐ,      โ”‚ ์ž์ฃผ ์—…๋ฐ์ดํŠธ๋˜๋Š” ๋ฐ์ดํ„ฐ,      โ”‚
โ”‚                โ”‚ ์ฝ๊ธฐ ์œ„์ฃผ         โ”‚ ์“ฐ๊ธฐ ์œ„์ฃผ                     โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
-- GiST ์ธ๋ฑ์Šค (๋Œ€์•ˆ)
CREATE INDEX idx_articles_search_gist ON articles USING GiST (search_vector);

-- fastupdate๊ฐ€ ์žˆ๋Š” GIN (๊ธฐ๋ณธ๊ฐ’ on, ๋ฐฐ์น˜ ์‚ฝ์ž…์— ์ข‹์Œ)
CREATE INDEX idx_articles_search_gin ON articles
    USING GIN (search_vector) WITH (fastupdate = on);

4.3 ์ธ๋ฑ์Šค ์œ ์ง€๋ณด์ˆ˜

-- ์ธ๋ฑ์Šค ํฌ๊ธฐ ํ™•์ธ
SELECT pg_size_pretty(pg_relation_size('idx_articles_search'));

-- ํ•„์š”์‹œ ์žฌ์ธ๋ฑ์‹ฑ
REINDEX INDEX idx_articles_search;

-- ์ฟผ๋ฆฌ ํ”Œ๋ž˜๋„ˆ๋ฅผ ์œ„ํ•œ ๋ถ„์„
ANALYZE articles;

5. ๋žญํ‚น๊ณผ ๊ฐ€์ค‘์น˜

5.1 ts_rank

-- ๊ธฐ๋ณธ ๋žญํ‚น
SELECT
    title,
    ts_rank(search_vector, to_tsquery('english', 'database')) AS rank
FROM articles
WHERE search_vector @@ to_tsquery('english', 'database')
ORDER BY rank DESC;

-- ์ •๊ทœํ™” ์˜ต์…˜ (๋น„ํŠธ๋งˆ์Šคํฌ)
-- 0  = ๊ธฐ๋ณธ๊ฐ’ (๋ฌธ์„œ ๊ธธ์ด ๋ฌด์‹œ)
-- 1  = 1 + log(๋ฌธ์„œ ๊ธธ์ด)๋กœ ๋‚˜๋ˆ”
-- 2  = ๋ฌธ์„œ ๊ธธ์ด๋กœ ๋‚˜๋ˆ”
-- 4  = ๋ฒ”์œ„ ๊ฐ„ ํ‰๊ท  ์กฐํ™” ๊ฑฐ๋ฆฌ๋กœ ๋‚˜๋ˆ”
-- 8  = ๊ณ ์œ  ๋‹จ์–ด ์ˆ˜๋กœ ๋‚˜๋ˆ”
-- 16 = 1 + log(๊ณ ์œ  ๋‹จ์–ด)๋กœ ๋‚˜๋ˆ”
-- 32 = ์ž๊ธฐ ์ž์‹  + 1๋กœ ๋‚˜๋ˆ”

SELECT
    title,
    ts_rank(search_vector, query, 2) AS rank_normalized  -- ๊ธธ์ด๋กœ ์ •๊ทœํ™”
FROM articles, to_tsquery('english', 'database') AS query
WHERE search_vector @@ query
ORDER BY rank_normalized DESC;

5.2 ts_rank_cd (์ปค๋ฒ„ ๋ฐ€๋„)

-- ์ปค๋ฒ„ ๋ฐ€๋„ ๋žญํ‚น์€ ๋งค์นญ ์šฉ์–ด์˜ ๊ทผ์ ‘์„ฑ์„ ๊ณ ๋ ค
SELECT
    title,
    ts_rank_cd(search_vector, to_tsquery('english', 'database & index')) AS rank_cd
FROM articles
WHERE search_vector @@ to_tsquery('english', 'database & index')
ORDER BY rank_cd DESC;

5.3 ๊ฐ€์ค‘์น˜ ๊ฒ€์ƒ‰

-- ๊ฐ€์ค‘์น˜: A (1.0), B (0.4), C (0.2), D (0.1)
-- ์‚ฌ์šฉ์ž ์ •์˜ ๊ฐ€์ค‘์น˜ ๋ฐฐ์—ด: {D, C, B, A}

-- ์‚ฌ์šฉ์ž ์ •์˜ ๊ฐ€์ค‘์น˜ ์ ์šฉ
SELECT
    title,
    ts_rank('{0.1, 0.2, 0.4, 1.0}', search_vector, query) AS weighted_rank
FROM articles, to_tsquery('english', 'database') AS query
WHERE search_vector @@ query
ORDER BY weighted_rank DESC;

-- ๊ฐ€์ค‘์น˜๊ฐ€ ์žˆ๋Š” tsvector ์ƒ์„ฑ
SELECT
    setweight(to_tsvector('english', 'PostgreSQL Guide'), 'A') ||
    setweight(to_tsvector('english', 'A comprehensive database tutorial'), 'B') ||
    setweight(to_tsvector('english', 'Learn SQL queries and optimization'), 'C');

5.4 ๊ฒ€์ƒ‰ ๊ฒฐ๊ณผ ํ•˜์ด๋ผ์ดํŒ…

-- ts_headline์€ ๋งค์นญ ์šฉ์–ด๋ฅผ ํ•˜์ด๋ผ์ดํŠธ
SELECT
    title,
    ts_headline('english', body, to_tsquery('english', 'database'),
        'StartSel=<b>, StopSel=</b>, MaxWords=35, MinWords=15'
    ) AS highlighted
FROM articles
WHERE search_vector @@ to_tsquery('english', 'database');

-- ์˜ต์…˜:
-- StartSel / StopSel โ€” ํ•˜์ด๋ผ์ดํŠธ ๋งˆ์ปค
-- MaxWords / MinWords โ€” ์ปจํ…์ŠคํŠธ ์œˆ๋„์šฐ ํฌ๊ธฐ
-- ShortWord โ€” ํ‘œ์‹œํ•  ์ตœ์†Œ ๋‹จ์–ด ๊ธธ์ด
-- MaxFragments โ€” ์กฐ๊ฐ ์ˆ˜ (0 = ์ „์ฒด ๋ฌธ์„œ)
-- FragmentDelimiter โ€” ์กฐ๊ฐ ๊ฐ„ ๊ตฌ๋ถ„์ž

6. ๊ณ ๊ธ‰ ๊ฒ€์ƒ‰ ๊ธฐ๋ฒ•

6.1 ๋‹ค์ค‘ ์ปฌ๋Ÿผ ๊ฒ€์ƒ‰

-- ๋‹ค๋ฅธ ๊ฐ€์ค‘์น˜๋กœ ์—ฌ๋Ÿฌ ์ปฌ๋Ÿผ์— ๊ฑธ์ณ ๊ฒ€์ƒ‰
CREATE TABLE blog_posts (
    id SERIAL PRIMARY KEY,
    title TEXT NOT NULL,
    summary TEXT,
    body TEXT NOT NULL,
    tags TEXT[],
    search_vector tsvector GENERATED ALWAYS AS (
        setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
        setweight(to_tsvector('english', coalesce(summary, '')), 'B') ||
        setweight(to_tsvector('english', coalesce(body, '')), 'C') ||
        setweight(to_tsvector('english', coalesce(array_to_string(tags, ' '), '')), 'A')
    ) STORED
);

CREATE INDEX idx_blog_search ON blog_posts USING GIN (search_vector);

6.2 ๊ตฌ๋ฌธ ๊ฒ€์ƒ‰

-- ์ •ํ™•ํ•œ ๊ตฌ๋ฌธ: "full text search"
SELECT title FROM articles
WHERE search_vector @@ phraseto_tsquery('english', 'full text search');

-- ๊ทผ์ ‘์„ฑ: N๊ฐœ ์œ„์น˜ ๋‚ด์˜ ๋‹จ์–ด
SELECT title FROM articles
WHERE search_vector @@ to_tsquery('english', 'full <3> search');
-- "full"๊ณผ "search"๊ฐ€ ์„œ๋กœ 3๋‹จ์–ด ์ด๋‚ด

6.3 ํ•„ํ„ฐ๊ฐ€ ์žˆ๋Š” ๊ฒ€์ƒ‰

-- ์ „๋ฌธ ๊ฒ€์ƒ‰๊ณผ ์ผ๋ฐ˜ WHERE ์ ˆ ๊ฒฐํ•ฉ
SELECT title, ts_rank(search_vector, query) AS rank
FROM articles, to_tsquery('english', 'database') AS query
WHERE search_vector @@ query
  AND created_at >= NOW() - INTERVAL '30 days'
ORDER BY rank DESC
LIMIT 10;

-- ๊ฒฐํ•ฉ ๊ฒ€์ƒ‰์„ ์œ„ํ•œ ๋ณตํ•ฉ ์ธ๋ฑ์Šค
CREATE INDEX idx_articles_date_search ON articles
    USING GIN (search_vector) WHERE created_at >= '2024-01-01';

6.4 ์ž๋™ ์™„์„ฑ / ์ ‘๋‘์–ด ๊ฒ€์ƒ‰

-- :*๋ฅผ ์‚ฌ์šฉํ•œ ์ ‘๋‘์–ด ๋งค์นญ
SELECT title FROM articles
WHERE search_vector @@ to_tsquery('english', 'dat:*');
-- ๋งค์น˜: database, data, date ๋“ฑ

-- ์‹ค์šฉ์ ์ธ ์ž๋™ ์™„์„ฑ ํ•จ์ˆ˜
CREATE OR REPLACE FUNCTION search_autocomplete(search_term TEXT, max_results INT DEFAULT 10)
RETURNS TABLE(title TEXT, rank REAL) AS $$
BEGIN
    RETURN QUERY
    SELECT a.title, ts_rank(a.search_vector, query) AS rank
    FROM articles a, to_tsquery('english', search_term || ':*') AS query
    WHERE a.search_vector @@ query
    ORDER BY rank DESC
    LIMIT max_results;
END;
$$ LANGUAGE plpgsql;

SELECT * FROM search_autocomplete('post');

6.5 ๊ฒ€์ƒ‰ ํ†ต๊ณ„

-- ์ฝ”ํผ์Šค์—์„œ ๊ฐ€์žฅ ํ”ํ•œ ๋‹จ์–ด
SELECT word, ndoc, nentry
FROM ts_stat('SELECT search_vector FROM articles')
ORDER BY nentry DESC
LIMIT 20;

-- ํŠน์ • ์ฟผ๋ฆฌ์— ๋Œ€ํ•œ ๋‹จ์–ด ๋นˆ๋„
SELECT word, ndoc
FROM ts_stat('SELECT search_vector FROM articles')
WHERE word LIKE 'data%'
ORDER BY ndoc DESC;

7. ํผ์ง€ ๋งค์นญ์„ ์œ„ํ•œ pg_trgm

7.1 ํŠธ๋ผ์ด๊ทธ๋žจ ๊ธฐ์ดˆ

-- ํ™•์žฅ ํ™œ์„ฑํ™”
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- ํŠธ๋ผ์ด๊ทธ๋žจ ํ‘œ์‹œ
SELECT show_trgm('PostgreSQL');
-- ๊ฒฐ๊ณผ: {"  p"," po","gre","osq","pos","ostg","pgr","ql ","res","sql","stg","tgr"}

-- ์œ ์‚ฌ๋„ ์ ์ˆ˜ (0~1)
SELECT similarity('PostgreSQL', 'Postgresql');  -- ~0.75
SELECT similarity('PostgreSQL', 'MySQL');       -- ~0.09

7.2 ํŠธ๋ผ์ด๊ทธ๋žจ ์ธ๋ฑ์Šค

-- GIN ํŠธ๋ผ์ด๊ทธ๋žจ ์ธ๋ฑ์Šค
CREATE INDEX idx_articles_title_trgm ON articles USING GIN (title gin_trgm_ops);

-- GiST ํŠธ๋ผ์ด๊ทธ๋žจ ์ธ๋ฑ์Šค (๊ฑฐ๋ฆฌ ์—ฐ์‚ฐ์ž ์ง€์›)
CREATE INDEX idx_articles_title_trgm_gist ON articles USING GiST (title gist_trgm_ops);

-- ์œ ์‚ฌ๋„ ๊ฒ€์ƒ‰
SELECT title, similarity(title, 'Postgre') AS sim
FROM articles
WHERE title % 'Postgre'  -- % ์—ฐ์‚ฐ์ž๋Š” ์œ ์‚ฌ๋„ ์ž„๊ณ„๊ฐ’ ์‚ฌ์šฉ
ORDER BY sim DESC;

-- ์œ ์‚ฌ๋„ ์ž„๊ณ„๊ฐ’ ์„ค์ • (๊ธฐ๋ณธ๊ฐ’ 0.3)
SET pg_trgm.similarity_threshold = 0.2;

-- ๊ฑฐ๋ฆฌ ์—ฐ์‚ฐ์ž (GiST๋งŒ)
SELECT title, title <-> 'Postgre' AS distance
FROM articles
ORDER BY title <-> 'Postgre'
LIMIT 5;

7.3 ํŠธ๋ผ์ด๊ทธ๋žจ ์ธ๋ฑ์Šค๋ฅผ ์‚ฌ์šฉํ•œ LIKE/ILIKE

-- ํŠธ๋ผ์ด๊ทธ๋žจ ์ธ๋ฑ์Šค๋Š” LIKE ์ฟผ๋ฆฌ๋„ ๊ฐ€์†ํ™”
CREATE INDEX idx_articles_body_trgm ON articles USING GIN (body gin_trgm_ops);

-- ์ด์ œ ์ด ์ฟผ๋ฆฌ๋“ค์€ ์ธ๋ฑ์Šค๋ฅผ ์‚ฌ์šฉ
SELECT title FROM articles WHERE body LIKE '%database%';
SELECT title FROM articles WHERE body ILIKE '%DATABASE%';

-- ์ •๊ทœ ํ‘œํ˜„์‹ ์ฟผ๋ฆฌ๋„ ์ด์  ์–ป์Œ
SELECT title FROM articles WHERE body ~ 'data(base|set)';

7.4 FTS์™€ pg_trgm ๊ฒฐํ•ฉ

-- ์–‘์ชฝ ์žฅ์ : ๊ด€๋ จ์„ฑ์€ ์ „๋ฌธ ๊ฒ€์ƒ‰, ์˜คํƒ€ ํ—ˆ์šฉ์€ ํŠธ๋ผ์ด๊ทธ๋žจ
CREATE OR REPLACE FUNCTION smart_search(search_term TEXT, max_results INT DEFAULT 10)
RETURNS TABLE(id INT, title TEXT, score REAL) AS $$
BEGIN
    -- ๋จผ์ € ์ •ํ™•ํ•œ ์ „๋ฌธ ๊ฒ€์ƒ‰ ์‹œ๋„
    RETURN QUERY
    SELECT a.id, a.title,
           ts_rank(a.search_vector, websearch_to_tsquery('english', search_term)) AS score
    FROM articles a
    WHERE a.search_vector @@ websearch_to_tsquery('english', search_term)
    ORDER BY score DESC
    LIMIT max_results;

    -- ๊ฒฐ๊ณผ๊ฐ€ ์—†์œผ๋ฉด ํผ์ง€ ๋งค์นญ์œผ๋กœ ํด๋ฐฑ
    IF NOT FOUND THEN
        RETURN QUERY
        SELECT a.id, a.title, similarity(a.title, search_term) AS score
        FROM articles a
        WHERE a.title % search_term OR a.body % search_term
        ORDER BY score DESC
        LIMIT max_results;
    END IF;
END;
$$ LANGUAGE plpgsql;

8. ์—ฐ์Šต ๋ฌธ์ œ

์—ฐ์Šต 1: ์ œํ’ˆ ๊ฒ€์ƒ‰

์ „์ž์ƒ๊ฑฐ๋ž˜ ์ œํ’ˆ ์นดํƒˆ๋กœ๊ทธ๋ฅผ ์œ„ํ•œ ์ „๋ฌธ ๊ฒ€์ƒ‰ ์‹œ์Šคํ…œ์„ ๊ตฌ์ถ•ํ•˜์„ธ์š”.

-- ์˜ˆ์‹œ ๋‹ต์•ˆ
CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name TEXT NOT NULL,
    description TEXT,
    category TEXT,
    price NUMERIC(10,2),
    search_vector tsvector GENERATED ALWAYS AS (
        setweight(to_tsvector('english', coalesce(name, '')), 'A') ||
        setweight(to_tsvector('english', coalesce(description, '')), 'B') ||
        setweight(to_tsvector('english', coalesce(category, '')), 'C')
    ) STORED
);

CREATE INDEX idx_products_search ON products USING GIN (search_vector);

-- ์นดํ…Œ๊ณ ๋ฆฌ ํ•„ํ„ฐ๊ฐ€ ์žˆ๋Š” ๊ฒ€์ƒ‰ ํ•จ์ˆ˜
CREATE OR REPLACE FUNCTION search_products(
    search_term TEXT,
    category_filter TEXT DEFAULT NULL,
    max_results INT DEFAULT 20
)
RETURNS TABLE(id INT, name TEXT, price NUMERIC, rank REAL) AS $$
BEGIN
    RETURN QUERY
    SELECT p.id, p.name, p.price,
           ts_rank(p.search_vector, websearch_to_tsquery('english', search_term)) AS rank
    FROM products p
    WHERE p.search_vector @@ websearch_to_tsquery('english', search_term)
      AND (category_filter IS NULL OR p.category = category_filter)
    ORDER BY rank DESC
    LIMIT max_results;
END;
$$ LANGUAGE plpgsql;

์—ฐ์Šต 2: ํ•˜์ด๋ผ์ดํŒ…์ด ์žˆ๋Š” ๊ฒ€์ƒ‰

ํ•˜์ด๋ผ์ดํŠธ๋œ ๋งค์น˜๊ฐ€ ์žˆ๋Š” ๊ฒ€์ƒ‰ ๊ฒฐ๊ณผ๋ฅผ ์ƒ์„ฑํ•˜์„ธ์š”.

-- ์˜ˆ์‹œ ๋‹ต์•ˆ
SELECT
    id,
    ts_headline('english', name, query,
        'StartSel=<mark>, StopSel=</mark>') AS highlighted_name,
    ts_headline('english', description, query,
        'StartSel=<mark>, StopSel=</mark>, MaxFragments=2, FragmentDelimiter= ... ') AS highlighted_desc,
    ts_rank(search_vector, query) AS relevance
FROM products, websearch_to_tsquery('english', 'wireless bluetooth') AS query
WHERE search_vector @@ query
ORDER BY relevance DESC;

์—ฐ์Šต 3: ๋‹ค๊ตญ์–ด ๊ฒ€์ƒ‰

์˜์–ด์™€ simple(ํ˜•ํƒœ์†Œ ๋ถ„์„ ์—†์Œ) ํ…์ŠคํŠธ๋ฅผ ๋ชจ๋‘ ์ฒ˜๋ฆฌํ•˜๋Š” ๊ฒ€์ƒ‰ ์‹œ์Šคํ…œ์„ ์„ค๊ณ„ํ•˜์„ธ์š”.

-- ์˜ˆ์‹œ ๋‹ต์•ˆ
CREATE TABLE multilingual_docs (
    id SERIAL PRIMARY KEY,
    content_en TEXT,
    content_raw TEXT,
    search_en tsvector GENERATED ALWAYS AS (
        to_tsvector('english', coalesce(content_en, ''))
    ) STORED,
    search_simple tsvector GENERATED ALWAYS AS (
        to_tsvector('simple', coalesce(content_raw, ''))
    ) STORED
);

CREATE INDEX idx_ml_en ON multilingual_docs USING GIN (search_en);
CREATE INDEX idx_ml_simple ON multilingual_docs USING GIN (search_simple);

-- ์–‘์ชฝ์— ๊ฑธ์ณ ๊ฒ€์ƒ‰
SELECT id, content_en
FROM multilingual_docs
WHERE search_en @@ websearch_to_tsquery('english', 'search term')
   OR search_simple @@ websearch_to_tsquery('simple', 'search term');

๋‹ค์Œ ๋‹จ๊ณ„

์ฐธ๊ณ  ์ž๋ฃŒ

to navigate between lessons