NoSQL Databases

NoSQL Databases

1. NoSQL Overview

1.1 NoSQL vs RDBMS

Item RDBMS NoSQL
Schema Strict schema Flexible schema
Scalability Vertical scaling Horizontal scaling
Transaction ACID BASE (some ACID)
Query SQL Various APIs
Use Cases Transactions, complex relationships Large volume, flexible data

1.2 Service Comparison

Type AWS GCP
Key-Value / Document DynamoDB Firestore
Wide Column - Bigtable
In-Memory Cache ElastiCache Memorystore
Document (MongoDB) DocumentDB MongoDB Atlas (Marketplace)

2. AWS DynamoDB

2.1 DynamoDB Overview

Features: - Fully managed Key-Value / Document DB - Millisecond latency - Unlimited scaling - Serverless (on-demand capacity)

Core Concepts: - Table: Data container - Item: Record - Attribute: Field - Primary Key: Partition key + (optional) sort key

2.2 Table Creation

# Create table (partition key only)
aws dynamodb create-table \
    --table-name Users \
    --attribute-definitions \
        AttributeName=userId,AttributeType=S \
    --key-schema \
        AttributeName=userId,KeyType=HASH \
    --billing-mode PAY_PER_REQUEST

# Create table (partition key + sort key)
aws dynamodb create-table \
    --table-name Orders \
    --attribute-definitions \
        AttributeName=customerId,AttributeType=S \
        AttributeName=orderId,AttributeType=S \
    --key-schema \
        AttributeName=customerId,KeyType=HASH \
        AttributeName=orderId,KeyType=RANGE \
    --billing-mode PAY_PER_REQUEST

# List tables
aws dynamodb list-tables

# Table information
aws dynamodb describe-table --table-name Users

2.3 CRUD Operations

# Add item (PutItem)
aws dynamodb put-item \
    --table-name Users \
    --item '{
        "userId": {"S": "user-001"},
        "name": {"S": "John Doe"},
        "email": {"S": "john@example.com"},
        "age": {"N": "30"}
    }'

# Get item (GetItem)
aws dynamodb get-item \
    --table-name Users \
    --key '{"userId": {"S": "user-001"}}'

# Update item (UpdateItem)
aws dynamodb update-item \
    --table-name Users \
    --key '{"userId": {"S": "user-001"}}' \
    --update-expression "SET age = :newAge" \
    --expression-attribute-values '{":newAge": {"N": "31"}}'

# Delete item (DeleteItem)
aws dynamodb delete-item \
    --table-name Users \
    --key '{"userId": {"S": "user-001"}}'

# Scan (entire table)
aws dynamodb scan --table-name Users

# Query (partition key based)
aws dynamodb query \
    --table-name Orders \
    --key-condition-expression "customerId = :cid" \
    --expression-attribute-values '{":cid": {"S": "customer-001"}}'

2.4 Python SDK (boto3)

import boto3
from decimal import Decimal

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('Users')

# Add item
table.put_item(Item={
    'userId': 'user-002',
    'name': 'Jane Doe',
    'email': 'jane@example.com',
    'age': 25
})

# Get item
response = table.get_item(Key={'userId': 'user-002'})
item = response.get('Item')

# Query (with GSI)
response = table.query(
    IndexName='email-index',
    KeyConditionExpression='email = :email',
    ExpressionAttributeValues={':email': 'jane@example.com'}
)

# Batch write
with table.batch_writer() as batch:
    for i in range(100):
        batch.put_item(Item={'userId': f'user-{i}', 'name': f'User {i}'})

2.5 Global Secondary Index (GSI)

# Add GSI
aws dynamodb update-table \
    --table-name Users \
    --attribute-definitions \
        AttributeName=email,AttributeType=S \
    --global-secondary-index-updates '[
        {
            "Create": {
                "IndexName": "email-index",
                "KeySchema": [{"AttributeName": "email", "KeyType": "HASH"}],
                "Projection": {"ProjectionType": "ALL"}
            }
        }
    ]'

2.6 DynamoDB Streams

Stream for Change Data Capture (CDC).

# Enable stream
aws dynamodb update-table \
    --table-name Users \
    --stream-specification StreamEnabled=true,StreamViewType=NEW_AND_OLD_IMAGES

# Connect Lambda trigger
aws lambda create-event-source-mapping \
    --function-name process-dynamodb \
    --event-source-arn arn:aws:dynamodb:...:table/Users/stream/xxx \
    --starting-position LATEST

3. GCP Firestore

3.1 Firestore Overview

Features: - Document-based NoSQL DB - Real-time synchronization - Offline support - Automatic scaling

Core Concepts: - Collection: Group of documents - Document: JSON-like data - Subcollection: Hierarchical structure

3.2 Firestore Setup

# Enable Firestore API
gcloud services enable firestore.googleapis.com

# Create database (Native mode)
gcloud firestore databases create \
    --location=asia-northeast3 \
    --type=firestore-native

3.3 Python SDK

from google.cloud import firestore

db = firestore.Client()

# Add document (auto ID)
doc_ref = db.collection('users').add({
    'name': 'John Doe',
    'email': 'john@example.com',
    'age': 30
})

# Add/update document (specified ID)
db.collection('users').document('user-001').set({
    'name': 'Jane Doe',
    'email': 'jane@example.com',
    'age': 25
})

# Get document
doc = db.collection('users').document('user-001').get()
if doc.exists:
    print(doc.to_dict())

# Partial update
db.collection('users').document('user-001').update({
    'age': 26
})

# Delete document
db.collection('users').document('user-001').delete()

# Query
users = db.collection('users').where('age', '>=', 25).stream()
for user in users:
    print(f'{user.id} => {user.to_dict()}')

# Complex query (index required)
users = db.collection('users') \
    .where('age', '>=', 25) \
    .order_by('age') \
    .limit(10) \
    .stream()

3.4 Real-time Listeners

# Document change detection
def on_snapshot(doc_snapshot, changes, read_time):
    for doc in doc_snapshot:
        print(f'Received document snapshot: {doc.id}')

doc_ref = db.collection('users').document('user-001')
doc_watch = doc_ref.on_snapshot(on_snapshot)

# Collection change detection
col_ref = db.collection('users')
col_watch = col_ref.on_snapshot(on_snapshot)

3.5 Security Rules

// firestore.rules
rules_version = '2';
service cloud.firestore {
  match /databases/{database}/documents {
    // Authenticated users can access only their own documents
    match /users/{userId} {
      allow read, write: if request.auth != null && request.auth.uid == userId;
    }

    // Public read
    match /public/{document=**} {
      allow read: if true;
      allow write: if request.auth != null;
    }
  }
}
# Deploy security rules
firebase deploy --only firestore:rules

4. In-Memory Cache

4.1 AWS ElastiCache

Supported Engines: - Redis - Memcached

# Create Redis cluster
aws elasticache create-cache-cluster \
    --cache-cluster-id my-redis \
    --engine redis \
    --cache-node-type cache.t3.micro \
    --num-cache-nodes 1 \
    --cache-subnet-group-name my-subnet-group \
    --security-group-ids sg-12345678

# Create replication group (high availability)
aws elasticache create-replication-group \
    --replication-group-id my-redis-cluster \
    --replication-group-description "Redis cluster" \
    --engine redis \
    --cache-node-type cache.t3.micro \
    --num-node-groups 1 \
    --replicas-per-node-group 1 \
    --automatic-failover-enabled \
    --cache-subnet-group-name my-subnet-group

# Check endpoint
aws elasticache describe-cache-clusters \
    --cache-cluster-id my-redis \
    --show-cache-node-info

Python Connection:

import redis

# Single node
r = redis.Redis(
    host='my-redis.xxx.cache.amazonaws.com',
    port=6379,
    decode_responses=True
)

# SET/GET
r.set('key', 'value')
value = r.get('key')

# Hash
r.hset('user:1000', mapping={'name': 'John', 'email': 'john@example.com'})
user = r.hgetall('user:1000')

# TTL
r.setex('session:abc', 3600, 'user_data')

4.2 GCP Memorystore

Supported Engines: - Redis - Memcached

# Create Redis instance
gcloud redis instances create my-redis \
    --region=asia-northeast3 \
    --tier=BASIC \
    --size=1 \
    --redis-version=redis_6_x

# Instance information
gcloud redis instances describe my-redis \
    --region=asia-northeast3

# Connection information (host/port)
gcloud redis instances describe my-redis \
    --region=asia-northeast3 \
    --format='value(host,port)'

Connection:

import redis

# Memorystore Redis (Private IP)
r = redis.Redis(
    host='10.0.0.3',  # Private IP
    port=6379,
    decode_responses=True
)

r.set('hello', 'world')
print(r.get('hello'))

5. Capacity Modes

5.1 DynamoDB Capacity Modes

Mode Features Best For
On-Demand Auto scaling, pay per request Unpredictable traffic
Provisioned Pre-specified capacity Stable traffic
# On-demand mode
aws dynamodb update-table \
    --table-name Users \
    --billing-mode PAY_PER_REQUEST

# Provisioned mode
aws dynamodb update-table \
    --table-name Users \
    --billing-mode PROVISIONED \
    --provisioned-throughput ReadCapacityUnits=100,WriteCapacityUnits=100

# Auto Scaling setup
aws application-autoscaling register-scalable-target \
    --service-namespace dynamodb \
    --resource-id "table/Users" \
    --scalable-dimension "dynamodb:table:ReadCapacityUnits" \
    --min-capacity 5 \
    --max-capacity 1000

5.2 Firestore Capacity

Firestore is fully serverless with automatic scaling.

Pricing: - Document reads: $0.06 / 100,000 - Document writes: $0.18 / 100,000 - Document deletes: $0.02 / 100,000 - Storage: $0.18 / GB / month


6. Cost Comparison

6.1 DynamoDB

Item On-Demand Provisioned
Reads $0.25 / 1M RRU | $0.00013 / RCU / hour
Writes $1.25 / 1M WRU | $0.00065 / WCU / hour
Storage $0.25 / GB / month | $0.25 / GB / month

6.2 Firestore

Item Cost
Document reads $0.06 / 100,000
Document writes $0.18 / 100,000
Storage $0.18 / GB / month

6.3 ElastiCache / Memorystore

Service Node Type Hourly Cost
ElastiCache cache.t3.micro ~$0.02
ElastiCache cache.r5.large ~$0.20
Memorystore 1GB Basic ~$0.05
Memorystore 1GB Standard (HA) ~$0.10

7. Use Case Selection

Use Case Recommended Service
Session management ElastiCache / Memorystore
User profiles DynamoDB / Firestore
Real-time chat Firestore (real-time sync)
Game leaderboard ElastiCache Redis
IoT data DynamoDB / Bigtable
Shopping cart DynamoDB / Firestore
Caching ElastiCache / Memorystore

8. Design Patterns

8.1 DynamoDB Single Table Design

PK              | SK              | Attributes
----------------|-----------------|------------------
USER#123        | USER#123        | name, email
USER#123        | ORDER#001       | product, quantity
USER#123        | ORDER#002       | product, quantity
PRODUCT#A       | PRODUCT#A       | name, price
PRODUCT#A       | REVIEW#001      | rating, comment

8.2 Cache Patterns

Cache-Aside (Lazy Loading):

def get_user(user_id):
    # Check cache
    cached = cache.get(f'user:{user_id}')
    if cached:
        return cached

    # Query from DB
    user = db.get_user(user_id)

    # Save to cache
    cache.setex(f'user:{user_id}', 3600, user)
    return user

Write-Through:

def update_user(user_id, data):
    # Update DB
    db.update_user(user_id, data)

    # Update cache
    cache.set(f'user:{user_id}', data)

9. Next Steps


References

to navigate between lessons