Đã đăng vào Oct 7th, 2025 9:09 a.m. 2 phút đọc

135

Database Storage & Fragmentation Complete

📚 Tổng quan

Chúng ta đã khám phá sâu về cách MySQL lưu trữ và quản lý data - từ page structure cơ bản, đến fragmentation issues, và strategies để duy trì performance tối ưu. Đây là kiến thức thực tế giúp hiểu tại sao database performance thay đổi theo thời gian và cách khắc phục.

1. 📄 Page Structure & Data Storage

Khái niệm cơ bản:

Page = Đơn vị lưu trữ cơ bản của MySQL

Kích thước: 16KB mỗi page (cố định)
Chứa: User records + metadata + free space
I/O unit: MySQL luôn đọc/ghi theo đơn vị page

Cách MySQL load data:

-- Query: SELECT * FROM users WHERE id = 1000;

🔍 Quá trình execution:
1. 📖 Determine pages cần đọc (từ index hoặc full scan)
2. 💾 Load pages từ disk vào buffer pool (memory)  
3. 🔍 Scan records trong pages
4. ✅ Return matching records

Không có index: Phải đọc TẤT CẢ pages (full table scan)
Có index: Chỉ đọc vài pages cần thiết

Page capacity:

📊 Typical page utilization:
- Compact records (50-100 bytes): 200-300 records/page
- Normal records (200-500 bytes): 30-80 records/page  
- Large records (1-5KB): 3-15 records/page
- Very large records (>5KB): <3 records/page

Formula: Records per page ≈ 15KB ÷ Average record size

2. 📈 Record Count vs Page Count

Tại sao số page có thể lớn hơn expected?

Factor 1: Record size quá lớn

-- Ví dụ: 16,000 records nhưng cần 1000+ pages

CREATE TABLE users_fat (
    id INT,                    -- 4 bytes
    bio TEXT,                  -- 2,000 bytes average  
    profile_data JSON,         -- 3,000 bytes average
    attachments LONGBLOB       -- 10,000 bytes average!
);
-- Average record: ~15KB → 1 record per page!
-- 16,000 records = 16,000 pages (không phải 16 pages!)

Factor 2: Variable length fields

-- Distribution không đồng đều:
- 70% users: bio ~500 bytes → 20 records/page
- 20% users: bio ~2,000 bytes → 5 records/page  
- 10% users: bio ~10,000 bytes → 1 record/page

Result: Same number records, very different page counts

Factor 3: Fragmentation từ DELETE

-- Ban đầu: 16,000 records = 16 pages (packed)
-- Sau DELETE 50%: 8,000 records vẫn spread across 16 pages
-- Page utilization: Giảm từ 100% xuống 50%
-- Wasted space: 8 pages worth của holes!

3. 🕳️ Fragmentation từ DELETE Operations

DELETE hoạt động như thế nào:

Hard DELETE process:

DELETE FROM users WHERE id = 1000;

🔍 Behind the scenes:
1. 📍 Locate record trong data page
2. 🏷️ Mark record slot as deleted (không xóa thật)
3. 🔗 Add slot vào page's free list  
4. 📊 Update page header: free_space += record_size
5. ✅ Record "invisible" nhưng space vẫn reserved

Page structure sau DELETE:

📄 Page layout (before DELETE):
┌─────────────────────────────────┐
│[Alice][Bob][Charlie][David][Eve]│ ← Tightly packed
│ Used: 15KB/16KB (94% full)     │
└─────────────────────────────────┘

📄 Page layout (after DELETE Charlie):
┌─────────────────────────────────┐  
│[Alice][Bob][_HOLE_][David][Eve] │ ← Charlie deleted = hole
│ Used: 12KB/16KB (75% full)     │ ← 25% wasted space!
│ Free list: → Charlie's slot    │
└─────────────────────────────────┘

Tại sao MySQL không compact tự động?

⚡ Trade-off: Speed vs Space efficiency

🚀 MySQL approach (Fast DELETE):
- Mark as deleted: 0.1ms
- No data movement: No additional I/O
- Continue với next operation immediately

🐌 Theoretical compaction:
- Mark as deleted: 0.1ms  
- Shift all records: 50ms per page
- Update all pointers: 200ms
- Lock page during operation: Blocks other queries
Total: 250ms+ (1,250x slower!)

MySQL chọn speed over perfect space utilization

Fragmentation accumulation:

📈 Progressive degradation:
Month 1: 95% page utilization (fresh data)
Month 6: 70% page utilization (some DELETEs)  
Month 12: 45% page utilization (heavy DELETE activity)

Performance impact:
- 2x more pages để scan cho same data
- 2x more I/O operations
- 2x more memory usage trong buffer pool
- Query performance giảm 50-100%

4. 🔄 INSERT Reuse Deleted Slots

INSERT allocation priority:

-- MySQL reuses deleted slots trước khi allocate new space
INSERT INTO users VALUES (20000, 'Alice');

🎯 Allocation strategy:
1. ✅ Check deleted slots first (reuse holes)
2. 🆕 Allocate từ free space (if no holes available)  
3. 📄 Create new page (if current page full)

Slot reuse process:

📄 Page với deleted slots:
┌─ Free List ─────────────────────┐
│ Head → Slot#4 → Slot#7 → Slot#9 │ ← Chain of deleted slots
├─ Page Layout ──────────────────┤
│ [ID:1][ID:2][ID:3][____][ID:5]  │ ← Slot#4 available
│ [ID:6][____][ID:8][____]        │ ← Slot#7,#9 available  
└─────────────────────────────────┘

INSERT process:
1. 🎯 Take first slot từ free list (Slot#4)
2. 📝 Write new record vào Slot#4  
3. 🔗 Update free list head → Slot#7

Size matching logic:

-- MySQL sử dụng "First Fit" algorithm
Available slots: [50 bytes][200 bytes][80 bytes]
New record: 90 bytes

Process: Use first slot ≥ 90 bytes (200-byte slot)
Waste: 110 bytes internal fragmentation
Trade-off: Speed over perfect space efficiency

Benefits và limitations:

✅ Reuse benefits:
- No new page allocation needed
- Better space utilization than never reusing
- Reduced storage growth rate

❌ Limitations:  
- Internal fragmentation (size mismatches)
- CPU overhead cho free list traversal
- Complex page management
- Permanent fragmentation nếu size mismatch severe

5. 🏠 Table Isolation & Page Ownership

One Page = One Table Rule:

MySQL InnoDB: Mỗi page chỉ thuộc về một table duy nhất

📄 Page header structure:
┌─────────────────────────────────┐
│ Table Space ID: 25              │ ← Identifies owner table
│ Page Type: INDEX_PAGE           │
│ Table ID: users_table           │ ← Belongs ONLY to users
└─────────────────────────────────┘

= Không bao giờ mix data từ multiple tables trong cùng page!

File organization:

💾 Database files:
/var/lib/mysql/mydb/
├── users.ibd       ← users table pages only
├── products.ibd    ← products table pages only  
├── orders.ibd      ← orders table pages only
└── ...

Each .ibd file:
- Page 0: File header
- Page 1-N: Table data exclusively
- No sharing với other tables

Tại sao không mix tables?

🔒 Problems if mixed (theoretical):
1. Lock conflicts: Table-level locks affect multiple tables
2. Buffer pool confusion: Cache management becomes complex  
3. Transaction isolation: Cross-table dependencies
4. Index navigation: B-Tree pointers cannot point to "part of page"
5. Backup complexity: Cannot backup individual tables cleanly

Benefits của isolation:

✅ Clean separation benefits:
- Predictable performance per table
- Easy space management & monitoring  
- Independent backup/restore operations
- No cross-table lock contention
- Simplified cache management

6. 📂 Tablespace Management

Tablespace là gì?

Tablespace = Container chứa database objects

Logical layer giữa database và physical files
Quản lý storage allocation và growth
Có thể chứa một hoặc nhiều tables

Types của tablespace:

System Tablespace (ibdata1):

-- Chứa system-wide data:
📁 ibdata1 contents:
├─ Data Dictionary (table definitions)
├─ Undo Logs (transaction rollback data)  
├─ Change Buffer (index optimization)
├─ Doublewrite Buffer (crash protection)
└─ System tables

Characteristics:
✅ Critical system data
❌ Cannot shrink once grown  
❌ Shared by all tables (legacy mode)

File-per-table Tablespace (.ibd):

-- Default since MySQL 5.6:
innodb_file_per_table = ON

📁 users.ibd structure:
├─ File header & metadata
├─ PRIMARY KEY B-Tree (clustered index)
├─ Secondary indexes B-Trees
├─ Data pages (actual records)
└─ Free space management

Benefits:
✅ Independent backup: cp users.ibd backup/
✅ Table-specific optimization: OPTIMIZE TABLE users
✅ Space reclaim: DROP TABLE frees space immediately  
✅ Easy monitoring: ls -la *.ibd shows per-table sizes

General Tablespace:

-- Custom grouping:
CREATE TABLESPACE company_data ADD DATAFILE 'company_data.ibd';
CREATE TABLE employees (...) TABLESPACE company_data;
CREATE TABLE departments (...) TABLESPACE company_data;

Use cases:
🎯 Logical grouping related tables
🎯 Shared free space optimization
🎯 Administrative control (backup groups)

7. 📄 Page Allocation Strategies

Allocation strategies:

Sequential Allocation (Ideal):

-- Fresh table growth:
📊 Page allocation: [0][1][2][3][4][5]... ← Sequential

Benefits:
✅ Sequential I/O (fastest disk access ~200MB/s)
✅ Predictable layout
✅ Efficient range scans  
✅ Minimal fragmentation

Extent-based Allocation:

-- Large tables (>1MB):
📊 Allocation unit: Extent = 64 consecutive pages = 1MB

Strategy:
- Small tables: Individual pages
- Medium tables: Full extents  
- Large tables: Multiple extents at once

Benefit: Reduces allocation overhead cho large tables

Fragmented Allocation (Problem):

-- After many operations:
📊 Page allocation: [0][15][3][28][7]... ← Random

Impact:  
❌ Random I/O (slow ~20MB/s vs 200MB/s sequential)
❌ Poor cache locality
❌ Inefficient range scans

Performance impact:

🚀 Sequential vs Random allocation:

Sequential layout:
- Table scan: ~5 seconds for 1M records
- Range queries: Efficient (few seeks)
- Memory: Good cache hit ratios

Random layout:  
- Table scan: ~50 seconds for 1M records (10x slower!)
- Range queries: Many random seeks
- Memory: Poor cache efficiency

8. ⚔️ Soft Delete vs Hard Delete

Comparison overview:

Aspect	Hard DELETE	Soft DELETE
Operation	Physical removal	Logical marking
Performance	Slower (all indexes affected)	Faster (minimal indexes affected)
Storage	Space freed (with fragmentation)	No space freed
Auditability	Data lost forever	Complete audit trail
Query complexity	Simple queries	Requires filtering

Hard DELETE chi tiết:

Process:

DELETE FROM users WHERE id = 1000;

🔍 Steps:
1. 📋 Locate record trong clustered index
2. 🗑️ Remove từ data page (mark as deleted)
3. 🌳 Update ALL indexes:
   - Remove từ PRIMARY KEY B-Tree
   - Remove từ all secondary indexes  
   - Update index statistics
4. 📊 Create fragmentation holes

Index impact:

-- Tất cả indexes bị affected:
PRIMARY KEY: Remove entry từ B-Tree ← 3-4 I/O ops
Email index: Remove entry từ B-Tree ← 2-3 I/O ops  
Age index: Remove entry từ B-Tree ← 2-3 I/O ops
... (mọi secondary index)

Total: 10-20 I/O operations per DELETE
Time: 50-100ms per operation

Soft DELETE chi tiết:

Process:

UPDATE users SET deleted_at = NOW() WHERE id = 1000;

🔍 Steps:
1. 📋 Locate record trong clustered index  
2. ✏️ Update deleted_at column value
3. 🌳 Update relevant indexes:
   - PRIMARY KEY: No change (id unchanged) 
   - deleted_at index: Update entry (NULL → timestamp)
   - Other indexes: Usually no change

Benefits:

✅ Performance: 10x faster operations
✅ Audit trail: Complete historical data
✅ Recovery: Can "undelete" records  
✅ Analytics: Historical trend analysis
✅ Compliance: Legal/regulatory requirements

Trade-offs:

❌ Storage: 2x space usage (keep deleted data)
❌ Query complexity: Must filter deleted_at IS NULL
❌ Index size: 2x larger indexes  
❌ Maintenance: Need periodic cleanup

Best practices:

-- Soft delete implementation:
ALTER TABLE users ADD COLUMN deleted_at TIMESTAMP NULL;
CREATE INDEX idx_active_users ON users(deleted_at, id) WHERE deleted_at IS NULL;

-- Efficient queries:
SELECT * FROM users WHERE deleted_at IS NULL; -- Active users
SELECT * FROM users WHERE deleted_at BETWEEN '2024-01-01' AND '2024-01-31'; -- Deleted trong tháng

-- Hybrid approach:  
-- Soft delete → Archive → Hard delete
UPDATE users SET deleted_at = NOW() WHERE id = 1000; -- Immediate
-- Later cleanup:
DELETE FROM users WHERE deleted_at < NOW() - INTERVAL 2 YEAR; -- Permanent removal

9. 🌳 B-Tree Index Maintenance

DELETE impact trên B-Tree:

Leaf page changes ONLY:

DELETE FROM users WHERE age = 20;

🌳 B-Tree structure:
- ROOT page: Không thay đổi
- BRANCH pages: Không thay đổi  
- LEAF pages: Mark entries as deleted ← ONLY this changes

= Tree structure maintained, no automatic rebuilding

Fragmentation trong index:

📋 Leaf page (before DELETE):
[age:19→id:500][age:20→id:1000][age:21→id:300] ← 3 entries

📋 Leaf page (after DELETE age:20):
[age:19→id:500][____HOLE____][age:21→id:300] ← 1 hole

Problem: Range scans phải scan qua holes!

Tại sao MySQL không auto-rebalance?

⚡ Performance vs Maintenance trade-off:

🚀 Current approach (No rebalancing):
- DELETE: 5-10ms (mark as deleted)
- Tree structure: Stable, predictable
- Concurrent access: Minimal locking

🐌 Theoretical rebalancing:
- DELETE: 50-200ms (rebuild affected nodes)  
- Tree operations: Complex, unpredictable
- Concurrent access: Heavy locking, blocking

MySQL prioritizes consistent fast operations

Performance degradation:

-- Range query example:
SELECT * FROM users WHERE age BETWEEN 18 AND 25;

Fresh index:
- Leaf pages to scan: 5 pages
- Records found: 10,000 records  
- Efficiency: 2,000 records per page
- Time: 25ms

Fragmented index (after many DELETEs):
- Leaf pages to scan: 5 pages (same!)
- Records found: 3,000 records
- Efficiency: 600 records per page  
- Time: 25ms (same I/O cost cho 1/3 data!)

= Performance per record degrades 3x over time

Manual maintenance:

Method 1: OPTIMIZE TABLE

OPTIMIZE TABLE users;

Process:
1. 🔄 Create new table structure
2. 📊 Copy active data only (skip deleted)
3. 🌳 Build fresh indexes (optimal layout)  
4. 🔄 Atomic replacement
5. 🗑️ Drop old fragmented structures

Benefits: Perfect optimization
Drawbacks: Table locked, downtime required

Method 2: Online Schema Change

# Zero-downtime approach:
pt-online-schema-change \
  --alter "ENGINE=InnoDB" \
  --execute D=mydb,t=users

Benefits:  
✅ No table locking
✅ Can run during business hours
✅ Throttles based on server load
✅ <1 second final switchover

When to rebuild:

-- Monitor fragmentation:
SELECT 
    table_name,
    ROUND(DATA_FREE/1024/1024, 2) as fragmented_mb,
    ROUND(DATA_FREE/(DATA_LENGTH+INDEX_LENGTH)*100, 2) as frag_pct
FROM INFORMATION_SCHEMA.TABLES 
WHERE table_name = 'users';

-- Rebuild triggers:
-- frag_pct > 25%: Consider rebuilding
-- frag_pct > 40%: Rebuild recommended  
-- frag_pct > 60%: Rebuild urgent

10. 🚀 Performance Optimization

Monitoring strategies:

Fragmentation monitoring:

-- Weekly fragmentation check:
CREATE VIEW fragmentation_report AS
SELECT 
    table_name,
    table_rows,
    ROUND(data_length/1024/1024, 2) as data_mb,
    ROUND(index_length/1024/1024, 2) as index_mb,  
    ROUND(data_free/1024/1024, 2) as fragmented_mb,
    ROUND(data_free/(data_length+index_length)*100, 2) as frag_pct,
    CASE 
        WHEN data_free/(data_length+index_length)*100 > 30 THEN 'CRITICAL'
        WHEN data_free/(data_length+index_length)*100 > 15 THEN 'WARNING'
        ELSE 'OK'
    END as status
FROM information_schema.tables
WHERE engine = 'InnoDB'
ORDER BY frag_pct DESC;

Optimization techniques:

Proactive design:

-- ✅ Minimize fragmentation from start:

1. 📊 Consistent record sizes:
CREATE TABLE products (
    id BIGINT PRIMARY KEY,
    name VARCHAR(100),        -- Fixed reasonable size
    price DECIMAL(10,2)       -- Fixed size
    -- Avoid: TEXT, LONGBLOB trong transactional tables
);

2. 🔄 Batch operations:
-- Instead of: DELETE FROM logs WHERE date < '2023-01-01'; (millions at once)
-- Use: DELETE FROM logs WHERE date < '2023-01-01' LIMIT 10000; (repeat)

3. 📅 Partitioning:
CREATE TABLE orders (
    id BIGINT,
    order_date DATE
) PARTITION BY RANGE (YEAR(order_date));
-- Drop old partitions: ALTER TABLE orders DROP PARTITION p2022;

Maintenance scheduling:

-- Monthly maintenance routine:

1. 📊 Check fragmentation:
SELECT * FROM fragmentation_report WHERE status != 'OK';

2. 🔧 Rebuild fragmented tables:  
-- Low traffic hours:
OPTIMIZE TABLE high_fragmentation_table;

-- Business hours:  
pt-online-schema-change --alter "ENGINE=InnoDB" --execute D=db,t=table;

3. 📈 Monitor performance improvement:
-- Compare query times before/after
-- Verify fragmentation reduction
-- Check storage space reclaimed

Best practices summary:

Design principles:

✅ DO:
- Monitor fragmentation regularly (monthly)
- Use consistent record sizes when possible
- Implement soft delete cho audit-heavy tables
- Batch large DELETE operations  
- Use partitioning cho time-series data
- Schedule regular OPTIMIZE TABLE
- Plan maintenance windows

❌ DON'T:  
- Ignore gradual performance degradation
- Mix hot và cold data trong same table
- Store large BLOBs trong transactional tables
- Perform massive DELETEs without batching
- Rebuild indexes too frequently (overhead)
- Forget to monitor buffer pool efficiency

Performance monitoring:

-- Key metrics to track:
1. 📊 Fragmentation percentage (target: <25%)
2. ⚡ Query response time trends  
3. 💾 Buffer pool hit ratio (target: >95%)
4. 📈 I/O operations per query
5. 💽 Storage growth rate vs data growth rate

🎯 Key Takeaways

Core concepts:

📄 Pages: 16KB units, building blocks của storage
🕳️ Fragmentation: Inevitable với DELETE operations, degrades performance over time
🔄 Slot reuse: INSERT reuses deleted space, nhưng not perfectly efficient
🏠 Table isolation: One page = one table, clean separation
📂 Tablespace: Logical containers, different types cho different purposes
🌳 B-Tree maintenance: Requires manual intervention, no auto-rebalancing

Performance principles:

⚡ Speed vs Space: MySQL prioritizes operation speed over perfect space efficiency
📊 Monitoring: Proactive monitoring essential cho performance maintenance  
🔧 Maintenance: Regular OPTIMIZE TABLE necessary cho heavily modified tables
🎯 Design: Smart schema design prevents many performance issues

Practical guidelines:

✅ Monitor fragmentation monthly
✅ Rebuild indexes khi fragmentation >25-30%
✅ Use appropriate tools (OPTIMIZE vs online schema change)  
✅ Design consistent record sizes
✅ Batch large operations
✅ Consider partitioning cho time-based data
✅ Plan maintenance windows

💡 Final Insights

Database storage management là balance giữa performance và efficiency.

Từ hiểu cách MySQL lưu trữ data ở mức page, đến quản lý fragmentation và optimization - tất cả đều ảnh hưởng đến user experience và system scalability.

Remember:

✅ Fragmentation là natural consequence của DELETE operations
✅ Performance degradation happens gradually, easy to miss
✅ Proactive monitoring prevents performance surprises
✅ Right maintenance strategy depends on workload patterns
✅ Storage optimization requires ongoing attention

"Prevention is better than cure - design smart, monitor actively, maintain regularly!"

nestjs backend