Database Storage & Fragmentation Complete
📚 Tổng quan
Chúng ta đã khám phá sâu về cách MySQL lưu trữ và quản lý data - từ page structure cơ bản, đến fragmentation issues, và strategies để duy trì performance tối ưu. Đây là kiến thức thực tế giúp hiểu tại sao database performance thay đổi theo thời gian và cách khắc phục.
📋 Mục lục
- Page Structure & Data Storage
- Record Count vs Page Count
- Fragmentation từ DELETE Operations
- INSERT Reuse Deleted Slots
- Table Isolation & Page Ownership
- Tablespace Management
- Page Allocation Strategies
- Soft Delete vs Hard Delete
- B-Tree Index Maintenance
- Performance Optimization
1. 📄 Page Structure & Data Storage
Khái niệm cơ bản:
Page = Đơn vị lưu trữ cơ bản của MySQL
- Kích thước: 16KB mỗi page (cố định)
- Chứa: User records + metadata + free space
- I/O unit: MySQL luôn đọc/ghi theo đơn vị page
Cách MySQL load data:
-- Query: SELECT * FROM users WHERE id = 1000;
🔍 Quá trình execution:
1. 📖 Determine pages cần đọc (từ index hoặc full scan)
2. 💾 Load pages từ disk vào buffer pool (memory)
3. 🔍 Scan records trong pages
4. ✅ Return matching records
Không có index: Phải đọc TẤT CẢ pages (full table scan)
Có index: Chỉ đọc vài pages cần thiết
Page capacity:
📊 Typical page utilization:
- Compact records (50-100 bytes): 200-300 records/page
- Normal records (200-500 bytes): 30-80 records/page
- Large records (1-5KB): 3-15 records/page
- Very large records (>5KB): <3 records/page
Formula: Records per page ≈ 15KB ÷ Average record size
2. 📈 Record Count vs Page Count
Tại sao số page có thể lớn hơn expected?
Factor 1: Record size quá lớn
-- Ví dụ: 16,000 records nhưng cần 1000+ pages
CREATE TABLE users_fat (
id INT, -- 4 bytes
bio TEXT, -- 2,000 bytes average
profile_data JSON, -- 3,000 bytes average
attachments LONGBLOB -- 10,000 bytes average!
);
-- Average record: ~15KB → 1 record per page!
-- 16,000 records = 16,000 pages (không phải 16 pages!)
Factor 2: Variable length fields
-- Distribution không đồng đều:
- 70% users: bio ~500 bytes → 20 records/page
- 20% users: bio ~2,000 bytes → 5 records/page
- 10% users: bio ~10,000 bytes → 1 record/page
Result: Same number records, very different page counts
Factor 3: Fragmentation từ DELETE
-- Ban đầu: 16,000 records = 16 pages (packed)
-- Sau DELETE 50%: 8,000 records vẫn spread across 16 pages
-- Page utilization: Giảm từ 100% xuống 50%
-- Wasted space: 8 pages worth của holes!
3. 🕳️ Fragmentation từ DELETE Operations
DELETE hoạt động như thế nào:
Hard DELETE process:
DELETE FROM users WHERE id = 1000;
🔍 Behind the scenes:
1. 📍 Locate record trong data page
2. 🏷️ Mark record slot as deleted (không xóa thật)
3. 🔗 Add slot vào page's free list
4. 📊 Update page header: free_space += record_size
5. ✅ Record "invisible" nhưng space vẫn reserved
Page structure sau DELETE:
📄 Page layout (before DELETE):
┌─────────────────────────────────┐
│[Alice][Bob][Charlie][David][Eve]│ ← Tightly packed
│ Used: 15KB/16KB (94% full) │
└─────────────────────────────────┘
📄 Page layout (after DELETE Charlie):
┌─────────────────────────────────┐
│[Alice][Bob][_HOLE_][David][Eve] │ ← Charlie deleted = hole
│ Used: 12KB/16KB (75% full) │ ← 25% wasted space!
│ Free list: → Charlie's slot │
└─────────────────────────────────┘
Tại sao MySQL không compact tự động?
⚡ Trade-off: Speed vs Space efficiency
🚀 MySQL approach (Fast DELETE):
- Mark as deleted: 0.1ms
- No data movement: No additional I/O
- Continue với next operation immediately
🐌 Theoretical compaction:
- Mark as deleted: 0.1ms
- Shift all records: 50ms per page
- Update all pointers: 200ms
- Lock page during operation: Blocks other queries
Total: 250ms+ (1,250x slower!)
MySQL chọn speed over perfect space utilization
Fragmentation accumulation:
📈 Progressive degradation:
Month 1: 95% page utilization (fresh data)
Month 6: 70% page utilization (some DELETEs)
Month 12: 45% page utilization (heavy DELETE activity)
Performance impact:
- 2x more pages để scan cho same data
- 2x more I/O operations
- 2x more memory usage trong buffer pool
- Query performance giảm 50-100%
4. 🔄 INSERT Reuse Deleted Slots
INSERT allocation priority:
-- MySQL reuses deleted slots trước khi allocate new space
INSERT INTO users VALUES (20000, 'Alice');
🎯 Allocation strategy:
1. ✅ Check deleted slots first (reuse holes)
2. 🆕 Allocate từ free space (if no holes available)
3. 📄 Create new page (if current page full)
Slot reuse process:
📄 Page với deleted slots:
┌─ Free List ─────────────────────┐
│ Head → Slot#4 → Slot#7 → Slot#9 │ ← Chain of deleted slots
├─ Page Layout ──────────────────┤
│ [ID:1][ID:2][ID:3][____][ID:5] │ ← Slot#4 available
│ [ID:6][____][ID:8][____] │ ← Slot#7,#9 available
└─────────────────────────────────┘
INSERT process:
1. 🎯 Take first slot từ free list (Slot#4)
2. 📝 Write new record vào Slot#4
3. 🔗 Update free list head → Slot#7
Size matching logic:
-- MySQL sử dụng "First Fit" algorithm
Available slots: [50 bytes][200 bytes][80 bytes]
New record: 90 bytes
Process: Use first slot ≥ 90 bytes (200-byte slot)
Waste: 110 bytes internal fragmentation
Trade-off: Speed over perfect space efficiency
Benefits và limitations:
✅ Reuse benefits:
- No new page allocation needed
- Better space utilization than never reusing
- Reduced storage growth rate
❌ Limitations:
- Internal fragmentation (size mismatches)
- CPU overhead cho free list traversal
- Complex page management
- Permanent fragmentation nếu size mismatch severe
5. 🏠 Table Isolation & Page Ownership
One Page = One Table Rule:
MySQL InnoDB: Mỗi page chỉ thuộc về một table duy nhất
📄 Page header structure:
┌─────────────────────────────────┐
│ Table Space ID: 25 │ ← Identifies owner table
│ Page Type: INDEX_PAGE │
│ Table ID: users_table │ ← Belongs ONLY to users
└─────────────────────────────────┘
= Không bao giờ mix data từ multiple tables trong cùng page!
File organization:
💾 Database files:
/var/lib/mysql/mydb/
├── users.ibd ← users table pages only
├── products.ibd ← products table pages only
├── orders.ibd ← orders table pages only
└── ...
Each .ibd file:
- Page 0: File header
- Page 1-N: Table data exclusively
- No sharing với other tables
Tại sao không mix tables?
🔒 Problems if mixed (theoretical):
1. Lock conflicts: Table-level locks affect multiple tables
2. Buffer pool confusion: Cache management becomes complex
3. Transaction isolation: Cross-table dependencies
4. Index navigation: B-Tree pointers cannot point to "part of page"
5. Backup complexity: Cannot backup individual tables cleanly
Benefits của isolation:
✅ Clean separation benefits:
- Predictable performance per table
- Easy space management & monitoring
- Independent backup/restore operations
- No cross-table lock contention
- Simplified cache management
6. 📂 Tablespace Management
Tablespace là gì?
Tablespace = Container chứa database objects
- Logical layer giữa database và physical files
- Quản lý storage allocation và growth
- Có thể chứa một hoặc nhiều tables
Types của tablespace:
System Tablespace (ibdata1):
-- Chứa system-wide data:
📁 ibdata1 contents:
├─ Data Dictionary (table definitions)
├─ Undo Logs (transaction rollback data)
├─ Change Buffer (index optimization)
├─ Doublewrite Buffer (crash protection)
└─ System tables
Characteristics:
✅ Critical system data
❌ Cannot shrink once grown
❌ Shared by all tables (legacy mode)
File-per-table Tablespace (.ibd):
-- Default since MySQL 5.6:
innodb_file_per_table = ON
📁 users.ibd structure:
├─ File header & metadata
├─ PRIMARY KEY B-Tree (clustered index)
├─ Secondary indexes B-Trees
├─ Data pages (actual records)
└─ Free space management
Benefits:
✅ Independent backup: cp users.ibd backup/
✅ Table-specific optimization: OPTIMIZE TABLE users
✅ Space reclaim: DROP TABLE frees space immediately
✅ Easy monitoring: ls -la *.ibd shows per-table sizes
General Tablespace:
-- Custom grouping:
CREATE TABLESPACE company_data ADD DATAFILE 'company_data.ibd';
CREATE TABLE employees (...) TABLESPACE company_data;
CREATE TABLE departments (...) TABLESPACE company_data;
Use cases:
🎯 Logical grouping related tables
🎯 Shared free space optimization
🎯 Administrative control (backup groups)
7. 📄 Page Allocation Strategies
Allocation strategies:
Sequential Allocation (Ideal):
-- Fresh table growth:
📊 Page allocation: [0][1][2][3][4][5]... ← Sequential
Benefits:
✅ Sequential I/O (fastest disk access ~200MB/s)
✅ Predictable layout
✅ Efficient range scans
✅ Minimal fragmentation
Extent-based Allocation:
-- Large tables (>1MB):
📊 Allocation unit: Extent = 64 consecutive pages = 1MB
Strategy:
- Small tables: Individual pages
- Medium tables: Full extents
- Large tables: Multiple extents at once
Benefit: Reduces allocation overhead cho large tables
Fragmented Allocation (Problem):
-- After many operations:
📊 Page allocation: [0][15][3][28][7]... ← Random
Impact:
❌ Random I/O (slow ~20MB/s vs 200MB/s sequential)
❌ Poor cache locality
❌ Inefficient range scans
Performance impact:
🚀 Sequential vs Random allocation:
Sequential layout:
- Table scan: ~5 seconds for 1M records
- Range queries: Efficient (few seeks)
- Memory: Good cache hit ratios
Random layout:
- Table scan: ~50 seconds for 1M records (10x slower!)
- Range queries: Many random seeks
- Memory: Poor cache efficiency
8. ⚔️ Soft Delete vs Hard Delete
Comparison overview:
Aspect | Hard DELETE | Soft DELETE |
---|---|---|
Operation | Physical removal | Logical marking |
Performance | Slower (all indexes affected) | Faster (minimal indexes affected) |
Storage | Space freed (with fragmentation) | No space freed |
Auditability | Data lost forever | Complete audit trail |
Query complexity | Simple queries | Requires filtering |
Hard DELETE chi tiết:
Process:
DELETE FROM users WHERE id = 1000;
🔍 Steps:
1. 📋 Locate record trong clustered index
2. 🗑️ Remove từ data page (mark as deleted)
3. 🌳 Update ALL indexes:
- Remove từ PRIMARY KEY B-Tree
- Remove từ all secondary indexes
- Update index statistics
4. 📊 Create fragmentation holes
Index impact:
-- Tất cả indexes bị affected:
PRIMARY KEY: Remove entry từ B-Tree ← 3-4 I/O ops
Email index: Remove entry từ B-Tree ← 2-3 I/O ops
Age index: Remove entry từ B-Tree ← 2-3 I/O ops
... (mọi secondary index)
Total: 10-20 I/O operations per DELETE
Time: 50-100ms per operation
Soft DELETE chi tiết:
Process:
UPDATE users SET deleted_at = NOW() WHERE id = 1000;
🔍 Steps:
1. 📋 Locate record trong clustered index
2. ✏️ Update deleted_at column value
3. 🌳 Update relevant indexes:
- PRIMARY KEY: No change (id unchanged)
- deleted_at index: Update entry (NULL → timestamp)
- Other indexes: Usually no change
Benefits:
✅ Performance: 10x faster operations
✅ Audit trail: Complete historical data
✅ Recovery: Can "undelete" records
✅ Analytics: Historical trend analysis
✅ Compliance: Legal/regulatory requirements
Trade-offs:
❌ Storage: 2x space usage (keep deleted data)
❌ Query complexity: Must filter deleted_at IS NULL
❌ Index size: 2x larger indexes
❌ Maintenance: Need periodic cleanup
Best practices:
-- Soft delete implementation:
ALTER TABLE users ADD COLUMN deleted_at TIMESTAMP NULL;
CREATE INDEX idx_active_users ON users(deleted_at, id) WHERE deleted_at IS NULL;
-- Efficient queries:
SELECT * FROM users WHERE deleted_at IS NULL; -- Active users
SELECT * FROM users WHERE deleted_at BETWEEN '2024-01-01' AND '2024-01-31'; -- Deleted trong tháng
-- Hybrid approach:
-- Soft delete → Archive → Hard delete
UPDATE users SET deleted_at = NOW() WHERE id = 1000; -- Immediate
-- Later cleanup:
DELETE FROM users WHERE deleted_at < NOW() - INTERVAL 2 YEAR; -- Permanent removal
9. 🌳 B-Tree Index Maintenance
DELETE impact trên B-Tree:
Leaf page changes ONLY:
DELETE FROM users WHERE age = 20;
🌳 B-Tree structure:
- ROOT page: Không thay đổi
- BRANCH pages: Không thay đổi
- LEAF pages: Mark entries as deleted ← ONLY this changes
= Tree structure maintained, no automatic rebuilding
Fragmentation trong index:
📋 Leaf page (before DELETE):
[age:19→id:500][age:20→id:1000][age:21→id:300] ← 3 entries
📋 Leaf page (after DELETE age:20):
[age:19→id:500][____HOLE____][age:21→id:300] ← 1 hole
Problem: Range scans phải scan qua holes!
Tại sao MySQL không auto-rebalance?
⚡ Performance vs Maintenance trade-off:
🚀 Current approach (No rebalancing):
- DELETE: 5-10ms (mark as deleted)
- Tree structure: Stable, predictable
- Concurrent access: Minimal locking
🐌 Theoretical rebalancing:
- DELETE: 50-200ms (rebuild affected nodes)
- Tree operations: Complex, unpredictable
- Concurrent access: Heavy locking, blocking
MySQL prioritizes consistent fast operations
Performance degradation:
-- Range query example:
SELECT * FROM users WHERE age BETWEEN 18 AND 25;
Fresh index:
- Leaf pages to scan: 5 pages
- Records found: 10,000 records
- Efficiency: 2,000 records per page
- Time: 25ms
Fragmented index (after many DELETEs):
- Leaf pages to scan: 5 pages (same!)
- Records found: 3,000 records
- Efficiency: 600 records per page
- Time: 25ms (same I/O cost cho 1/3 data!)
= Performance per record degrades 3x over time
Manual maintenance:
Method 1: OPTIMIZE TABLE
OPTIMIZE TABLE users;
Process:
1. 🔄 Create new table structure
2. 📊 Copy active data only (skip deleted)
3. 🌳 Build fresh indexes (optimal layout)
4. 🔄 Atomic replacement
5. 🗑️ Drop old fragmented structures
Benefits: Perfect optimization
Drawbacks: Table locked, downtime required
Method 2: Online Schema Change
# Zero-downtime approach:
pt-online-schema-change \
--alter "ENGINE=InnoDB" \
--execute D=mydb,t=users
Benefits:
✅ No table locking
✅ Can run during business hours
✅ Throttles based on server load
✅ <1 second final switchover
When to rebuild:
-- Monitor fragmentation:
SELECT
table_name,
ROUND(DATA_FREE/1024/1024, 2) as fragmented_mb,
ROUND(DATA_FREE/(DATA_LENGTH+INDEX_LENGTH)*100, 2) as frag_pct
FROM INFORMATION_SCHEMA.TABLES
WHERE table_name = 'users';
-- Rebuild triggers:
-- frag_pct > 25%: Consider rebuilding
-- frag_pct > 40%: Rebuild recommended
-- frag_pct > 60%: Rebuild urgent
10. 🚀 Performance Optimization
Monitoring strategies:
Fragmentation monitoring:
-- Weekly fragmentation check:
CREATE VIEW fragmentation_report AS
SELECT
table_name,
table_rows,
ROUND(data_length/1024/1024, 2) as data_mb,
ROUND(index_length/1024/1024, 2) as index_mb,
ROUND(data_free/1024/1024, 2) as fragmented_mb,
ROUND(data_free/(data_length+index_length)*100, 2) as frag_pct,
CASE
WHEN data_free/(data_length+index_length)*100 > 30 THEN 'CRITICAL'
WHEN data_free/(data_length+index_length)*100 > 15 THEN 'WARNING'
ELSE 'OK'
END as status
FROM information_schema.tables
WHERE engine = 'InnoDB'
ORDER BY frag_pct DESC;
Optimization techniques:
Proactive design:
-- ✅ Minimize fragmentation from start:
1. 📊 Consistent record sizes:
CREATE TABLE products (
id BIGINT PRIMARY KEY,
name VARCHAR(100), -- Fixed reasonable size
price DECIMAL(10,2) -- Fixed size
-- Avoid: TEXT, LONGBLOB trong transactional tables
);
2. 🔄 Batch operations:
-- Instead of: DELETE FROM logs WHERE date < '2023-01-01'; (millions at once)
-- Use: DELETE FROM logs WHERE date < '2023-01-01' LIMIT 10000; (repeat)
3. 📅 Partitioning:
CREATE TABLE orders (
id BIGINT,
order_date DATE
) PARTITION BY RANGE (YEAR(order_date));
-- Drop old partitions: ALTER TABLE orders DROP PARTITION p2022;
Maintenance scheduling:
-- Monthly maintenance routine:
1. 📊 Check fragmentation:
SELECT * FROM fragmentation_report WHERE status != 'OK';
2. 🔧 Rebuild fragmented tables:
-- Low traffic hours:
OPTIMIZE TABLE high_fragmentation_table;
-- Business hours:
pt-online-schema-change --alter "ENGINE=InnoDB" --execute D=db,t=table;
3. 📈 Monitor performance improvement:
-- Compare query times before/after
-- Verify fragmentation reduction
-- Check storage space reclaimed
Best practices summary:
Design principles:
✅ DO:
- Monitor fragmentation regularly (monthly)
- Use consistent record sizes when possible
- Implement soft delete cho audit-heavy tables
- Batch large DELETE operations
- Use partitioning cho time-series data
- Schedule regular OPTIMIZE TABLE
- Plan maintenance windows
❌ DON'T:
- Ignore gradual performance degradation
- Mix hot và cold data trong same table
- Store large BLOBs trong transactional tables
- Perform massive DELETEs without batching
- Rebuild indexes too frequently (overhead)
- Forget to monitor buffer pool efficiency
Performance monitoring:
-- Key metrics to track:
1. 📊 Fragmentation percentage (target: <25%)
2. ⚡ Query response time trends
3. 💾 Buffer pool hit ratio (target: >95%)
4. 📈 I/O operations per query
5. 💽 Storage growth rate vs data growth rate
🎯 Key Takeaways
Core concepts:
📄 Pages: 16KB units, building blocks của storage
🕳️ Fragmentation: Inevitable với DELETE operations, degrades performance over time
🔄 Slot reuse: INSERT reuses deleted space, nhưng not perfectly efficient
🏠 Table isolation: One page = one table, clean separation
📂 Tablespace: Logical containers, different types cho different purposes
🌳 B-Tree maintenance: Requires manual intervention, no auto-rebalancing
Performance principles:
⚡ Speed vs Space: MySQL prioritizes operation speed over perfect space efficiency
📊 Monitoring: Proactive monitoring essential cho performance maintenance
🔧 Maintenance: Regular OPTIMIZE TABLE necessary cho heavily modified tables
🎯 Design: Smart schema design prevents many performance issues
Practical guidelines:
✅ Monitor fragmentation monthly
✅ Rebuild indexes khi fragmentation >25-30%
✅ Use appropriate tools (OPTIMIZE vs online schema change)
✅ Design consistent record sizes
✅ Batch large operations
✅ Consider partitioning cho time-based data
✅ Plan maintenance windows
💡 Final Insights
Database storage management là balance giữa performance và efficiency.
Từ hiểu cách MySQL lưu trữ data ở mức page, đến quản lý fragmentation và optimization - tất cả đều ảnh hưởng đến user experience và system scalability.
Remember:
- ✅ Fragmentation là natural consequence của DELETE operations
- ✅ Performance degradation happens gradually, easy to miss
- ✅ Proactive monitoring prevents performance surprises
- ✅ Right maintenance strategy depends on workload patterns
- ✅ Storage optimization requires ongoing attention
"Prevention is better than cure - design smart, monitor actively, maintain regularly!"
All rights reserved