Database Performance và Query Optimization - Hệ thống không sử dụng index khi filter
📋 Mục Lục
- Type Conversion và Implicit Casting
- Index Performance với LIKE Operations
- Best Practices và Performance Tips
- Monitoring và Troubleshooting
1. Type Conversion và Implicit Casting
Vấn Đề: VARCHAR Column với Numeric Comparison
Scenario:
CREATE TABLE employees (
id INT PRIMARY KEY,
name VARCHAR(100),
salary VARCHAR(50) -- ❌ Salary stored as VARCHAR!
);
-- Query với numeric comparison:
SELECT * FROM employees WHERE salary = 4000;
-- ✅ Sẽ work, nhưng có performance issues!
MySQL Conversion Rules:
String-to-Number Conversion:
-- MySQL converts VARCHAR to NUMBER cho comparison
SELECT
'4000' = 4000, -- Returns: 1 (TRUE)
'4000.00' = 4000, -- Returns: 1 (TRUE)
'04000' = 4000, -- Returns: 1 (TRUE)
'4000.50' = 4000, -- Returns: 0 (FALSE)
'N/A' = 4000, -- Returns: 0 (FALSE)
Conversion Process:
-- MySQL converts string từ left to right until non-numeric character
SELECT
CAST('4000' AS UNSIGNED) as result1, -- 4000
CAST('4000.00' AS UNSIGNED) as result2, -- 4000
CAST('04000' AS UNSIGNED) as result3, -- 4000
CAST('4000.50' AS UNSIGNED) as result4, -- 4000 (truncates decimal)
CAST('N/A' AS UNSIGNED) as result5, -- 0 (non-numeric)
Performance Issues với Type Conversion:
❌ Problems:
-- BAD: Không thể sử dụng index hiệu quả!
SELECT * FROM employees WHERE salary = 4000;
-- MySQL phải convert TOÀN BỘ column salary → Very slow!
-- Index scan không hiệu quả:
CREATE INDEX idx_salary ON employees(salary);
EXPLAIN SELECT * FROM employees WHERE salary = 4000;
-- Result: type = ALL (Full table scan dù có index!)
✅ Solutions:
A) Fix Data Type:
-- Tốt nhất: Sửa data type
ALTER TABLE employees ADD COLUMN salary_numeric DECIMAL(10,2);
UPDATE employees SET salary_numeric = CAST(salary AS DECIMAL(10,2))
WHERE salary REGEXP '^[0-9]+(\.[0-9]+)?$';
-- Drop old column, rename new column
ALTER TABLE employees DROP COLUMN salary;
ALTER TABLE employees CHANGE salary_numeric salary DECIMAL(10,2);
B) Proper Comparison:
-- So sánh với string để use index
SELECT * FROM employees WHERE salary = '4000';
-- Hoặc convert explicit với validation
SELECT * FROM employees
WHERE CAST(salary AS DECIMAL(10,2)) = 4000.00
AND salary REGEXP '^[0-9]+(\.[0-9]+)?$'; -- Ensure numeric
C) Functional Index (MySQL 8.0+):
-- Tạo functional index cho performance
CREATE INDEX idx_salary_numeric ON employees ((CAST(salary AS DECIMAL(10,2))));
-- Query sẽ nhanh hơn
SELECT * FROM employees WHERE CAST(salary AS DECIMAL(10,2)) = 4000.00;
Real-World Example:
-- Legacy system problem
CREATE TABLE products (
id INT PRIMARY KEY,
name VARCHAR(255),
price VARCHAR(50) -- 😱 Price stored as VARCHAR!
);
-- Data messy như này:
INSERT INTO products VALUES
(1, 'Laptop', '$1,299.99'),
(2, 'Mouse', '25.50'),
(3, 'Keyboard', '89'),
(4, 'Monitor', 'Call for price');
-- Query này sẽ có unexpected results:
SELECT * FROM products WHERE price > 100;
-- Conversions: '$1,299.99' → 0, '25.50' → 25.50, '89' → 89, 'Call for price' → 0
2. Index Performance với LIKE Operations
Vấn Đề: Leading Wildcard Performance
Scenario:
CREATE TABLE users (
id INT PRIMARY KEY,
username VARCHAR(50),
email VARCHAR(100)
);
-- Tạo index trên username
CREATE INDEX idx_username ON users(username);
-- Query với leading wildcard:
SELECT * FROM users WHERE username LIKE '%abc%';
-- ❌ KHÔNG THỂ sử dụng index hiệu quả!
LIKE Patterns và Index Usage:
✅ Index-Friendly Patterns:
-- Exact match - FASTEST
SELECT * FROM users WHERE username = 'john_doe';
-- Plan: INDEX SEEK/REF
-- Trailing wildcard - FAST
SELECT * FROM users WHERE username LIKE 'john%';
-- Plan: INDEX RANGE SCAN
❌ Index-Unfriendly Patterns:
-- Leading wildcard - SLOW
SELECT * FROM users WHERE username LIKE '%john';
-- Plan: FULL TABLE SCAN
-- Both sides wildcard - SLOWEST
SELECT * FROM users WHERE username LIKE '%john%';
-- Plan: FULL TABLE SCAN
Execution Plan Comparison:
Trailing Wildcard (Good):
EXPLAIN SELECT * FROM users WHERE username LIKE 'abc%';
+----+-------------+-------+-------+---------------+--------------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+--------------+---------+------+------+-------------+
| 1 | SIMPLE | users | range | idx_username | idx_username | 203 | NULL | 3 | Using where |
+----+-------------+-------+-------+---------------+--------------+---------+------+------+-------------+
Leading Wildcard (Bad):
EXPLAIN SELECT * FROM users WHERE username LIKE '%abc%';
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
| 1 | SIMPLE | users | ALL | NULL | NULL | NULL | NULL | 10007 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
Tại Sao Leading Wildcard Không Dùng Được Index?
B-Tree Index Structure:
B-Tree Index on username (sorted):
├── 'abc_user'
├── 'abcdef_test'
├── 'jane_smith'
├── 'john_doe'
├── 'test_abc_end'
├── 'user_000001'
├── 'user_000002'
└── 'user_abc'
Index Access Patterns:
-- ✅ 'abc%' → Start from 'abc', scan until 'abd'
-- Efficient range scan trong sorted index
-- ❌ '%abc%' → Phải scan TOÀN BỘ index để check mỗi value!
-- Không có pattern để optimize
Performance Impact:
Test Results:
SET profiling = 1;
-- Leading wildcard (slow)
SELECT COUNT(*) FROM users WHERE username LIKE '%abc%'; -- 0.012543s
-- Trailing wildcard (fast)
SELECT COUNT(*) FROM users WHERE username LIKE 'user%'; -- 0.000834s
-- Exact match (fastest)
SELECT COUNT(*) FROM users WHERE username = 'john_doe'; -- 0.000234s
SHOW PROFILES;
Performance Difference: 53x slower với leading wildcard!
Solutions cho Text Search:
1. Full-Text Search (Recommended):
-- Tạo FULLTEXT index
ALTER TABLE users ADD FULLTEXT(username);
-- Sử dụng MATCH...AGAINST
SELECT * FROM users
WHERE MATCH(username) AGAINST('abc' IN BOOLEAN MODE);
-- Performance với FULLTEXT:
EXPLAIN SELECT * FROM users
WHERE MATCH(username) AGAINST('abc' IN BOOLEAN MODE);
-- Uses fulltext index instead of table scan
2. Reverse Index Pattern:
-- Thêm column để store reverse string
ALTER TABLE users ADD COLUMN username_reverse VARCHAR(50);
-- Update với reverse values
UPDATE users SET username_reverse = REVERSE(username);
-- Tạo index trên reverse column
CREATE INDEX idx_username_reverse ON users(username_reverse);
-- Tìm strings kết thúc bằng 'abc': '%abc'
SELECT * FROM users
WHERE username_reverse LIKE CONCAT(REVERSE('abc'), '%')
AND username LIKE '%abc'; -- Double-check cho accuracy
3. N-Gram Index (MySQL 8.0+):
-- Tạo trigram index với ngram parser
CREATE FULLTEXT INDEX idx_username_ngram ON users(username)
WITH PARSER ngram;
-- Query với minimum word length
SELECT * FROM users
WHERE MATCH(username) AGAINST('abc' IN BOOLEAN MODE);
Index Usage Summary:
Pattern | Index Usage | Performance | Use Case |
---|---|---|---|
username = 'john' |
✅ INDEX SEEK | Fastest | Exact match |
username LIKE 'john%' |
✅ RANGE SCAN | Fast | Prefix search |
username LIKE '%john' |
❌ TABLE SCAN | Slow | Suffix search |
username LIKE '%john%' |
❌ TABLE SCAN | Slowest | Contains search |
MATCH...AGAINST('john') |
✅ FULLTEXT | Fast | Text search |
3. Best Practices và Performance Tips
1. Data Type Best Practices:
✅ DO's:
-- Use appropriate data types
CREATE TABLE products (
id INT PRIMARY KEY,
name VARCHAR(255),
price DECIMAL(10,2), -- ✅ Numeric type cho prices
created_at TIMESTAMP, -- ✅ Proper date type
is_active BOOLEAN -- ✅ Boolean cho flags
);
-- Use proper constraints
ALTER TABLE products
ADD CONSTRAINT chk_price CHECK (price >= 0),
ADD CONSTRAINT chk_name CHECK (LENGTH(name) > 0);
❌ DON'Ts:
-- Avoid storing numbers as strings
CREATE TABLE bad_products (
id VARCHAR(50), -- ❌ ID should be INT
price VARCHAR(20), -- ❌ Price should be DECIMAL
created_at VARCHAR(30), -- ❌ Date should be TIMESTAMP
is_active VARCHAR(10) -- ❌ Boolean should be BOOLEAN
);
2. Index Design Best Practices:
✅ Effective Index Strategies:
-- Single-column indexes cho frequent WHERE clauses
CREATE INDEX idx_customer_id ON orders(customer_id);
CREATE INDEX idx_order_date ON orders(order_date);
-- Composite indexes cho multiple-column queries
CREATE INDEX idx_customer_date ON orders(customer_id, order_date);
-- Covering indexes để avoid table lookups
CREATE INDEX idx_order_summary ON orders(customer_id, order_date, total_amount);
-- Partial indexes cho large tables với predictable patterns
CREATE INDEX idx_active_users ON users(username) WHERE is_active = true;
❌ Index Anti-patterns:
-- Too many single-column indexes (maintenance overhead)
CREATE INDEX idx1 ON table(col1);
CREATE INDEX idx2 ON table(col2);
CREATE INDEX idx3 ON table(col3); -- Consider composite instead
-- Indexes on low-cardinality columns
CREATE INDEX idx_gender ON users(gender); -- Only 2-3 values, not selective
-- Redundant indexes
CREATE INDEX idx_a ON table(a);
CREATE INDEX idx_ab ON table(a,b); -- idx_a is redundant if idx_ab covers queries
3. Query Optimization Patterns:
✅ Efficient Query Patterns:
-- Use specific columns instead of SELECT *
SELECT id, name, price FROM products WHERE category_id = 10;
-- Use LIMIT cho large result sets
SELECT * FROM products ORDER BY created_at DESC LIMIT 20;
-- Use EXISTS instead of IN cho large subqueries
SELECT * FROM customers c
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id);
-- Use proper JOIN syntax
SELECT c.name, COUNT(o.id)
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.name;
❌ Inefficient Query Patterns:
-- SELECT * cho large tables
SELECT * FROM large_table; -- Transfers unnecessary data
-- Functions trong WHERE clause
SELECT * FROM orders WHERE YEAR(order_date) = 2024; -- Can't use index
-- Better: WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01'
-- Leading wildcards
SELECT * FROM products WHERE name LIKE '%widget%'; -- Table scan
-- Implicit type conversions
SELECT * FROM products WHERE price = '100'; -- Convert price column if numeric
🎯 Tổng Kết
Key Takeaways:
-
Type Conversion Impact
- MySQL auto-converts types nhưng với performance cost
- VARCHAR numeric comparisons bypass indexes
- Solution: Use proper data types từ đầu
-
Index Limitations với LIKE
- Leading wildcards (
%abc%
) không thể dùng B-Tree indexes - Trailing wildcards (
abc%
) có thể dùng index hiệu quả - Solution: FULLTEXT search cho complex text matching
- Leading wildcards (
-
Performance Monitoring
- Slow query log và Performance Schema là essential tools
- Index usage statistics giúp identify optimization opportunities
- Regular monitoring prevents performance degradation
-
Best Practices
- Design: Proper data types và index strategies
- Development: Avoid anti-patterns (SELECT *, functions in WHERE)
- Operations: Monitor, analyze, optimize continuously
Performance Impact Summary:
Issue | Impact | Solution |
---|---|---|
VARCHAR numeric comparison | Index bypass, type conversion overhead | Fix data types, use proper comparisons |
Leading wildcard LIKE | Full table scans, 50-100x slower | FULLTEXT search, reverse indexes |
Missing indexes | Table scans, poor JOIN performance | Create targeted indexes |
Stale statistics | Wrong execution plans | Regular ANALYZE TABLE |
Final Recommendation:
"Performance optimization là ongoing process. Monitor continuously, analyze systematically, và optimize proactively để maintain optimal database performance."
All rights reserved