Vũ Trần Lê Anh @tranvudpqn123

Đã đăng vào Oct 9th, 2025 1:02 a.m. 1 phút đọc

276

Database Performance và Query Optimization - Hệ thống không sử dụng index khi filter

📋 Mục Lục

Type Conversion và Implicit Casting
Index Performance với LIKE Operations
Best Practices và Performance Tips
Monitoring và Troubleshooting

1. Type Conversion và Implicit Casting

Vấn Đề: VARCHAR Column với Numeric Comparison

Scenario:

CREATE TABLE employees (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    salary VARCHAR(50)  -- ❌ Salary stored as VARCHAR!
);

-- Query với numeric comparison:
SELECT * FROM employees WHERE salary = 4000;
-- ✅ Sẽ work, nhưng có performance issues!

MySQL Conversion Rules:

String-to-Number Conversion:

-- MySQL converts VARCHAR to NUMBER cho comparison
SELECT 
    '4000' = 4000,      -- Returns: 1 (TRUE)
    '4000.00' = 4000,   -- Returns: 1 (TRUE)  
    '04000' = 4000,     -- Returns: 1 (TRUE)
    '4000.50' = 4000,   -- Returns: 0 (FALSE)
    'N/A' = 4000,       -- Returns: 0 (FALSE)

Conversion Process:

-- MySQL converts string từ left to right until non-numeric character
SELECT 
    CAST('4000' AS UNSIGNED) as result1,        -- 4000
    CAST('4000.00' AS UNSIGNED) as result2,     -- 4000 
    CAST('04000' AS UNSIGNED) as result3,       -- 4000
    CAST('4000.50' AS UNSIGNED) as result4,     -- 4000 (truncates decimal)
    CAST('N/A' AS UNSIGNED) as result5,         -- 0 (non-numeric)

Performance Issues với Type Conversion:

❌ Problems:

-- BAD: Không thể sử dụng index hiệu quả!
SELECT * FROM employees WHERE salary = 4000;
-- MySQL phải convert TOÀN BỘ column salary → Very slow!

-- Index scan không hiệu quả:
CREATE INDEX idx_salary ON employees(salary);
EXPLAIN SELECT * FROM employees WHERE salary = 4000;
-- Result: type = ALL (Full table scan dù có index!)

✅ Solutions:

A) Fix Data Type:

-- Tốt nhất: Sửa data type
ALTER TABLE employees ADD COLUMN salary_numeric DECIMAL(10,2);
UPDATE employees SET salary_numeric = CAST(salary AS DECIMAL(10,2)) 
WHERE salary REGEXP '^[0-9]+(\.[0-9]+)?$';

-- Drop old column, rename new column
ALTER TABLE employees DROP COLUMN salary;
ALTER TABLE employees CHANGE salary_numeric salary DECIMAL(10,2);

B) Proper Comparison:

-- So sánh với string để use index
SELECT * FROM employees WHERE salary = '4000';

-- Hoặc convert explicit với validation
SELECT * FROM employees 
WHERE CAST(salary AS DECIMAL(10,2)) = 4000.00
  AND salary REGEXP '^[0-9]+(\.[0-9]+)?$';  -- Ensure numeric

C) Functional Index (MySQL 8.0+):

-- Tạo functional index cho performance
CREATE INDEX idx_salary_numeric ON employees ((CAST(salary AS DECIMAL(10,2))));

-- Query sẽ nhanh hơn
SELECT * FROM employees WHERE CAST(salary AS DECIMAL(10,2)) = 4000.00;

Real-World Example:

-- Legacy system problem
CREATE TABLE products (
    id INT PRIMARY KEY,
    name VARCHAR(255),
    price VARCHAR(50)  -- 😱 Price stored as VARCHAR!
);

-- Data messy như này:
INSERT INTO products VALUES
(1, 'Laptop', '$1,299.99'),
(2, 'Mouse', '25.50'),  
(3, 'Keyboard', '89'),
(4, 'Monitor', 'Call for price');

-- Query này sẽ có unexpected results:
SELECT * FROM products WHERE price > 100;
-- Conversions: '$1,299.99' → 0, '25.50' → 25.50, '89' → 89, 'Call for price' → 0

2. Index Performance với LIKE Operations

Vấn Đề: Leading Wildcard Performance

Scenario:

CREATE TABLE users (
    id INT PRIMARY KEY,
    username VARCHAR(50),
    email VARCHAR(100)
);

-- Tạo index trên username
CREATE INDEX idx_username ON users(username);

-- Query với leading wildcard:
SELECT * FROM users WHERE username LIKE '%abc%';
-- ❌ KHÔNG THỂ sử dụng index hiệu quả!

LIKE Patterns và Index Usage:

✅ Index-Friendly Patterns:

-- Exact match - FASTEST
SELECT * FROM users WHERE username = 'john_doe';
-- Plan: INDEX SEEK/REF

-- Trailing wildcard - FAST  
SELECT * FROM users WHERE username LIKE 'john%';
-- Plan: INDEX RANGE SCAN

❌ Index-Unfriendly Patterns:

-- Leading wildcard - SLOW
SELECT * FROM users WHERE username LIKE '%john';
-- Plan: FULL TABLE SCAN

-- Both sides wildcard - SLOWEST
SELECT * FROM users WHERE username LIKE '%john%';  
-- Plan: FULL TABLE SCAN

Execution Plan Comparison:

Trailing Wildcard (Good):

EXPLAIN SELECT * FROM users WHERE username LIKE 'abc%';

+----+-------------+-------+-------+---------------+--------------+---------+------+------+-------------+
| id | select_type | table | type  | possible_keys | key          | key_len | ref  | rows | Extra       |
+----+-------------+-------+-------+---------------+--------------+---------+------+------+-------------+
|  1 | SIMPLE      | users | range | idx_username  | idx_username | 203     | NULL | 3    | Using where |
+----+-------------+-------+-------+---------------+--------------+---------+------+------+-------------+

Leading Wildcard (Bad):

EXPLAIN SELECT * FROM users WHERE username LIKE '%abc%';

+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows  | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
|  1 | SIMPLE      | users | ALL  | NULL          | NULL | NULL    | NULL | 10007 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+

Tại Sao Leading Wildcard Không Dùng Được Index?

B-Tree Index Structure:

B-Tree Index on username (sorted):
├── 'abc_user'
├── 'abcdef_test' 
├── 'jane_smith'
├── 'john_doe'
├── 'test_abc_end'
├── 'user_000001'
├── 'user_000002'
└── 'user_abc'

Index Access Patterns:

-- ✅ 'abc%' → Start from 'abc', scan until 'abd'
-- Efficient range scan trong sorted index

-- ❌ '%abc%' → Phải scan TOÀN BỘ index để check mỗi value!
-- Không có pattern để optimize

Performance Impact:

Test Results:

SET profiling = 1;

-- Leading wildcard (slow)
SELECT COUNT(*) FROM users WHERE username LIKE '%abc%';    -- 0.012543s

-- Trailing wildcard (fast)  
SELECT COUNT(*) FROM users WHERE username LIKE 'user%';    -- 0.000834s

-- Exact match (fastest)
SELECT COUNT(*) FROM users WHERE username = 'john_doe';    -- 0.000234s

SHOW PROFILES;

Performance Difference: 53x slower với leading wildcard!

Solutions cho Text Search:

1. Full-Text Search (Recommended):

-- Tạo FULLTEXT index
ALTER TABLE users ADD FULLTEXT(username);

-- Sử dụng MATCH...AGAINST
SELECT * FROM users 
WHERE MATCH(username) AGAINST('abc' IN BOOLEAN MODE);

-- Performance với FULLTEXT:
EXPLAIN SELECT * FROM users 
WHERE MATCH(username) AGAINST('abc' IN BOOLEAN MODE);
-- Uses fulltext index instead of table scan

2. Reverse Index Pattern:

-- Thêm column để store reverse string
ALTER TABLE users ADD COLUMN username_reverse VARCHAR(50);

-- Update với reverse values
UPDATE users SET username_reverse = REVERSE(username);

-- Tạo index trên reverse column
CREATE INDEX idx_username_reverse ON users(username_reverse);

-- Tìm strings kết thúc bằng 'abc': '%abc'
SELECT * FROM users 
WHERE username_reverse LIKE CONCAT(REVERSE('abc'), '%')
  AND username LIKE '%abc';  -- Double-check cho accuracy

3. N-Gram Index (MySQL 8.0+):

-- Tạo trigram index với ngram parser
CREATE FULLTEXT INDEX idx_username_ngram ON users(username) 
WITH PARSER ngram;

-- Query với minimum word length
SELECT * FROM users 
WHERE MATCH(username) AGAINST('abc' IN BOOLEAN MODE);

Index Usage Summary:

Pattern	Index Usage	Performance	Use Case
`username = 'john'`	✅ INDEX SEEK	Fastest	Exact match
`username LIKE 'john%'`	✅ RANGE SCAN	Fast	Prefix search
`username LIKE '%john'`	❌ TABLE SCAN	Slow	Suffix search
`username LIKE '%john%'`	❌ TABLE SCAN	Slowest	Contains search
`MATCH...AGAINST('john')`	✅ FULLTEXT	Fast	Text search

3. Best Practices và Performance Tips

1. Data Type Best Practices:

✅ DO's:

-- Use appropriate data types
CREATE TABLE products (
    id INT PRIMARY KEY,
    name VARCHAR(255),
    price DECIMAL(10,2),        -- ✅ Numeric type cho prices
    created_at TIMESTAMP,       -- ✅ Proper date type
    is_active BOOLEAN           -- ✅ Boolean cho flags
);

-- Use proper constraints
ALTER TABLE products 
ADD CONSTRAINT chk_price CHECK (price >= 0),
ADD CONSTRAINT chk_name CHECK (LENGTH(name) > 0);

❌ DON'Ts:

-- Avoid storing numbers as strings
CREATE TABLE bad_products (
    id VARCHAR(50),             -- ❌ ID should be INT
    price VARCHAR(20),          -- ❌ Price should be DECIMAL
    created_at VARCHAR(30),     -- ❌ Date should be TIMESTAMP
    is_active VARCHAR(10)       -- ❌ Boolean should be BOOLEAN
);

2. Index Design Best Practices:

✅ Effective Index Strategies:

-- Single-column indexes cho frequent WHERE clauses
CREATE INDEX idx_customer_id ON orders(customer_id);
CREATE INDEX idx_order_date ON orders(order_date);

-- Composite indexes cho multiple-column queries
CREATE INDEX idx_customer_date ON orders(customer_id, order_date);

-- Covering indexes để avoid table lookups
CREATE INDEX idx_order_summary ON orders(customer_id, order_date, total_amount);

-- Partial indexes cho large tables với predictable patterns
CREATE INDEX idx_active_users ON users(username) WHERE is_active = true;

❌ Index Anti-patterns:

-- Too many single-column indexes (maintenance overhead)
CREATE INDEX idx1 ON table(col1);
CREATE INDEX idx2 ON table(col2);  
CREATE INDEX idx3 ON table(col3);  -- Consider composite instead

-- Indexes on low-cardinality columns
CREATE INDEX idx_gender ON users(gender);  -- Only 2-3 values, not selective

-- Redundant indexes
CREATE INDEX idx_a ON table(a);
CREATE INDEX idx_ab ON table(a,b);  -- idx_a is redundant if idx_ab covers queries

3. Query Optimization Patterns:

✅ Efficient Query Patterns:

-- Use specific columns instead of SELECT *
SELECT id, name, price FROM products WHERE category_id = 10;

-- Use LIMIT cho large result sets
SELECT * FROM products ORDER BY created_at DESC LIMIT 20;

-- Use EXISTS instead of IN cho large subqueries
SELECT * FROM customers c 
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id);

-- Use proper JOIN syntax
SELECT c.name, COUNT(o.id) 
FROM customers c 
LEFT JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.name;

❌ Inefficient Query Patterns:

-- SELECT * cho large tables
SELECT * FROM large_table;  -- Transfers unnecessary data

-- Functions trong WHERE clause
SELECT * FROM orders WHERE YEAR(order_date) = 2024;  -- Can't use index
-- Better: WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01'

-- Leading wildcards
SELECT * FROM products WHERE name LIKE '%widget%';  -- Table scan

-- Implicit type conversions
SELECT * FROM products WHERE price = '100';  -- Convert price column if numeric

🎯 Tổng Kết

Key Takeaways:

Type Conversion Impact
- MySQL auto-converts types nhưng với performance cost
- VARCHAR numeric comparisons bypass indexes
- Solution: Use proper data types từ đầu
Index Limitations với LIKE
- Leading wildcards (%abc%) không thể dùng B-Tree indexes
- Trailing wildcards (abc%) có thể dùng index hiệu quả
- Solution: FULLTEXT search cho complex text matching
Performance Monitoring
- Slow query log và Performance Schema là essential tools
- Index usage statistics giúp identify optimization opportunities
- Regular monitoring prevents performance degradation
Best Practices
- Design: Proper data types và index strategies
- Development: Avoid anti-patterns (SELECT *, functions in WHERE)
- Operations: Monitor, analyze, optimize continuously

Performance Impact Summary:

Issue	Impact	Solution
VARCHAR numeric comparison	Index bypass, type conversion overhead	Fix data types, use proper comparisons
Leading wildcard LIKE	Full table scans, 50-100x slower	FULLTEXT search, reverse indexes
Missing indexes	Table scans, poor JOIN performance	Create targeted indexes
Stale statistics	Wrong execution plans	Regular ANALYZE TABLE

Final Recommendation:

"Performance optimization là ongoing process. Monitor continuously, analyze systematically, và optimize proactively để maintain optimal database performance."

nestjs backend

📋 Mục Lục

1. Type Conversion và Implicit Casting

Vấn Đề: VARCHAR Column với Numeric Comparison

Scenario:

MySQL Conversion Rules:

String-to-Number Conversion:

Conversion Process:

Performance Issues với Type Conversion:

❌ Problems:

✅ Solutions:

A) Fix Data Type:

B) Proper Comparison:

C) Functional Index (MySQL 8.0+):

Real-World Example:

2. Index Performance với LIKE Operations

Vấn Đề: Leading Wildcard Performance

Scenario:

LIKE Patterns và Index Usage:

✅ Index-Friendly Patterns:

❌ Index-Unfriendly Patterns:

Execution Plan Comparison:

Trailing Wildcard (Good):

Leading Wildcard (Bad):

Tại Sao Leading Wildcard Không Dùng Được Index?

B-Tree Index Structure:

Index Access Patterns:

Performance Impact:

Test Results:

Solutions cho Text Search:

1. Full-Text Search (Recommended):

2. Reverse Index Pattern:

3. N-Gram Index (MySQL 8.0+):

Index Usage Summary:

3. Best Practices và Performance Tips

1. Data Type Best Practices:

✅ DO's:

❌ DON'Ts:

2. Index Design Best Practices:

✅ Effective Index Strategies:

❌ Index Anti-patterns:

3. Query Optimization Patterns:

✅ Efficient Query Patterns:

❌ Inefficient Query Patterns:

🎯 Tổng Kết

Key Takeaways:

Performance Impact Summary:

Final Recommendation:

Mục lục