0

Database Performance và Query Optimization - Hệ thống không sử dụng index khi filter

📋 Mục Lục

  1. Type Conversion và Implicit Casting
  2. Index Performance với LIKE Operations
  3. Best Practices và Performance Tips
  4. Monitoring và Troubleshooting

1. Type Conversion và Implicit Casting

Vấn Đề: VARCHAR Column với Numeric Comparison

Scenario:

CREATE TABLE employees (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    salary VARCHAR(50)  -- ❌ Salary stored as VARCHAR!
);

-- Query với numeric comparison:
SELECT * FROM employees WHERE salary = 4000;
-- ✅ Sẽ work, nhưng có performance issues!

MySQL Conversion Rules:

String-to-Number Conversion:
-- MySQL converts VARCHAR to NUMBER cho comparison
SELECT 
    '4000' = 4000,      -- Returns: 1 (TRUE)
    '4000.00' = 4000,   -- Returns: 1 (TRUE)  
    '04000' = 4000,     -- Returns: 1 (TRUE)
    '4000.50' = 4000,   -- Returns: 0 (FALSE)
    'N/A' = 4000,       -- Returns: 0 (FALSE)
Conversion Process:
-- MySQL converts string từ left to right until non-numeric character
SELECT 
    CAST('4000' AS UNSIGNED) as result1,        -- 4000
    CAST('4000.00' AS UNSIGNED) as result2,     -- 4000 
    CAST('04000' AS UNSIGNED) as result3,       -- 4000
    CAST('4000.50' AS UNSIGNED) as result4,     -- 4000 (truncates decimal)
    CAST('N/A' AS UNSIGNED) as result5,         -- 0 (non-numeric)

Performance Issues với Type Conversion:

❌ Problems:

-- BAD: Không thể sử dụng index hiệu quả!
SELECT * FROM employees WHERE salary = 4000;
-- MySQL phải convert TOÀN BỘ column salary → Very slow!

-- Index scan không hiệu quả:
CREATE INDEX idx_salary ON employees(salary);
EXPLAIN SELECT * FROM employees WHERE salary = 4000;
-- Result: type = ALL (Full table scan dù có index!)

✅ Solutions:

A) Fix Data Type:
-- Tốt nhất: Sửa data type
ALTER TABLE employees ADD COLUMN salary_numeric DECIMAL(10,2);
UPDATE employees SET salary_numeric = CAST(salary AS DECIMAL(10,2)) 
WHERE salary REGEXP '^[0-9]+(\.[0-9]+)?$';

-- Drop old column, rename new column
ALTER TABLE employees DROP COLUMN salary;
ALTER TABLE employees CHANGE salary_numeric salary DECIMAL(10,2);
B) Proper Comparison:
-- So sánh với string để use index
SELECT * FROM employees WHERE salary = '4000';

-- Hoặc convert explicit với validation
SELECT * FROM employees 
WHERE CAST(salary AS DECIMAL(10,2)) = 4000.00
  AND salary REGEXP '^[0-9]+(\.[0-9]+)?$';  -- Ensure numeric
C) Functional Index (MySQL 8.0+):
-- Tạo functional index cho performance
CREATE INDEX idx_salary_numeric ON employees ((CAST(salary AS DECIMAL(10,2))));

-- Query sẽ nhanh hơn
SELECT * FROM employees WHERE CAST(salary AS DECIMAL(10,2)) = 4000.00;

Real-World Example:

-- Legacy system problem
CREATE TABLE products (
    id INT PRIMARY KEY,
    name VARCHAR(255),
    price VARCHAR(50)  -- 😱 Price stored as VARCHAR!
);

-- Data messy như này:
INSERT INTO products VALUES
(1, 'Laptop', '$1,299.99'),
(2, 'Mouse', '25.50'),  
(3, 'Keyboard', '89'),
(4, 'Monitor', 'Call for price');

-- Query này sẽ có unexpected results:
SELECT * FROM products WHERE price > 100;
-- Conversions: '$1,299.99' → 0, '25.50' → 25.50, '89' → 89, 'Call for price' → 0

2. Index Performance với LIKE Operations

Vấn Đề: Leading Wildcard Performance

Scenario:

CREATE TABLE users (
    id INT PRIMARY KEY,
    username VARCHAR(50),
    email VARCHAR(100)
);

-- Tạo index trên username
CREATE INDEX idx_username ON users(username);

-- Query với leading wildcard:
SELECT * FROM users WHERE username LIKE '%abc%';
-- ❌ KHÔNG THỂ sử dụng index hiệu quả!

LIKE Patterns và Index Usage:

✅ Index-Friendly Patterns:

-- Exact match - FASTEST
SELECT * FROM users WHERE username = 'john_doe';
-- Plan: INDEX SEEK/REF

-- Trailing wildcard - FAST  
SELECT * FROM users WHERE username LIKE 'john%';
-- Plan: INDEX RANGE SCAN

❌ Index-Unfriendly Patterns:

-- Leading wildcard - SLOW
SELECT * FROM users WHERE username LIKE '%john';
-- Plan: FULL TABLE SCAN

-- Both sides wildcard - SLOWEST
SELECT * FROM users WHERE username LIKE '%john%';  
-- Plan: FULL TABLE SCAN

Execution Plan Comparison:

Trailing Wildcard (Good):

EXPLAIN SELECT * FROM users WHERE username LIKE 'abc%';
+----+-------------+-------+-------+---------------+--------------+---------+------+------+-------------+
| id | select_type | table | type  | possible_keys | key          | key_len | ref  | rows | Extra       |
+----+-------------+-------+-------+---------------+--------------+---------+------+------+-------------+
|  1 | SIMPLE      | users | range | idx_username  | idx_username | 203     | NULL | 3    | Using where |
+----+-------------+-------+-------+---------------+--------------+---------+------+------+-------------+

Leading Wildcard (Bad):

EXPLAIN SELECT * FROM users WHERE username LIKE '%abc%';
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows  | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
|  1 | SIMPLE      | users | ALL  | NULL          | NULL | NULL    | NULL | 10007 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+

Tại Sao Leading Wildcard Không Dùng Được Index?

B-Tree Index Structure:

B-Tree Index on username (sorted):
├── 'abc_user'
├── 'abcdef_test' 
├── 'jane_smith'
├── 'john_doe'
├── 'test_abc_end'
├── 'user_000001'
├── 'user_000002'
└── 'user_abc'

Index Access Patterns:

-- ✅ 'abc%' → Start from 'abc', scan until 'abd'
-- Efficient range scan trong sorted index

-- ❌ '%abc%' → Phải scan TOÀN BỘ index để check mỗi value!
-- Không có pattern để optimize

Performance Impact:

Test Results:

SET profiling = 1;

-- Leading wildcard (slow)
SELECT COUNT(*) FROM users WHERE username LIKE '%abc%';    -- 0.012543s

-- Trailing wildcard (fast)  
SELECT COUNT(*) FROM users WHERE username LIKE 'user%';    -- 0.000834s

-- Exact match (fastest)
SELECT COUNT(*) FROM users WHERE username = 'john_doe';    -- 0.000234s

SHOW PROFILES;

Performance Difference: 53x slower với leading wildcard!

Solutions cho Text Search:

1. Full-Text Search (Recommended):

-- Tạo FULLTEXT index
ALTER TABLE users ADD FULLTEXT(username);

-- Sử dụng MATCH...AGAINST
SELECT * FROM users 
WHERE MATCH(username) AGAINST('abc' IN BOOLEAN MODE);

-- Performance với FULLTEXT:
EXPLAIN SELECT * FROM users 
WHERE MATCH(username) AGAINST('abc' IN BOOLEAN MODE);
-- Uses fulltext index instead of table scan

2. Reverse Index Pattern:

-- Thêm column để store reverse string
ALTER TABLE users ADD COLUMN username_reverse VARCHAR(50);

-- Update với reverse values
UPDATE users SET username_reverse = REVERSE(username);

-- Tạo index trên reverse column
CREATE INDEX idx_username_reverse ON users(username_reverse);

-- Tìm strings kết thúc bằng 'abc': '%abc'
SELECT * FROM users 
WHERE username_reverse LIKE CONCAT(REVERSE('abc'), '%')
  AND username LIKE '%abc';  -- Double-check cho accuracy

3. N-Gram Index (MySQL 8.0+):

-- Tạo trigram index với ngram parser
CREATE FULLTEXT INDEX idx_username_ngram ON users(username) 
WITH PARSER ngram;

-- Query với minimum word length
SELECT * FROM users 
WHERE MATCH(username) AGAINST('abc' IN BOOLEAN MODE);

Index Usage Summary:

Pattern Index Usage Performance Use Case
username = 'john' ✅ INDEX SEEK Fastest Exact match
username LIKE 'john%' ✅ RANGE SCAN Fast Prefix search
username LIKE '%john' ❌ TABLE SCAN Slow Suffix search
username LIKE '%john%' ❌ TABLE SCAN Slowest Contains search
MATCH...AGAINST('john') ✅ FULLTEXT Fast Text search

3. Best Practices và Performance Tips

1. Data Type Best Practices:

✅ DO's:

-- Use appropriate data types
CREATE TABLE products (
    id INT PRIMARY KEY,
    name VARCHAR(255),
    price DECIMAL(10,2),        -- ✅ Numeric type cho prices
    created_at TIMESTAMP,       -- ✅ Proper date type
    is_active BOOLEAN           -- ✅ Boolean cho flags
);

-- Use proper constraints
ALTER TABLE products 
ADD CONSTRAINT chk_price CHECK (price >= 0),
ADD CONSTRAINT chk_name CHECK (LENGTH(name) > 0);

❌ DON'Ts:

-- Avoid storing numbers as strings
CREATE TABLE bad_products (
    id VARCHAR(50),             -- ❌ ID should be INT
    price VARCHAR(20),          -- ❌ Price should be DECIMAL
    created_at VARCHAR(30),     -- ❌ Date should be TIMESTAMP
    is_active VARCHAR(10)       -- ❌ Boolean should be BOOLEAN
);

2. Index Design Best Practices:

✅ Effective Index Strategies:

-- Single-column indexes cho frequent WHERE clauses
CREATE INDEX idx_customer_id ON orders(customer_id);
CREATE INDEX idx_order_date ON orders(order_date);

-- Composite indexes cho multiple-column queries
CREATE INDEX idx_customer_date ON orders(customer_id, order_date);

-- Covering indexes để avoid table lookups
CREATE INDEX idx_order_summary ON orders(customer_id, order_date, total_amount);

-- Partial indexes cho large tables với predictable patterns
CREATE INDEX idx_active_users ON users(username) WHERE is_active = true;

❌ Index Anti-patterns:

-- Too many single-column indexes (maintenance overhead)
CREATE INDEX idx1 ON table(col1);
CREATE INDEX idx2 ON table(col2);  
CREATE INDEX idx3 ON table(col3);  -- Consider composite instead

-- Indexes on low-cardinality columns
CREATE INDEX idx_gender ON users(gender);  -- Only 2-3 values, not selective

-- Redundant indexes
CREATE INDEX idx_a ON table(a);
CREATE INDEX idx_ab ON table(a,b);  -- idx_a is redundant if idx_ab covers queries

3. Query Optimization Patterns:

✅ Efficient Query Patterns:

-- Use specific columns instead of SELECT *
SELECT id, name, price FROM products WHERE category_id = 10;

-- Use LIMIT cho large result sets
SELECT * FROM products ORDER BY created_at DESC LIMIT 20;

-- Use EXISTS instead of IN cho large subqueries
SELECT * FROM customers c 
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id);

-- Use proper JOIN syntax
SELECT c.name, COUNT(o.id) 
FROM customers c 
LEFT JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.name;

❌ Inefficient Query Patterns:

-- SELECT * cho large tables
SELECT * FROM large_table;  -- Transfers unnecessary data

-- Functions trong WHERE clause
SELECT * FROM orders WHERE YEAR(order_date) = 2024;  -- Can't use index
-- Better: WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01'

-- Leading wildcards
SELECT * FROM products WHERE name LIKE '%widget%';  -- Table scan

-- Implicit type conversions
SELECT * FROM products WHERE price = '100';  -- Convert price column if numeric

🎯 Tổng Kết

Key Takeaways:

  1. Type Conversion Impact

    • MySQL auto-converts types nhưng với performance cost
    • VARCHAR numeric comparisons bypass indexes
    • Solution: Use proper data types từ đầu
  2. Index Limitations với LIKE

    • Leading wildcards (%abc%) không thể dùng B-Tree indexes
    • Trailing wildcards (abc%) có thể dùng index hiệu quả
    • Solution: FULLTEXT search cho complex text matching
  3. Performance Monitoring

    • Slow query log và Performance Schema là essential tools
    • Index usage statistics giúp identify optimization opportunities
    • Regular monitoring prevents performance degradation
  4. Best Practices

    • Design: Proper data types và index strategies
    • Development: Avoid anti-patterns (SELECT *, functions in WHERE)
    • Operations: Monitor, analyze, optimize continuously

Performance Impact Summary:

Issue Impact Solution
VARCHAR numeric comparison Index bypass, type conversion overhead Fix data types, use proper comparisons
Leading wildcard LIKE Full table scans, 50-100x slower FULLTEXT search, reverse indexes
Missing indexes Table scans, poor JOIN performance Create targeted indexes
Stale statistics Wrong execution plans Regular ANALYZE TABLE

Final Recommendation:

"Performance optimization là ongoing process. Monitor continuously, analyze systematically, và optimize proactively để maintain optimal database performance."


All rights reserved

Viblo
Hãy đăng ký một tài khoản Viblo để nhận được nhiều bài viết thú vị hơn.
Đăng kí