Đã đăng vào thg 9 25, 2024 10:34 CH 4 phút đọc

3.4K

Tất tần tật về Index trong SQL Server

Trong MS SQL Server, Index là một cấu trúc dữ liệu lưu dưới dạng table small dùng để cải thiện tốc độ các câu query có điều kiện(where, join, group by, order by) và tìm kiếm dữ liệu trong table hoặc view. Index hoạt động tương tự như mục lục của một cuốn sách, giúp nhanh chóng xác định vị trí row trong table mà không cần quét toàn bộ table.

Kiến thức cần biết

Có 2 loại index trong SQL Server:

Clustered Index

- Sắp xếp data vật lý theo 1 thứ tự nhất định

- 1 bảng chỉ có 1 index clustered, mặc định Sql Server sẽ tạo 1 index dạng clustered index trên primary key(unique, not null).

- 1 index clustered có thể đánh gom nhiều cột(composite clustered index)

Non-Clustered Index

- Không ảnh hưởng tới thứ tự sắp xếp của table

- 1 bảng có nhiều index none-clustered lưu trữ con trỏ trỏ tới vị trí của index

- Có thể đánh nhiều cột thành 1 index(composite non-clustered index)

Systax:

-- get list index of table
sp_helpindex 'table_name'

-- create index
create index index_name
on table_name(column1, column2, .., columnN);

-- create single index
create index products_category
on products(category);

-- create composite index
create index products_category_brand
on products(category, brand_id);

-- create unique index
create unique index products_category_brand
on products(category, brand_id);

-- create filter index
create index products_brand
on products(brand_id)
where brand_id = 1;

-- create unique and filter index
create unique index products_brand_1_category
on products (category, brand_id)
where brand_id = 1;

-- create include column index
create nonclustered index products_brand on products (brand_id) include (category)

-- create columnstore index: chuyên sử dụng trong phân tích và báo cáo
create nonclustered  columnstore index nclstore_quantity_price
on sales(auantity, price);

-- create xml index: chuyên sử dụng cho column lưu trữ data dạng xml
path: tìm kiếm theo path
create xml index pidx_productinfo_path
on products (productinfo)
using xml index pidx_productinfo
for path;

value: tìm kiếm theo value
create xml index pidx_productinfo_value
on products (productinfo)
using xml index pidx_productinfo
for value;

property: tìm kiếm theo property
create xml index pidx_productinfo_property
on products (productinfo)
using xml index pidx_productinfo
for property;

VD truy vấn sử dụng đường dẫn (path)
select productinfo.query('/product/name')
from products
where productinfo.exist('/product[category="electronics"]') = 1;

VD truy vấn tìm kiếm giá trị
select productinfo.value('(/product/price)[1]', 'decimal(10, 2)')
from products
where productinfo.exist('/product[name="laptop"]') = 1;

VD truy vấn truy xuất thuộc tính
select productinfo.value('(/product/@id)[1]', 'int')
from products;

-- update index
alter index index_name on table_name  rebuild/recognize/disable.

-- delete index
drop index index_name on table_name

Index trong tối ưu performence

Cân nhắc khi đánh index:

Index Clustered: Thường được sử dụng cho các column mà các truy vấn thường xuyên order hoặc group by theo thứ tự. Một bảng chỉ có thể có một index clustered, thường là private key
Index Non-Clustered: Thường được sử dụng cho các cột được tìm kiếm hoặc lọc thường xuyên, nhưng không nhất thiết phải sắp xếp theo thứ tự.
Index Composite: Các chỉ mục bao gồm nhiều cột có thể rất hữu ích cho các truy vấn phức tạp sử dụng nhiều cột trong các điều kiện tìm kiếm và sắp xếp. Thứ tự của các cột trong chỉ mục rất quan trọng và nên phản ánh thứ tự của các điều kiện trong truy vấn.

Ví dụ:

Table có INDEX (A, B), thì truy vấn WHERE A = ? AND B = ?, excution plan sẽ sử dụng index này

Table có INDEX (A, B) và truy vấn WHERE B = ?, excution plan có thể không sử dụng index này

Khi column có nhiều giá trị null cần giảm số lượng row cần quét khi truy vấn, tạo một Filtered Index chỉ bao gồm các hàng mà cột không có giá trị NULL.

create nonclustered index idx_filtered_non_nulls on table_name (column_name) where column_name is not null;

Nếu bạn cần bao gồm cả các giá trị NULL và non-NULL trong chỉ mục, bạn có thể sử dụng một cột tính toán để chuyển đổi các giá trị NULL thành một giá trị khác (ví dụ: 'UNKNOWN' hoặc một giá trị mặc định) và sau đó đánh index trên cột tính toán đó.

alter table table_name
add computed_column as isnull(column_name, 'default_value');

create nonclustered index idx_computed_column
on table_name (computed_column);

Lưu ý với các câu query Where Non-Sargable và Sargable

1. Các Điều Kiện Non-Sargable: không thể dùng index dẫn đến table scan
<> hoặc !=
!> hoặc !<
not exists
not in
not like
like '%<literal>'
or
tính toán trên cột (function(column))

2. Các Điều Kiện Sargable: có thể tận dụng index tìm kiếm
=
>
<
>=
<=
between
in
like 'abc%'

Sử dụng Include Covering Index

Để loại bỏ KeyLookup trong excution plan khi cần lấy thêm thông tin(ví dụ: select id, name from products where id = 1) thì ta có thể include name vào non-clustered index

Trường hợp 1: KeyLookup biến mất tuy nhiên thêm cột vào Index tăng chi phí duy trì và phức tạp hơn
create nonclustered  index idx_id_name on products(id, name)

Trường hợp 2: sử dụng covering index, tuy nhiên cần cân nhắc chi phí khi include column theo đôi khi còn quá chi phí so với keylookup nên cần phải xem thay đổi thực tế trên excution plan
create nonclustered  index idx_id on products(id) include (name)

Gather statistic cho index

Sử dụng AUTO_UPDATE_STATISTICS: Tính năng này được bật mặc định và nên được giữ nguyên để đảm bảo statistic luôn cập nhật
alter database [databasename] set auto_update_statistics on;

Sử dụng AUTO_UPDATE_STATISTICS_ASYNC: cập nhật statistic bất đồng bộ, giúp tránh ảnh hưởng đến hiệu suất của các truy vấn, Sql Server chạy background update
alter database [databasename] set auto_update_statistics_async on;

Sử dụng lệnh UPDATE STATISTICS: cập nhập statistic cho table
update statistics table_name;

Cập nhật statistic cho index
update statistics table_name index_name;

Sử dụng lệnh sp_updatestats:  update tất cả statistic trong database
exec sp_updatestats;

Đánh index tốt nhất cho các column có loại dữ liệu dễ dàng sắp xếp, có kích thước nhỏ như số, ngày giờ, với những column dạng nvarchar, varchar, text cần dùng Full Text Search Index giúp tăng tốc độ truy vấn tìm kiếm văn bản dài và phức tạp (tham khảo bài viết về đánh Fulltext Search: https://www.linkedin.com/pulse/full-text-search-sql-server-practical-guide-baligh-mehrez/)

-- Tạo Catalog Fulltext
create fulltext catalog myfulltextcatalog;

-- Tạo Index fulltext
create fulltext index on table_name(column_name)
key index idx_primary_key
on myfulltextcatalog;

Xử lý Redundant, Overlapping & Duplicate Index(Trùng lặp Index, dẫn tới ghi đè và thừa index) (code này là nguồn copy nha)

WITH indexcolumns
     AS (SELECT I.object_id                   AS TableObjectId,
                Object_schema_name(I.object_id) + '.'
                + Object_name(I.object_id)    AS TableName,
                I.index_id                    AS IndexId,
                I.NAME                        AS IndexName,
                ( IndexUsage.user_seeks
                  + IndexUsage.user_scans
                  + IndexUsage.user_lookups ) AS IndexUsage,
                IndexUsage.user_updates       AS IndexUpdates,
                (SELECT CASE is_included_column
                          WHEN 1 THEN NULL
                          ELSE column_id
                        END AS [data()]
                 FROM   sys.index_columns AS IndexColumns
                 WHERE  IndexColumns.object_id = I.object_id
                        AND IndexColumns.index_id = I.index_id
                 ORDER  BY index_column_id,
                           column_id
                 FOR xml path(''))            AS ConcIndexColumnNrs,
                (SELECT CASE is_included_column
                          WHEN 1 THEN NULL
                          ELSE Col_name(I.object_id, column_id)
                        END AS [data()]
                 FROM   sys.index_columns AS IndexColumns
                 WHERE  IndexColumns.object_id = I.object_id
                        AND IndexColumns.index_id = I.index_id
                 ORDER  BY index_column_id,
                           column_id
                 FOR xml path(''))            AS ConcIndexColumnNames,
                (SELECT CASE is_included_column
                          WHEN 1 THEN column_id
                          ELSE NULL
                        END AS [data()]
                 FROM   sys.index_columns AS IndexColumns
                 WHERE  IndexColumns.object_id = I.object_id
                        AND IndexColumns.index_id = I.index_id
                 ORDER  BY column_id
                 FOR xml path(''))            AS ConcIncludeColumnNrs,
                (SELECT CASE is_included_column
                          WHEN 1 THEN Col_name(I.object_id, column_id)
                          ELSE NULL
                        END AS [data()]
                 FROM   sys.index_columns AS IndexColumns
                 WHERE  IndexColumns.object_id = I.object_id
                        AND IndexColumns.index_id = I.index_id
                 ORDER  BY column_id
                 FOR xml path(''))            AS ConcIncludeColumnNames
         FROM   sys.indexes AS I
                LEFT OUTER JOIN sys.dm_db_index_usage_stats AS IndexUsage
                             ON IndexUsage.object_id = I.object_id
                                AND IndexUsage.index_id = I.index_id
                                AND IndexUsage.database_id = Db_id())
SELECT C1.tablename AS 'TableName',
       C1.indexname   AS 'Index1',
       C2.indexname   AS 'Index2',
       CASE
         WHEN ( C1.concindexcolumnnrs = C2.concindexcolumnnrs )
              AND ( C1.concincludecolumnnrs = C2.concincludecolumnnrs ) THEN
         'Exact duplicate'
         WHEN ( C1.concindexcolumnnrs = C2.concindexcolumnnrs ) THEN
         'Different includes'
         ELSE 'Overlapping columns'
       END
       --  , C1.ConcIndexColumnNrs
       --  , C2.ConcIndexColumnNrs
       ,
       C1.concindexcolumnnames AS ConcIncludeColumnNrs,
       C2.concindexcolumnnames AS ConcIncludeColumnNrs
       --  , C1.ConcIncludeColumnNrs
       --  , C2.ConcIncludeColumnNrs
       ,
       C1.concincludecolumnnames,
       C2.concincludecolumnnames,
       C1.indexusage,
       C2.indexusage,
       C1.indexupdates,
       C2.indexupdates,
       'DROP INDEX ' + C2.indexname + ' ON '
       + C2.tablename AS Drop2,
       'DROP INDEX ' + C1.indexname + ' ON '
       + C1.tablename AS Drop1
FROM   indexcolumns AS C1
       INNER JOIN indexcolumns AS C2
               ON ( C1.tableobjectid = C2.tableobjectid )
                  AND (
                      -- exact: show lower IndexId as 1
                      ( C1.indexid < C2.indexid
                        AND C1.concindexcolumnnrs = C2.concindexcolumnnrs
                        AND C1.concincludecolumnnrs = C2.concincludecolumnnrs )
                       -- different includes: show longer include as 1
                       OR ( C1.concindexcolumnnrs = C2.concindexcolumnnrs
                            AND Len(C1.concincludecolumnnrs) >
                                Len(C2.concincludecolumnnrs) )
                       -- overlapping: show longer index as 1
                       OR ( C1.indexid <> C2.indexid
                            AND C1.concindexcolumnnrs <> C2.concindexcolumnnrs
                            AND C1.concindexcolumnnrs LIKE
                                C2.concindexcolumnnrs + ' %'
                          ) )
ORDER  BY C1.tablename,
          C1.concindexcolumnnrs

Xử lý các Heap Table: là các table không có Clustered Index, table không được sắp xếp ưu điểm là INSERT nhanh nhưng nhược điểm là query chậm, trường hợp SQL Server phải scan trong heap table sẽ tăng cost cho quá trình scan (so với bảng có Clustered Index), ảnh hưởng tiêu cực đến hiệu năng hệ thống. Code kiểm tra table chưa có Clustered Index

SELECT 
    t.name AS TableName,
    i.type_desc AS IndexType
FROM 
    sys.tables t
    LEFT JOIN sys.indexes i ON t.object_id = i.object_id AND i.index_id = 1
WHERE 
    i.object_id IS NULL;

Thống kê tình trạng sử dụng của index

SELECT 
    DB_NAME(s.database_id) AS DatabaseName,
    OBJECT_NAME(s.object_id) AS TableName,
    i.name AS IndexName,
    i.type_desc AS IndexType,
    i.is_primary_key AS IsPrimaryKey,
    i.is_unique AS IsUnique,
    s.user_seeks AS UserSeeks,
    s.user_scans AS UserScans,
    s.user_lookups AS UserLookups,
    s.user_updates AS UserUpdates,
    s.last_user_seek AS LastUserSeek,
    s.last_user_scan AS LastUserScan,
    s.last_user_lookup AS LastUserLookup,
    s.last_user_update AS LastUserUpdate
FROM 
    sys.dm_db_index_usage_stats AS s
JOIN 
    sys.indexes AS i ON s.object_id = i.object_id AND s.index_id = i.index_id
WHERE 
    s.database_id = DB_ID()
ORDER BY 
        (s.user_seeks + s.user_scans + s.user_lookups) DESC;

user_seeks: tần suất chỉ mục được sử dụng để tìm kiếm dữ liệu.
user_scans: tần suất chỉ mục được sử dụng để quét dữ liệu.
user_lookups: tần suất chỉ mục được sử dụng để tra cứu dữ liệu từ bảng.
user_updates: tần suất chỉ mục phải được cập nhật do các thay đổi trong dữ liệu.

Xác địng index thiếu bằng excution plan hoặc dùng tool của Sql Server(SQL Server's Database Engine Tuning Advisor) hoặc code, sau đó phân tích điểm lợi điểm hại trước khi tạo index theo recommend

SELECT
    migs.avg_total_user_cost * (migs.avg_user_impact / 100.0) * (migs.user_seeks + migs.user_scans) AS improvement_measure,
    OBJECT_NAME(mid.[object_id]) AS TableName,
    'CREATE INDEX [IX_' + OBJECT_NAME(mid.[object_id]) + '_' + REPLACE(REPLACE(REPLACE(ISNULL(mid.equality_columns, ''), ', ', '_'), '[', ''), ']', '') 
    + CASE WHEN mid.[equality_columns] IS NOT NULL AND mid.[inequality_columns] IS NOT NULL THEN '_' ELSE '' END
    + REPLACE(REPLACE(REPLACE(ISNULL(mid.inequality_columns, ''), ', ', '_'), '[', ''), ']', '') + ']'
    + ' ON ' + mid.statement
    + ' (' + ISNULL (mid.equality_columns,'')
    + CASE WHEN mid.equality_columns IS NOT NULL AND mid.inequality_columns IS NOT NULL THEN ',' ELSE '' END
    + ISNULL (mid.inequality_columns, '')
    + ')' 
    + ISNULL (' INCLUDE (' + mid.included_columns + ')', '') AS create_index_statement,
    migs.*, mid.database_id, mid.[object_id]
FROM sys.dm_db_missing_index_groups mig
INNER JOIN sys.dm_db_missing_index_group_stats migs ON migs.group_handle = mig.index_group_handle
INNER JOIN sys.dm_db_missing_index_details mid ON mig.index_handle = mid.index_handle
WHERE mid.database_id = DB_ID() -- Replace with your database ID
ORDER BY migs.avg_total_user_cost * migs.avg_user_impact * (migs.user_seeks + migs.user_scans) DESC;

Xử lý fragment index cũng là vấn đề gây chậm query

-- kiểm tra mức độ phân mảnh của index
SELECT 
    DB_NAME(ps.database_id) AS DatabaseName,
    OBJECT_NAME(ps.object_id) AS TableName,
    i.name AS IndexName,
    ps.index_id,
    ps.avg_fragmentation_in_percent, --phần trăm phân mảnh trung bình của index.
    ps.page_count
FROM 
    sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'SAMPLED') AS ps
JOIN 
    sys.indexes AS i ON ps.object_id = i.object_id AND ps.index_id = i.index_id
WHERE 
    ps.database_id = DB_ID()
    AND ps.page_count > 1000  -- Chỉ xem các chỉ mục với hơn 1000 trang để tránh kiểm tra các chỉ mục nhỏ
ORDER BY 
    ps.avg_fragmentation_in_percent DESC;

mức độ phân mảnh:
- dưới 5%: chấp nhận được, không cần hành động.
- 5% - 30%: recognize ngay
   alter index index_name on  table_name recognize;
- trên 30%: rebuild ngay
   alter index index_name on  table_name rebuild;

Kiểm tra index không sử dụng

SELECT 
    OBJECT_NAME(i.object_id) AS TableName,
    i.name AS IndexName,
    i.index_id,
    i.type_desc AS IndexType,
    i.is_primary_key,
    i.is_unique,
    i.data_space_id,
    i.ignore_dup_key,
    i.is_disabled,
    i.is_hypothetical,
    i.allow_row_locks,
    i.allow_page_locks
FROM 
    sys.indexes AS i
LEFT JOIN 
    sys.dm_db_index_usage_stats AS s 
    ON i.object_id = s.object_id AND i.index_id = s.index_id
WHERE 
    i.type_desc <> 'HEAP'
    AND s.index_id IS NULL 
    AND i.object_id > 100;  -- Loại bỏ các đối tượng hệ thống

SQL standard reports: sử dụng report để check status index đang sử dụng

Kiến thức cần biết

Clustered Index

Non-Clustered Index

Index trong tối ưu performence

Mục lục