When you’re diving into relational databases like SQL, indexing might not be the first thing on your mind, especially if you’re just starting out as a developer. It’s easy to get caught up in making things work and overlook how much faster and smoother your queries can or should be with the right indexes. But trust me, getting a handle on indexing is a game-changer for creating efficient and scalable apps. Let’s break down how indexing works and see a real example to show just how much of a difference it can make.
Table of contents
Open Table of contents
What is an Index?
An index is a data structure that improves the speed of data retrieval operations on a database table. Essentially, it serves as a lookup table for quickly finding rows based on column values. Without an index, a database system would need to perform a full table scan, examining each row to find the desired data, which can be time-consuming for large datasets.
Example Scenario: Query Performance Without and With Indexing
Let’s consider a simple example to illustrate the impact of indexing on query performance.
Step 1: Creating a Table Without an Index
Suppose we create a users
table with 50 rows of data but no indexes.
CREATE TABLE users (
id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(100),
email VARCHAR(100),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Insert 50 records
INSERT INTO users (name, email) VALUES
('John Doe', 'john.doe@example.com'),
('Jane Smith', 'jane.smith@example.com'),
('Michael Johnson', 'michael.johnson@example.com'),
('Emily Davis', 'emily.davis@example.com'),
('David Brown', 'david.brown@example.com'),
('Sophia Wilson', 'sophia.wilson@example.com'),
('Daniel Martinez', 'daniel.martinez@example.com'),
('Olivia Anderson', 'olivia.anderson@example.com'),
('Matthew Thomas', 'matthew.thomas@example.com'),
('Ava Jackson', 'ava.jackson@example.com'),
('Liam White', 'liam.white@example.com'),
('Isabella Harris', 'isabella.harris@example.com'),
('Noah Martin', 'noah.martin@example.com'),
('Mia Thompson', 'mia.thompson@example.com'),
('Lucas Garcia', 'lucas.garcia@example.com'),
('Amelia Lee', 'amelia.lee@example.com'),
('Mason Walker', 'mason.walker@example.com'),
('Charlotte Hall', 'charlotte.hall@example.com'),
('James Allen', 'james.allen@example.com'),
('Abigail Young', 'abigail.young@example.com'),
('Benjamin King', 'benjamin.king@example.com'),
('Evelyn Wright', 'evelyn.wright@example.com'),
('Elijah Scott', 'elijah.scott@example.com'),
('Harper Green', 'harper.green@example.com'),
('Alexander Adams', 'alexander.adams@example.com'),
('Ella Baker', 'ella.baker@example.com'),
('William Nelson', 'william.nelson@example.com'),
('Grace Carter', 'grace.carter@example.com'),
('Henry Mitchell', 'henry.mitchell@example.com'),
('Chloe Perez', 'chloe.perez@example.com'),
('Jacob Roberts', 'jacob.roberts@example.com'),
('Victoria Turner', 'victoria.turner@example.com'),
('Samuel Phillips', 'samuel.phillips@example.com'),
('Lily Campbell', 'lily.campbell@example.com'),
('Ethan Parker', 'ethan.parker@example.com'),
('Hannah Evans', 'hannah.evans@example.com'),
('Jack Edwards', 'jack.edwards@example.com'),
('Aria Collins', 'aria.collins@example.com'),
('Sebastian Stewart', 'sebastian.stewart@example.com'),
('Scarlett Sanchez', 'scarlett.sanchez@example.com'),
('Logan Morris', 'logan.morris@example.com'),
('Penelope Rogers', 'penelope.rogers@example.com'),
('Owen Reed', 'owen.reed@example.com'),
('Zoey Cook', 'zoey.cook@example.com'),
('Aiden Bell', 'aiden.bell@example.com'),
('Riley Murphy', 'riley.murphy@example.com'),
('Lucas Cooper', 'lucas.cooper@example.com'),
('Nora Bailey', 'nora.bailey@example.com'),
('Carter Rivera', 'carter.rivera@example.com');
-- Insert more rows if you want
Step 2: Running a Query Without an Index
Now, let’s run a query to find a user by their email address:
SELECT * FROM users WHERE email = 'john.doe@example.com';
Since there is no index on the email
column, the database performs a full table scan, checking every row to find the match. You can use the EXPLAIN
command to see how the query is executed:
EXPLAIN SELECT * FROM users WHERE email = 'john.doe@example.com' \G;
Result:
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: users
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 49
filtered: 10.00
Extra: Using where
When you run the EXPLAIN
query before adding an index, you see the following key details:
-
type: ALL : This means that MySQL is performing a full table scan. Instead of using an index, it scans every row in the table to find the ones that match the
WHERE
condition. This is the least efficient type of query execution because it processes all rows, even when only a few match the query condition. -
rows: 49 : The
rows
value indicates the number of rows MySQL estimates it will examine to execute the query. Since the table contains 50 rows and no index is being used, it estimates that it will scan almost all rows (49 out of 50). -
filtered: 10.00 : This indicates that MySQL expects 10% of the rows it scans will satisfy the
WHERE
condition. In this case, 10% of 49 rows means around 4-5 rows are expected to match the query. -
possible_keys: NULL and key: NULL : This indicates that there are no indexes available that could be used for this query. Since no key is being used, MySQL has to rely on the full table scan to find the matching rows.
Step 3: Adding an Index
To improve this query’s performance, we can add an index on the email
column:
CREATE INDEX idx_email ON users(email);
Step 4: Running the Query with the Index
Now, let’s run the same query again and check the execution plan:
EXPLAIN SELECT * FROM users WHERE email = 'john.doe@example.com' \G;
With the index in place, the database uses the index to quickly locate the matching rows instead of scanning the entire table. The EXPLAIN
output would be:
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: users
partitions: NULL
type: ref
possible_keys: idx_email
key: idx_email
key_len: 403
ref: const
rows: 1
filtered: 100.00
Extra: NULL
Notice the EXPLAIN output is significantly different:
-
type: ref : This indicates that MySQL is now using a non-unique index to find the rows that match the query condition. The
ref
type is much more efficient thanALL
because instead of scanning the entire table, MySQL uses the index to directly find the relevant rows. -
rows: 1 : The
rows
value now indicates that MySQL only needs to examine 1 row to satisfy the query, thanks to the index. This is a dramatic improvement over the 49 rows it needed to scan before the index was added. Imagine the efficiency improvement if 1 million rows are present. -
filtered: 100.00 : The
filtered
value is now 100%, which means that all of the rows MySQL examines (just 1) will match theWHERE
condition. -
possible_keys: idx_email and key: idx_email : This shows that the index on the
email
column (idx_email
) is available and is being used to optimize the query.
The Impact of Indexing
This example demonstrates how indexing can transform query performance by reducing the number of rows the database needs to search through. While indexes speed up read operations, it’s important to remember that they come with a tradeoff in terms of additional storage and slower write operations (e.g., INSERT
, UPDATE
, DELETE
) since the index needs to be updated as well.
Best Practices for Indexing
-
Choose Columns Wisely : Only create indexes on columns that are frequently used in queries. Avoid indexing columns that rarely appear in your WHERE or JOIN clauses.
-
Monitor Index Performance : Use tools like
EXPLAIN
and other database tools to evaluate index performance and make adjustments as needed for optimal query efficiency and minimal overhead. -
Avoid Over-Indexing : While indexes can improve performance, having too many can degrade it by increasing storage requirements and slowing down write operations.
Conclusion
Indexing is a powerful tool in database design that can significantly boost your query performance. When used right, indexes make your SQL queries run much more efficiently. But it’s not just about slapping on indexes wherever you can—it’s important to use them wisely to avoid slowing down other operations. By learning how and when to use indexing, you’ll keep your database running smoothly and efficiently as your app grows.