Understanding MariaDB Indexes: A Battle-Tested Engineer's Guide

After spending the better part of a decade debugging slow queries at 3 AM and watching developers scratch their heads over execution plans, I've learned that indexes are like that reliable friend who always knows where everything is – except when they don't, and then everything goes sideways.
Let me share what I've learned about how MariaDB indexes actually work, because understanding them isn't just about performance – it's about maintaining your sanity during production incidents.
What Are Indexes Really?
Think of an index like the card catalog in an old library (if you're young enough to have never seen one, imagine a very organized bookmark system). Instead of scanning every single book to find "JavaScript: The Good Parts," you check the catalog, get the exact location, and walk straight to it.
In MariaDB, an index is a data structure that keeps the indexed column values in sorted order, each paired with a way to reach the full row (for InnoDB secondary indexes, that is the row's primary key value). When you query WHERE user_id = 12345, the database can jump straight to the relevant rows instead of scanning millions of records.
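Here is a toy Python model of that idea, not MariaDB's actual on-disk format. The table is a plain list of rows, and the index is a separate list of (value, row_position) pairs kept sorted so it can be binary-searched with the standard bisect module. The table contents and function names are made up for illustration.

```python
from bisect import bisect_left

# A toy "table": rows stored in insertion order (positions 0..3).
rows = [
    {"user_id": 12345, "name": "ada"},
    {"user_id": 42, "name": "bob"},
    {"user_id": 777, "name": "cy"},
    {"user_id": 12345, "name": "dee"},
]


def build_index(table, column):
    """Toy secondary index: (value, row_position) pairs sorted by value."""
    return sorted((row[column], pos) for pos, row in enumerate(table))


def full_scan(table, column, value):
    """Examine every row. Returns (matching_rows, rows_examined)."""
    matches = [row for row in table if row[column] == value]
    return matches, len(table)


def lookup_with_index(table, index, value):
    """Binary-search the sorted index, then fetch only the matching rows.

    Returns (matching_rows, rows_examined)."""
    i = bisect_left(index, (value,))  # first entry whose value >= `value`
    matches = []
    while i < len(index) and index[i][0] == value:
        matches.append(table[index[i][1]])  # follow the "pointer" to the row
        i += 1
    return matches, len(matches)


user_id_index = build_index(rows, "user_id")
```

The full scan examines all four rows to find two matches; the index version pays for a short binary search and then touches only the two matching rows. That gap grows with the table.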
The Different Flavors of Indexes
B-Tree Indexes (The Workhorses)
B-Tree indexes are the default and most common type in MariaDB. They're balanced trees where data is stored in sorted order, making range queries lightning fast.
CREATE INDEX idx_created_at ON orders (created_at);
When they shine:
- Equality searches: WHERE id = 123
- Range queries: WHERE created_at BETWEEN '2023-01-01' AND '2023-12-31'
- ORDER BY operations
- GROUP BY operations
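To see why sorted order makes ranges cheap, here is a minimal sketch that treats an index on created_at as a sorted Python list of (created_at, order_id) pairs. It is a flat list rather than a real B-Tree, and the data is invented, but the key property is the same: two binary searches find the boundaries of a range, and the matches come out already in order.

```python
from bisect import bisect_left, bisect_right

# Toy index on orders.created_at: (created_at, order_id) pairs kept sorted.
# ISO-8601 date strings sort correctly as plain strings.
created_at_index = sorted([
    ("2022-12-30", 1),
    ("2023-03-15", 2),
    ("2023-12-31", 3),
    ("2024-01-02", 4),
    ("2023-07-04", 5),
])


def range_scan(index, low, high):
    """Emulate WHERE created_at BETWEEN low AND high (inclusive on both ends).

    Two binary searches locate the boundaries; everything in between matches,
    and it is already sorted, so ORDER BY created_at needs no extra sort step.
    """
    start = bisect_left(index, (low,))
    # (high, inf) sorts after every (high, order_id) pair, so ties on `high` are kept.
    end = bisect_right(index, (high, float("inf")))
    return [order_id for _, order_id in index[start:end]]
```

For the 2023 range this returns orders 2, 5 and 3, in created_at order, without looking at the entries outside the range except during the two searches.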
Real-world gotcha: I once spent hours optimizing a query that was inexplicably slow, only to discover someone had created an index on LOWER(email), but the query was using email directly. The index was there, just useless for that query.
Hash Indexes (The Speed Demons)
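Hash indexes use a hash function to map values to bucket locations. They're incredibly fast for exact matches but useless for anything else. Note the engine caveat: for ordinary indexes, MariaDB honors USING HASH only in the MEMORY storage engine (where HASH is the default index type); InnoDB accepts the clause but builds a B-Tree anyway.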
CREATE INDEX idx_hash_user_id ON sessions (user_id) USING HASH;
Perfect for:
- Exact equality: WHERE user_id = 123
- High-cardinality columns with mostly equality searches
Avoid for:
- Range queries (the hash index can't be used for them)
- Pattern matching
- Sorting operations
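A toy contrast with the sorted index: this sketch models a hash index as a Python dict mapping each value to its row positions. It illustrates the trade-off only; it is not how the MEMORY engine lays out its hash buckets, and the function names are hypothetical.

```python
from collections import defaultdict


def build_hash_index(values):
    """Toy hash index: value -> list of row positions.

    Python's dict is a hash table, so an equality probe costs about the same
    (on average) however many keys there are, but the keys carry no useful order.
    """
    index = defaultdict(list)
    for pos, value in enumerate(values):
        index[value].append(pos)
    return index


def hash_equal(index, value):
    """WHERE col = value: a single hash probe."""
    return index.get(value, [])


def hash_range(index, low, high):
    """WHERE col BETWEEN low AND high: hashing scatters neighbouring values,
    so every key has to be checked, which is no better than a full scan."""
    return sorted(
        pos
        for key, positions in index.items()
        if low <= key <= high
        for pos in positions
    )
```

The equality probe goes straight to the right bucket; the range version has to walk every key, which is why the optimizer won't use a hash index for ranges, pattern matching or sorting.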
Composite Indexes (The Multi-Tools)
These index multiple columns together. The order matters tremendously – it's like a phone book sorted by last name, then first name.
CREATE INDEX idx_user_status_created ON orders (user_id, status, created_at);
This index can efficiently handle:
- WHERE user_id = 123
- WHERE user_id = 123 AND status = 'completed'
- WHERE user_id = 123 AND status = 'completed' AND created_at > '2023-01-01'
But it generally can't seek on conditions that skip the leading column, so for these MariaDB either scans the whole index or ignores it:
- WHERE status = 'completed' (second column only)
- WHERE created_at > '2023-01-01' (third column only)
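The phone book analogy can be checked with a small model. The composite index is a sorted list of (user_id, status, created_at, row_id) tuples, with invented data. A seek works whenever the condition pins down a leftmost prefix of the columns; a condition on status alone has to look at every entry because the matching rows are scattered through the sort order.

```python
from bisect import bisect_left

# Toy composite index on (user_id, status, created_at); the row id is appended,
# similar in spirit to how InnoDB secondary indexes carry the primary key.
entries = sorted([
    (123, "completed", "2023-02-01", 1),
    (123, "pending", "2023-03-01", 2),
    (123, "completed", "2022-11-20", 3),
    (456, "completed", "2023-05-05", 4),
    (789, "pending", "2023-01-09", 5),
])


def prefix_seek(index, prefix):
    """Equality on a leftmost prefix, e.g. (123,) or (123, "completed").

    Binary-search to the first candidate, then read forward while the prefix
    still matches. Returns (row_ids, index_entries_read)."""
    i = bisect_left(index, prefix)
    found, entries_read = [], 0
    while i < len(index):
        entries_read += 1
        if index[i][:len(prefix)] != prefix:
            break
        found.append(index[i][-1])
        i += 1
    return found, entries_read


def non_prefix_filter(index, column, value):
    """Filter on a non-leading column: no seek is possible, so read everything."""
    found = [entry[-1] for entry in index if entry[column] == value]
    return found, len(index)
```

Notice that the two rows returned for (123, "completed") come back ordered by created_at, which is why this index can also satisfy ORDER BY created_at once user_id and status are fixed.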
How MariaDB Uses Indexes Internally
The Query Optimizer's Decision Process
When you execute a query, MariaDB's optimizer goes through what I like to call "the great index evaluation." It looks at:
- Available indexes on the queried tables
- Cardinality (how unique the values are)
- Statistics about data distribution
- Query selectivity (how many rows it expects to return)
Here's a simplified version of what happens:
EXPLAIN SELECT * FROM users WHERE age = 25 AND city = 'New York';
The optimizer might choose between:
- Full table scan
- Index on age
- Index on city
- Composite index on (city, age)
- Index merge using both single-column indexes
Index Scans vs. Table Scans
An index scan reads the index structure to find matching rows, then fetches the actual data. A table scan reads every row in the table.
The tipping point? There's no fixed threshold, because the optimizer compares estimated costs, but a common rule of thumb puts it somewhere around 10-30% of the table. If your query needs 40% of the rows, MariaDB might skip the index entirely, because reading the whole table sequentially is faster than bouncing between index and table for millions of rows.
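Why a threshold exists at all can be shown with a back-of-the-envelope cost model. The per-row costs below are hypothetical units, not MariaDB's optimizer constants (the real cost model weighs many more factors): a sequential scan pays a small cost for every row in the table, while an index lookup pays a larger cost per matching row, because each match means a separate, often random, fetch.

```python
def choose_access_path(total_rows, matching_rows,
                       scan_cost_per_row=1.0, lookup_cost_per_row=5.0):
    """Pick the cheaper of a full table scan and an index lookup.

    Costs are made-up units: the scan pays for every row in the table,
    the index path pays a higher price for each matching row it fetches."""
    scan_cost = total_rows * scan_cost_per_row
    index_cost = matching_rows * lookup_cost_per_row
    return "index" if index_cost < scan_cost else "full table scan"


def break_even_fraction(scan_cost_per_row=1.0, lookup_cost_per_row=5.0):
    """Fraction of the table at which both plans cost the same."""
    return scan_cost_per_row / lookup_cost_per_row
```

With these assumed numbers, 5% of a million-row table is cheaper through the index, while 40% is cheaper as a scan. If a lookup costs roughly 3 to 10 times as much per row as a sequential read, break-even lands between about 10% and 33% of the table, which lines up with the rule of thumb above.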
Real-World Index Performance Patterns
The "Everything Is Slow" Syndrome
I've seen this pattern dozens of times: application performance gradually degrades over months. The culprit? Missing indexes on growing tables combined with increasingly complex queries.
Symptoms:
- Queries that were fast with 10K rows are crawling with 1M+ rows
- High CPU usage during business hours
- Users complaining about "the app being slow"
Solution approach:
- Use SHOW PROCESSLIST to find running queries
- EXPLAIN the slow ones
- Look for table scans on large tables
- Add appropriate indexes
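Part of why the slowdown sneaks up on you is how differently the two access paths scale. This sketch counts the levels of a balanced tree, assuming a made-up fan-out of 100 children per node (real InnoDB fan-out depends on key size and page size), and compares that with the number of rows a full scan reads.

```python
def btree_levels(n_rows, fanout=100):
    """Levels a balanced tree needs to reach n_rows entries, given `fanout`
    children per node (a hypothetical figure for illustration)."""
    levels, reachable = 1, fanout
    while reachable < n_rows:
        reachable *= fanout
        levels += 1
    return levels


for n in (10_000, 1_000_000, 100_000_000):
    print(f"{n:>11,} rows: full scan reads {n:,} rows, "
          f"index descends {btree_levels(n)} levels")
```

Under this assumption, going from 10K to 1M rows multiplies the full scan's work by 100 but adds a single level to the index descent.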
The Over-Indexing Problem
More indexes don't always mean better performance. I've audited databases with 15+ indexes on a single table where only 3 were ever used.
The hidden costs:
- Every INSERT and DELETE must update every index, and every UPDATE must update each index that contains a modified column
- More storage space
- Longer backup times
- Index maintenance overhead
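My rule of thumb: if an index shows no reads in your index usage statistics over a month, consider dropping it. Don't rely on the slow query log alone for this, since it only captures slow queries, and an index that keeps fast queries fast will never show up there.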
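To make the write cost concrete, this sketch counts which indexes a single write has to modify. The table and index names are hypothetical, and the rules assume an InnoDB-style layout: the row lives in the clustered primary key, and every secondary index entry carries the primary key value.

```python
# Hypothetical orders table: index name -> indexed columns.
ORDERS_INDEXES = {
    "PRIMARY": ("id",),
    "idx_customer": ("customer_id",),
    "idx_status_created": ("status", "created_at"),
    "idx_email": ("email",),
}


def indexes_touched_by_insert(indexes):
    """An INSERT (or DELETE) adds (or removes) one entry in every index."""
    return sorted(indexes)


def indexes_touched_by_update(indexes, changed_columns, primary="PRIMARY"):
    """Indexes an UPDATE has to modify, assuming an InnoDB-style layout.

    The clustered primary key holds the row itself, so it is always rewritten.
    Secondary indexes embed the primary key, so changing it touches everything.
    Otherwise only indexes containing a changed column need new entries."""
    changed = set(changed_columns)
    if changed & set(indexes[primary]):
        return sorted(indexes)
    touched = {name for name, cols in indexes.items() if changed & set(cols)}
    touched.add(primary)
    return sorted(touched)
```

Every extra index adds a write to each INSERT and DELETE, whether or not any query ever reads it.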
Practical Index Optimization Tips
1. Use EXPLAIN to Your Advantage
EXPLAIN FORMAT=JSON SELECT * FROM orders
WHERE customer_id = 123 AND status = 'pending'
ORDER BY created_at;
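Look for these red flags (shown as they appear in the classic tabular EXPLAIN output; FORMAT=JSON reports the same information under its own keys, e.g. "access_type": "ALL"):
- type: ALL (full table scan, usually bad on large tables)
- Extra: Using filesort (sorting without the help of an index)
- Extra: Using temporary (an internal temporary table is being created)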
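If you check plans from application code or a CI job, these red flags can be scanned for mechanically. This is a hypothetical helper that expects the classic tabular EXPLAIN output (not FORMAT=JSON) as a list of dicts keyed by column name, for example as returned by a dictionary-style cursor; adapt the key names if your driver differs.

```python
RED_FLAGS = ("Using filesort", "Using temporary")


def explain_warnings(explain_rows):
    """Return human-readable warnings for suspicious EXPLAIN rows.

    Each row is assumed to be a dict with at least "table", "type" and "Extra"
    keys, mirroring the tabular EXPLAIN columns."""
    warnings = []
    for row in explain_rows:
        table = row.get("table")
        if row.get("type") == "ALL":
            warnings.append(f"{table}: full table scan")
        extra = row.get("Extra") or ""  # Extra may be NULL/None
        for flag in RED_FLAGS:
            if flag in extra:
                warnings.append(f"{table}: {flag}")
    return warnings
```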
2. The Covering Index Trick
A covering index includes all columns needed by a query:
-- Query only needs id, customer_id, status, total
CREATE INDEX idx_covering ON orders (customer_id, status, id, total);
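This allows the query to run entirely from the index without touching the table data, what's often called an "index-only scan" (EXPLAIN reports it as Using index in the Extra column). If id is the primary key of an InnoDB table, listing it is harmless but redundant, because every InnoDB secondary index already stores the primary key.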
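Here is a toy model of the difference. The index stores (customer_id, status, id, total) tuples mirroring idx_covering; a query asking only for those columns never goes back to the table, while one that also wants a column outside the index (a made-up notes column) needs one extra table fetch per row. The data and names are invented for illustration.

```python
INDEX_COLUMNS = ("customer_id", "status", "id", "total")  # mirrors idx_covering

# Toy orders table keyed by primary key; "notes" is NOT in the index.
orders = {
    1: {"id": 1, "customer_id": 7, "status": "pending", "total": 20, "notes": "gift wrap"},
    2: {"id": 2, "customer_id": 7, "status": "shipped", "total": 35, "notes": ""},
    3: {"id": 3, "customer_id": 9, "status": "pending", "total": 12, "notes": "call first"},
}
covering_index = sorted(tuple(row[c] for c in INDEX_COLUMNS) for row in orders.values())


def run_query(wanted_columns, customer_id):
    """SELECT <wanted_columns> FROM orders WHERE customer_id = ?

    Returns (results, table_reads). For brevity this toy filters the index
    linearly instead of seeking into it."""
    covered = set(wanted_columns) <= set(INDEX_COLUMNS)
    results, table_reads = [], 0
    for entry in covering_index:
        if entry[0] != customer_id:
            continue
        values = dict(zip(INDEX_COLUMNS, entry))
        if not covered:
            values = orders[values["id"]]  # extra trip back to the table row
            table_reads += 1
        results.append({c: values[c] for c in wanted_columns})
    return results, table_reads
```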
3. Prefix Indexes for String Columns
For long string columns, you often don't need to index the entire value:
-- Index only the first 10 characters
CREATE INDEX idx_email_prefix ON users (email(10));
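This works well when the first few characters are already fairly distinct, such as usernames or the local part of email addresses. It works poorly for values that share a long common prefix (URLs starting with https://www., for example), and keep in mind that a prefix index can't act as a covering index or be used to satisfy ORDER BY.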
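A quick way to pick the prefix length is to compare the number of distinct prefixes with the number of rows; in SQL that's COUNT(DISTINCT LEFT(email, 10)) / COUNT(*). The sketch below does the same arithmetic in Python on invented sample data, so you can see how quickly selectivity climbs for emails and how flat it stays for URLs sharing a https://www. prefix.

```python
def prefix_selectivity(values, length):
    """Distinct prefixes / total values; 1.0 means every prefix is unique."""
    return len({v[:length] for v in values}) / len(values)


emails = [
    "john@example.com",
    "johanna@example.com",
    "joe@example.org",
    "maria@example.com",
    "mark@example.net",
]
urls = ["https://www.a.com/x", "https://www.b.com/y"]

for n in (2, 3, 4):
    print(n, prefix_selectivity(emails, n))
print("urls, 10 chars:", prefix_selectivity(urls, 10))
```

On this sample, four characters already make every email prefix unique, while the first ten characters of both URLs are identical ("https://ww").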
Common Indexing Mistakes (And How to Avoid Them)
1. The Function-Wrapped Column
-- Bad: Index on created_at won't be used
WHERE YEAR(created_at) = 2023
-- Good: Rewrite to use the index
WHERE created_at >= '2023-01-01' AND created_at < '2024-01-01'
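If you generate these predicates in code, a small helper keeps the half-open pattern consistent. This is a hypothetical sketch that builds the SQL fragment as a string for readability; in real application code you'd pass the two dates as bound parameters instead. The half-open form also sidesteps a classic bug: on a DATETIME column, BETWEEN '2023-01-01' AND '2023-12-31' stops at midnight on December 31 and drops the rest of that day.

```python
from datetime import date


def year_predicate(column, year):
    """Index-friendly equivalent of YEAR(column) = year, as a half-open range.

    `column` is assumed to be a trusted identifier, never user input."""
    start = date(year, 1, 1)
    end = date(year + 1, 1, 1)
    return f"{column} >= '{start.isoformat()}' AND {column} < '{end.isoformat()}'"
```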
2. Leading Wildcards in LIKE
-- Index won't help
WHERE email LIKE '%@gmail.com'
-- Index can help
WHERE email LIKE 'john%'
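Why one pattern can use the index and the other can't becomes clear in a toy model over a sorted list of strings. A trailing wildcard turns into a range seek: everything starting with 'john' sorts between 'john' and 'joho'. A leading wildcard gives no starting point, so every value has to be checked. This sketch uses plain code-point string comparison, not a MariaDB collation, and the function names are hypothetical.

```python
from bisect import bisect_left


def like_prefix_range(sorted_values, prefix):
    """Emulate WHERE col LIKE 'prefix%' as a range seek on a sorted index."""
    if not prefix:
        return list(sorted_values)  # LIKE '%' matches every value
    start = bisect_left(sorted_values, prefix)
    # Smallest string above everything that starts with `prefix`:
    # bump the last character by one code point ("john" -> "joho").
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    end = bisect_left(sorted_values, upper)
    return sorted_values[start:end]


def like_suffix_scan(values, suffix):
    """Emulate WHERE col LIKE '%suffix': no usable ordering, so check every value."""
    return [v for v in values if v.endswith(suffix)]


email_index = sorted(["john@gmail.com", "johnny@example.com", "jane@gmail.com", "joe@example.org"])
```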
3. Mismatched Data Types
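-- If user_id is a VARCHAR column, this won't use the index:
-- every stored string has to be converted to a number before comparing
WHERE user_id = 123
-- This will
WHERE user_id = '123'
-- (The reverse, an INT column compared with '123', is fine: the constant is converted once)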
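The reason comes down to which side gets converted. When a VARCHAR column is compared with a number, MariaDB's type-conversion rules compare the two as floating-point numbers, converting each stored string along the way, and strings that are numerically equal need not sit next to each other in the index. In this toy model, '09' and '9' both equal 9, yet '10' and '100' sort between them, so no single range seek can find both. The data and function names are invented for illustration.

```python
from bisect import bisect_left, bisect_right

# Toy index on a VARCHAR column: strings kept in plain string order.
phone_index = sorted(["10", "09", "9", "100", "8"])  # ['09', '10', '100', '8', '9']


def string_lookup(index, text):
    """WHERE col = '9': compared as strings, so all matches form one contiguous range."""
    return list(range(bisect_left(index, text), bisect_right(index, text)))


def numeric_lookup(index, number):
    """WHERE col = 9: each stored string is converted to a number first, and equal
    numbers are not adjacent in string order, so every entry must be checked."""
    return [pos for pos, text in enumerate(index) if float(text) == number]
```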
Monitoring Index Health
Modern MariaDB provides excellent tools for index analysis:
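-- Find unused indexes (requires SET GLOBAL userstat = 1; indexes never read
-- since statistics collection started have no row in INDEX_STATISTICS)
SELECT DISTINCT s.TABLE_SCHEMA, s.TABLE_NAME, s.INDEX_NAME
FROM information_schema.STATISTICS s
LEFT JOIN information_schema.INDEX_STATISTICS i
  ON i.TABLE_SCHEMA = s.TABLE_SCHEMA AND i.TABLE_NAME = s.TABLE_NAME
 AND i.INDEX_NAME = s.INDEX_NAME
WHERE i.INDEX_NAME IS NULL AND s.TABLE_SCHEMA = 'your_database';
-- Check index cardinality (estimated number of distinct values)
SHOW INDEX FROM your_table;
-- Analyze index usage (performance_schema must be enabled at startup; it is off by default)
SELECT * FROM performance_schema.table_io_waits_summary_by_index_usage
WHERE OBJECT_SCHEMA = 'your_database';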
The Bottom Line
Indexes are powerful tools, but like any tool, they need to be used wisely. My advice after years of production database management:
- Start simple – add indexes based on actual query patterns, not hypothetical ones
- Monitor actively – use slow query logs and performance schema
- Test thoroughly – index changes can have unexpected effects
- Clean house regularly – remove unused indexes periodically
Remember, the best index strategy is one that evolves with your application. What works for 100K rows might not work for 100M rows, and what works for your MVP might need rethinking as your queries become more complex.
The goal isn't to have the most indexes – it's to have the right indexes for your specific workload. And occasionally, the right answer is no index at all, just better hardware or query rewriting.
Happy indexing, and may your queries always return in milliseconds!
Have you encountered interesting indexing challenges in MariaDB? Share your war stories in the comments below.