At CodingClave, we frequently encounter systems where the application layer is horizontally scalable, but the persistence layer has become a single point of failure. The database is the physics of your architecture; you cannot cheat gravity.

Recently, we audited a high-frequency fintech ledger handling approximately 4,000 writes per second. Their primary "Transaction History" endpoint was suffering from a p99 latency of 1.2 seconds, causing timeouts in downstream microservices and saturating the connection pool.

This post details the specific steps we took to analyze the query planner, refactor the schema strategy, and optimize the SQL execution, resulting in a 60% reduction in API latency and a 40% drop in database CPU utilization.

The High-Stakes Problem

The endpoint in question filtered transactions based on user_id, transaction_type, and a created_at date range, while implementing pagination.

The application used an ORM that generated a query functionally equivalent to this:

SELECT *
FROM transactions
WHERE user_id = 'u_847291'
  AND transaction_type = 'DEBIT'
  AND created_at > '2024-11-01 00:00:00'
ORDER BY created_at DESC
LIMIT 50 OFFSET 10000;

While this looks innocuous, the transactions table held 850 million rows. The existing index was a simple B-Tree on user_id.
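
For reference, the pre-existing index looked like the following (the index name here is our illustration; only the indexed column was confirmed in the audit):

-- The sole pre-existing index (name is illustrative)
CREATE INDEX idx_transactions_user_id ON transactions (user_id);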

When we ran EXPLAIN (ANALYZE, BUFFERS), the root cause was immediately visible:

  1. Index Scan followed by Filter: Postgres used the index on user_id, but then had to visit the heap (the actual table storage) for every single row matching that ID to check the transaction_type and created_at constraints.
  2. High I/O: The "Filter" step resulted in excessive random I/O operations because the rows for a specific user were scattered across the disk (poor data locality).
  3. Offset Punishment: The database had to materialize 10,050 rows, discard the first 10,000, and return the last 50.

The query execution time was 850ms on the database side alone.
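
To reproduce this kind of diagnosis, wrap the query in EXPLAIN (ANALYZE, BUFFERS); the BUFFERS option surfaces the page-level I/O that betrays poor data locality. A minimal sketch against the query above:

-- ANALYZE executes the query for real timings; BUFFERS reports buffer and disk reads
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM transactions
WHERE user_id = 'u_847291'
  AND transaction_type = 'DEBIT'
  AND created_at > '2024-11-01 00:00:00'
ORDER BY created_at DESC
LIMIT 50 OFFSET 10000;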

Technical Deep Dive: The Solution

We implemented a two-pronged solution: Composite Indexing for direct lookups and Keyset Pagination to eliminate the offset drift.

1. Composite "Covering" Indexes

We needed the database to satisfy the WHERE clause, the ORDER BY, and the select list entirely from the index structure without touching the heap. We introduced a composite index with the equality-filtered columns first and the sort keys last, covering the remaining selected columns with an INCLUDE clause.

-- Migration Step: Build the composite index concurrently to avoid blocking writes
-- (INCLUDE requires Postgres 11+)
CREATE INDEX CONCURRENTLY idx_transactions_optimization
ON transactions (user_id, transaction_type, created_at DESC, id DESC)
INCLUDE (amount, currency);
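
On a table of this size, the concurrent build can run for hours. On Postgres 12 and later you can watch it from another session via the progress view (a subset of its columns shown):

-- Monitor the concurrent index build (Postgres 12+)
SELECT phase, blocks_done, blocks_total
FROM pg_stat_progress_create_index;

Once the new index is live and validated, the original single-column index on user_id becomes redundant (its column is the composite's leading key) and can be removed with DROP INDEX CONCURRENTLY to reclaim write overhead.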

Why this works:

  • user_id: High-cardinality equality filter. Narrows the search space immediately.
  • transaction_type: Low cardinality, but keeping it in the key lets the index condition resolve it without a heap lookup.
  • created_at DESC, id DESC: Matches the sort order the refactored query requires, so the planner needs no separate Sort step.
  • INCLUDE (amount, currency): Carries the remaining selected columns in the index leaf pages, making the index fully "covering".

With every column the query touches now living in the index, the planner converts the operation from an Index Scan (which requires heap access) to an Index Only Scan (data retrieved purely from the B-Tree). This is also why the refactored query in the next section selects explicit columns: a SELECT * would force heap visits no matter how good the index is.
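
One operational caveat: Index Only Scans consult the visibility map, so recently written rows still incur heap fetches until vacuum marks their pages all-visible. On a write-heavy ledger, keep autovacuum tuned aggressively, or run a manual pass after the build. A minimal sketch:

-- Refresh statistics and the visibility map so "Heap Fetches" stays near zero
VACUUM (ANALYZE) transactions;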

2. Implementing Keyset Pagination (Seeking)

OFFSET is O(N): as a user pages deeper into their history, the database does linearly more work. We refactored the API to use "cursor-based" (keyset) pagination. Instead of skipping rows, we seek directly past the last record the client saw.

Refactored SQL:

SELECT id, amount, currency, created_at
FROM transactions
WHERE user_id = 'u_847291'
  AND transaction_type = 'DEBIT'
  -- The Cursor Logic
  AND (created_at, id) < ('2024-11-29 10:30:00', 'tx_99999')
ORDER BY created_at DESC, id DESC
LIMIT 50;

Note: We include id in the tuple comparison to handle timestamp collisions deterministically.
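
One detail every keyset implementation must handle: the first page has no cursor yet. The usual approach, and the one sketched here, is to omit the tuple predicate entirely and have the client carry forward the (created_at, id) of the last row it received:

-- First page: no cursor predicate; the last row returned seeds the next request
SELECT id, amount, currency, created_at
FROM transactions
WHERE user_id = 'u_847291'
  AND transaction_type = 'DEBIT'
ORDER BY created_at DESC, id DESC
LIMIT 50;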

The Resulting Execution Plan

After these changes, the EXPLAIN ANALYZE output shifted dramatically:

Limit  (cost=0.57..8.59 rows=50 width=128) (actual time=0.045..0.092 rows=50 loops=1)
  ->  Index Only Scan using idx_transactions_optimization on transactions  (cost=0.57..34200.12 rows=215000 width=128)
        Index Cond: ((user_id = 'u_847291'::text) AND (transaction_type = 'DEBIT'::text))
        Filter: ((created_at, id) < ('2024-11-29 10:30:00'::timestamp without time zone, 'tx_99999'::text))
        Heap Fetches: 0
Planning Time: 0.12 ms
Execution Time: 0.11 ms

Execution time dropped from 850ms to 0.11ms.

Architecture & Performance Benefits

By optimizing the data access path, the benefits cascaded through the entire stack:

  1. API Latency: The p99 latency for the endpoint dropped from 1.2s to 380ms (the remainder is network and serialization overhead).
  2. Throughput Capacity: With individual queries finishing orders of magnitude faster, database connections were returned to the pool almost instantly, allowing the existing hardware to absorb a 3x increase in concurrent users without degradation.
  3. IOPS Reduction: By leveraging Index Only Scans, we cut disk read operations by 95%, significantly lowering billable IOPS on the cloud provider (a verification query is sketched below).
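
To verify the I/O shift on your own system, Postgres's cumulative statistics expose per-table block activity; after a change like this, heap reads should flatline while index hits climb. A hedged sketch:

-- Heap vs. index block I/O for the table (cumulative since the last stats reset)
SELECT heap_blks_read, heap_blks_hit, idx_blks_read, idx_blks_hit
FROM pg_statio_user_tables
WHERE relname = 'transactions';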

How CodingClave Can Help

While the solution outlined above is effective, implementing these optimizations in a live production environment is fraught with risk.

Altering indexes on multi-terabyte tables can lock your database, causing total service outages. Refactoring pagination logic requires precise coordination between frontend, backend, and data teams to ensure data consistency and prevent "missing records" bugs.

At CodingClave, we specialize in high-scale architecture and database performance engineering. We don't just write SQL; we execute zero-downtime migrations, audit query planners for edge cases, and re-architect data layers to withstand massive scale.

If your system is facing latency bottlenecks or you are hitting the limits of your current database architecture, do not guess at the solution.

Book a Technical Roadmap Consultation with CodingClave today. Let us turn your technical debt into a competitive advantage.