How Keyset Pagination Improved Our Query Processing Time by 95%

Marco Bruijns - MTC Creatives
3 min read · Oct 31, 2024


Working with large datasets is a challenge many of us face in the tech world. Recently, I encountered a performance bottleneck while processing a large table in batches. The setup used LIMIT OFFSET, and while it got the job done, the processing time was far from optimal. After some exploration, I discovered Keyset Pagination, a method that ended up improving our processing speed by 95%. Here's how I did it.

Photo by Ray Hennessy on Unsplash

Our initial goal was to process records in batches of 500. We used LIMIT OFFSET to paginate the data, fetching rows in increments. However, as the offset value increased with each batch, the query times grew longer and longer. This is due to how LIMIT OFFSET operates: even though you request rows starting from a certain point, the database must still scan through the preceding rows to reach the offset. For large tables, this can quickly get expensive in terms of time and resources.
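To make this concrete, here is a sketch of what offset-based batching looks like in SQL (the table name mirrors the example later in this post; the offsets are illustrative):

SELECT * FROM large_table ORDER BY id LIMIT 500 OFFSET 0;    -- batch 1
SELECT * FROM large_table ORDER BY id LIMIT 500 OFFSET 500;  -- batch 2
-- ...and by the time you reach a deep batch:
SELECT * FROM large_table ORDER BY id LIMIT 500 OFFSET 1000000;
-- the database must walk past a million rows just to discard them.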

To diagnose the slowdown, I looked at how LIMIT OFFSET works in detail. For each batch, the database scans through previous rows to reach the target offset. As the offset grows, so does the scan time, leading to progressively slower queries. It was clear that while LIMIT OFFSET is simple to implement, it wasn’t suitable for tables with millions of rows.
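You can verify this yourself with the query planner. A minimal sketch, assuming PostgreSQL (MySQL 8.0+ also supports EXPLAIN ANALYZE): run the same query at a shallow and a deep offset and compare how many rows the plan actually touches.

EXPLAIN ANALYZE
SELECT * FROM large_table ORDER BY id LIMIT 500 OFFSET 500;

EXPLAIN ANALYZE
SELECT * FROM large_table ORDER BY id LIMIT 500 OFFSET 1000000;
-- The second plan fetches and discards a million rows before
-- returning the 500 you asked for.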

After researching alternatives, I came across Keyset Pagination. Instead of using an arbitrary offset, Keyset Pagination relies on the values of a unique column (such as an ID or timestamp) to track the “position” in the dataset. This way, the database doesn’t have to scan preceding rows — it can jump directly to the next set, significantly improving query efficiency.
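One caveat: if you paginate on a timestamp instead of the primary key, the column may not be unique, so you need a tie-breaker to avoid skipping or repeating rows. A hedged sketch, where created_at is a hypothetical column (row-value comparison like this is supported in PostgreSQL and MySQL):

SELECT *
FROM large_table
WHERE (created_at, id) > (:last_created_at, :last_id)
ORDER BY created_at, id
LIMIT 500;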

How it works:

Here’s how I implemented Keyset Pagination in our system:

  • Select a Unique Column: We chose the primary key (id), which is indexed and unique.
  • Query with a WHERE Clause: Instead of using LIMIT OFFSET, each batch request fetches rows with id values greater than the last processed id, ordered by id so the batches are deterministic. For example:
SELECT * FROM large_table WHERE id > :last_id ORDER BY id LIMIT 500;
  • Adjust the Batch Pointer: After each batch, update last_id with the highest id from the batch just processed. This keeps every request equally cheap, no matter how deep into the table you are (see the sketch after this list).
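Put together, the loop looks something like this, assuming positive integer ids (the specific values are made up for illustration):

-- Batch 1: start below the smallest id.
SELECT * FROM large_table WHERE id > 0 ORDER BY id LIMIT 500;
-- Suppose the last row returned had id = 517; that becomes the new pointer.

-- Batch 2: pick up exactly where the previous batch left off.
SELECT * FROM large_table WHERE id > 517 ORDER BY id LIMIT 500;

-- Repeat, updating the pointer each time, until a batch returns fewer than 500 rows.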

After switching to Keyset Pagination, query processing times dropped dramatically. The database no longer needed to scan previous rows, which cut down processing time by 95%. What initially took minutes per batch was now completed in seconds.

Switching to Keyset Pagination proved to be a game-changer for our batch processing. Here's what I learned:

  • Understand Your Query Limitations: LIMIT OFFSET is straightforward but may not scale well with larger tables.
  • Choose the Right Column for Pagination: The key to efficient Keyset Pagination is picking a unique, indexed column.
  • Measure Your Gains: Benchmarking performance before and after a change is critical to quantifying the improvement (a quick way to do this is sketched below).
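For a quick before-and-after comparison, psql's built-in client-side timing is enough. A minimal sketch, assuming PostgreSQL; the value 1000000 stands in for whatever pointer you reached, and is only a fair comparison if ids are roughly dense:

\timing on
-- Before: offset pagination at a deep offset.
SELECT * FROM large_table ORDER BY id LIMIT 500 OFFSET 1000000;
-- After: keyset pagination from the corresponding pointer.
SELECT * FROM large_table WHERE id > 1000000 ORDER BY id LIMIT 500;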

By implementing Keyset Pagination, we transformed our data processing efficiency and made a big impact on our overall query performance. If you’re working with large tables, Keyset Pagination could be the solution you’re looking for.

Conclusion:

Optimizations like Keyset Pagination are powerful but often overlooked. The key is understanding when traditional methods, like LIMIT OFFSET, fall short and finding alternatives that meet the unique needs of your dataset.

Written by Marco Bruijns - MTC Creatives

Software Engineer and MTC Creatives founder, bridging business and tech with scalable data solutions. Passionate about efficiency, AI, and real-world insights.
