
I am implementing an API for my project.

As far as I know, there are different ways to paginate through results, such as the following:

https://example.com/api/purchaseorders?page=2&pagesize=25  
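For contrast, here is a minimal Python sketch of what the page/pagesize style usually means on the server side: the page number is turned into an offset, and that many records are skipped. The function name get_page and the 1-based page numbering are assumptions made for illustration.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def get_page(records: Sequence[T], page: int, pagesize: int) -> List[T]:
    """Return one page of results, assuming 1-based page numbers (hypothetical helper)."""
    if page < 1 or pagesize < 1:
        raise ValueError("page and pagesize must be positive")
    offset = (page - 1) * pagesize  # how many records to skip before this page
    return list(records[offset:offset + pagesize])


orders = list(range(1, 101))  # 100 fake purchase order ids
print(get_page(orders, 2, 25))  # ids 26..50
```

The client only sends a number, so the server has to recompute the offset on every request, and any insert or delete before that offset shifts which records land on the page.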

However, I see that many APIs, such as Google's, use a different approach: a "pageToken" that lets the client move between pages of results, for example:

https://example.com/api/purchaseorders?pagesize=25&pageToken=ClkKHgoRc291cmNlX2NyZWF0ZWRfYXQSCQjA67Si5sr

So instead of page=2, they use pageToken=[token].

I don't understand the idea behind pageToken or how to implement it.

It would be helpful if you could point me to any resources so I can learn more.

Thank you.

Karim Harazin

2 Answers


The page-token approach you are describing is more commonly known as cursor-based pagination. It works by marking the record where the previous request left off. The cursor can consist of a single field, such as an id, or of several fields, such as "id+name" hashed together, when the records are requested in a sorted order. It is mainly used when the records can change between two pagination requests. You can find much more detailed information here: https://mixmax.com/blog/api-paging-built-the-right-way
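A minimal Python sketch of a composite cursor, assuming in-memory records sorted by (name, id), where id breaks ties between equal names. One substitution to note: instead of hashing the fields (a hash cannot be turned back into the position), this sketch base64-encodes them so the server can decode the token. The names encode_cursor, decode_cursor and list_records are hypothetical.

```python
import base64
import json
from typing import List, Optional, Tuple

Record = Tuple[str, int]  # (name, id): the sort key doubles as the cursor


def encode_cursor(record: Record) -> str:
    """Turn the sort key of the last record returned into an opaque string."""
    raw = json.dumps([record[0], record[1]]).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(token: str) -> Record:
    name, rec_id = json.loads(base64.urlsafe_b64decode(token.encode("ascii")))
    return (name, rec_id)


def list_records(records: List[Record], page_size: int,
                 page_token: Optional[str] = None) -> Tuple[List[Record], Optional[str]]:
    ordered = sorted(records)  # by name, then id
    if page_token is not None:
        last_seen = decode_cursor(page_token)
        # Keep only records strictly after the last one the client received.
        ordered = [r for r in ordered if r > last_seen]
    page = ordered[:page_size]
    # Only hand out a token when more records remain after this page.
    next_token = encode_cursor(page[-1]) if len(ordered) > page_size else None
    return page, next_token


records = [("bob", 2), ("alice", 3), ("alice", 1), ("carol", 4)]
first, token = list_records(records, 2)
records.append(("aaron", 9))  # inserted between requests, before the cursor
second, _ = list_records(records, 2, token)
print(first, second)  # the insert does not shift the second page
```

In a relational database the filter would normally become a WHERE clause on the sort columns (for example a row-value comparison such as (name, id) > (?, ?) in PostgreSQL) together with ORDER BY and LIMIT, so an index does the work instead of Python.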

6harat

Note: This is purely speculation, but I had to implement something similar before.

Consider a large time-series dataset stored in a big table such as HBase. The data doesn't have a constant frequency, so you can't make predictions about its density. Say you want page 5000 with a page size of 25. Since you can't predict the density of the data, there isn't really a way to implement this without starting from the beginning, scanning roughly 5000 * 25 rows, and then collecting the next 25. That's really inefficient and doesn't scale.
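To make the cost concrete, a small Python sketch, assuming rows arrive from a sequential scan in key order and that timestamps are irregular (simulated here as i * i). The helper offset_page is hypothetical; it counts how many rows it has to read to serve one offset-based page.

```python
from typing import Iterable, List, Tuple

Row = Tuple[int, str]  # (timestamp, payload)


def offset_page(rows: Iterable[Row], page: int, page_size: int) -> Tuple[List[Row], int]:
    """Serve a 1-based page by skipping rows one by one; also return rows read."""
    skip = (page - 1) * page_size
    result: List[Row] = []
    rows_read = 0
    for row in rows:
        rows_read += 1
        if rows_read > skip:  # past the skipped prefix, start collecting
            result.append(row)
            if len(result) == page_size:
                break
    return result, rows_read


# Irregular timestamps: the gap between rows keeps growing, so a row's
# position cannot be computed from a time value.
scan = ((i * i, f"event-{i}") for i in range(200_000))
page, rows_read = offset_page(scan, page=5000, page_size=25)
print(rows_read)  # 125000 rows touched to return 25
```

With 1-based pages, 4999 * 25 rows are skipped before the 25 returned ones, and the work grows linearly with the page number.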

A better method is to use the table's primary key index together with the page size. That way I can seek directly to the first row of the page and collect 25 rows; the next page then starts with the key of the 26th row. This is especially useful for time-series data, because when querying you typically pick a start time and page forward from there.
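A minimal Python sketch of this seek-based approach, assuming rows are held sorted by primary key and using bisect on the key list as a stand-in for an index seek. The helper seek_page is hypothetical. It reads page_size + 1 rows so the extra (26th) row's key can be returned as the start of the next page.

```python
import bisect
from typing import List, Optional, Sequence, Tuple

Row = Tuple[int, str]  # (primary key, e.g. a timestamp; payload)


def seek_page(rows: Sequence[Row], keys: Sequence[int], start_key: int,
              page_size: int) -> Tuple[List[Row], Optional[int]]:
    """Return up to page_size rows with key >= start_key, plus the next page's start key.

    `rows` must be sorted by key and `keys` must hold the same keys in the same order.
    """
    i = bisect.bisect_left(keys, start_key)   # O(log n) seek instead of a scan
    chunk = list(rows[i:i + page_size + 1])   # one extra row to find the next start
    page = chunk[:page_size]
    next_key = chunk[page_size][0] if len(chunk) > page_size else None
    return page, next_key


rows = [(i * i, f"event-{i}") for i in range(200)]  # irregular timestamps
keys = [k for k, _ in rows]
page1, next_key = seek_page(rows, keys, start_key=0, page_size=25)
page2, _ = seek_page(rows, keys, start_key=next_key, page_size=25)
from_time, _ = seek_page(rows, keys, start_key=1000, page_size=25)  # "pick a start time"
```

The start key is exactly what would go into the page token; the cost of fetching a page no longer depends on how deep into the results the client is.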

I believe Google is doing something similar. They're probably hashing the key so they don't leak implementation details and so they don't couple their interface to the structure of their table.
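One way to produce such an opaque token is sketched below in Python. It is not necessarily what Google does: instead of a one-way hash (which the server could not decode without storing a lookup table), it base64-encodes the position and appends an HMAC-SHA256 signature, so clients cannot forge or edit tokens. The SECRET value and the helper names are hypothetical. The payload is only encoded, not encrypted, so a curious client can still decode it; encrypt it if the key itself must stay hidden.

```python
import base64
import hashlib
import hmac
import json
from typing import Any

SECRET = b"replace-with-a-real-server-side-secret"  # hypothetical; keep out of source control


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _unb64(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))  # restore padding


def make_page_token(position: Any) -> str:
    """Serialize the resume position and sign it so it cannot be tampered with."""
    payload = json.dumps(position, separators=(",", ":")).encode("utf-8")
    sig = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return _b64(payload) + "." + _b64(sig)


def read_page_token(token: str) -> Any:
    """Verify the signature and return the position, or raise ValueError."""
    try:
        payload_part, sig_part = token.split(".")
        payload, sig = _unb64(payload_part), _unb64(sig_part)
    except ValueError as exc:  # wrong shape or bad base64 (binascii.Error is a ValueError)
        raise ValueError("malformed page token") from exc
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):
        raise ValueError("invalid page token")
    return json.loads(payload)


token = make_page_token({"start": 625})
print(read_page_token(token))  # {'start': 625}
```

Because the client only ever sees an opaque string, the server can later change what goes inside the token (a different key, extra sort fields) without changing the API.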

Samuel