Paging Results with DocumentDB

Several months back I started experimenting with Cosmos DB in Azure, specifically with DocumentDB.  DocumentDB is Microsoft's enterprise-ready, planet-scale, blah blah blah NoSQL database.  After a small learning curve, I was up and running in very little time, and I was quite impressed with the results.  If you set things up right and think ahead about how your queries will run, it's actually quite affordable, too.  What really attracted me to DocumentDB was being able to store big data without having to index it by storing it multiple times, as I have to do with plain Table Storage (which I wrote about earlier).  With DocumentDB I can store the data once and query it many different ways (within limits).  It's worked out very well for me as I've been building a massive-scale, exception-based reporting tool for my current company.

One thing that soon bothered me, though, was finding out that DocumentDB doesn't truly support paging.  In other words, there is no Skip operation, only Take.  There is a highly voted feature request for Skip, but it's been "planned" for over a year now, so I wouldn't hold my breath.

That left me in a bind, because one of the features in my app lets users run really BIG queries and view the results quickly (just as most users want to do).  Without a Skip operator, though, implementing paging in the search results became a bit tricky.

Fortunately, DocumentDB does let you grab a "page" of results, and if there are more results to be had, it returns a continuation token.  It's a forward-only mechanism, though: you can't use a continuation token to ask for a previous result set.
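To make the forward-only behavior concrete, here is a small in-memory simulation. It is not the DocumentDB SDK: `createForwardOnlySource`, `fetchPage`, `readPage`, and the numeric token format are all hypothetical stand-ins. Each call returns at most `take` items plus an opaque token for the next page (or `null` when nothing is left), and there is no way to ask for the previous page.

```javascript
// Hypothetical in-memory stand-in for a store that pages with continuation
// tokens. This is NOT the DocumentDB SDK; it only mimics the forward-only shape.
function createForwardOnlySource(allItems) {
  return {
    // Returns up to `take` items starting where `continuationToken` left off.
    // A null/empty token starts a fresh query from the beginning.
    fetchPage(continuationToken, take) {
      const start = continuationToken ? Number(continuationToken) : 0;
      const items = allItems.slice(start, start + take);
      const next = start + items.length;
      // Only hand back a token while more results remain.
      const token = next < allItems.length ? String(next) : null;
      return { continuationToken: token, items };
    },
  };
}

// Without Skip, reaching page N means reading every page before it.
function readPage(source, pageNumber, take) {
  let token = null;
  let page = { continuationToken: null, items: [] };
  for (let i = 1; i <= pageNumber; i++) {
    page = source.fetchPage(token, take);
    token = page.continuationToken;
    if (token === null && i < pageNumber) {
      return { continuationToken: null, items: [] }; // ran out before page N
    }
  }
  return page;
}
```

That cost of walking forward from page 1 is why it pays to keep pages the client has already downloaded rather than re-querying them.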

So here's what I ended up having to do:

In my DocumentDB repository, I have a method called QueryAndContinue.  One of its parameters is continuationToken: if it's null, the method starts a new "fresh" query, but if it has a value, it continues with the next page of results from the previous query.  It looks like this (BTW, IDocumentEntity is just a simple interface with two properties, EntityId and PartitionKey):

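using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;
using Microsoft.Azure.Documents.Linq;

// Member of the repository class: _client is the DocumentClient, and
// DatabaseName/CollectionName identify the collection being queried.
public async Task<Tuple<string, IList<T>>> QueryAndContinue<T>(
    string continuationToken,
    int take,
    string filter,
    string specificFields = null,
    string orderBy = null,
    string partitionKey = null) where T : IDocumentEntity, new()
{
    if (specificFields == null)
        specificFields = "*";

    // MaxItemCount caps the page size: this is the "Take" half of paging.
    var queryOptions = new FeedOptions { MaxItemCount = take };
    if (string.IsNullOrEmpty(partitionKey))
        queryOptions.EnableCrossPartitionQuery = true;
    else
        queryOptions.PartitionKey = new PartitionKey(partitionKey);

    var link = UriFactory.CreateDocumentCollectionUri(DatabaseName, CollectionName);
    // filter and orderBy are spliced in verbatim, so they should never
    // come straight from user input.
    var query = $"SELECT {specificFields} FROM {CollectionName} a WHERE {filter}";
    if (!string.IsNullOrEmpty(orderBy))
        query += $" ORDER BY {orderBy}";

    Trace.TraceInformation($"Running query: {query}");

    // Resume where the previous page left off; no token means a fresh query.
    if (!string.IsNullOrEmpty(continuationToken))
    {
        queryOptions.RequestContinuation = continuationToken;
    }
    var dquery = _client.CreateDocumentQuery<T>(link, query, queryOptions)
        .AsDocumentQuery();

    // Fetch exactly one page of results.
    string queryContinuationToken = null;
    var page = await dquery.ExecuteNextAsync<T>();
    var queryResult = page.ToList();
    // Only hand back a token when there is another page to fetch.
    if (dquery.HasMoreResults)
        queryContinuationToken = page.ResponseContinuation;

    return new Tuple<string, IList<T>>(queryContinuationToken, queryResult);
}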

The important things to note in the code above are setting MaxItemCount to your page size (in my case it's 10) and returning page.ResponseContinuation (which is what I've been calling a continuationToken).  And yes, I was lazy and returned a Tuple instead of creating a new class for the return type.  Oh well.
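One caveat worth knowing: as far as I know, MaxItemCount is an upper bound rather than a guarantee, so a single ExecuteNextAsync call can return fewer items than requested (even zero) while HasMoreResults is still true. If the client assumes every page is full, its page math drifts. Below is a minimal sketch, in JavaScript to match the frontend, of topping up a page by following continuation tokens. `fillPage` and the `fetchPage` callback (standing in for something that wraps QueryAndContinue) are hypothetical, and the sketch assumes the backend accepts a smaller `take` when resuming from a token.

```javascript
// `fetchPage(token, take)` is a hypothetical async function with the same
// shape as QueryAndContinue's result: it resolves to { continuationToken, items }.
// A single call may return fewer than `take` items, so keep asking until the
// page is full or the query is exhausted (token === null).
async function fillPage(fetchPage, continuationToken, take) {
  const items = [];
  let token = continuationToken;
  do {
    // Ask only for what is still missing so the page never overshoots `take`.
    const page = await fetchPage(token, take - items.length);
    items.push(...page.items);
    token = page.continuationToken;
  } while (token !== null && items.length < take);
  return { continuationToken: token, items };
}
```

The first call may pass `null` to start a fresh query; every later iteration resumes from the token the previous call returned.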

My web API layer is nothing special.  In fact, I'm not even going to show you the code, so that I can focus on the important part.  All the API layer does is return the page of results and the continuation token to the frontend client (written in AngularJS).  Here are a few things about the frontend:

First, I had already created an Angular directive in my app solely for paging through results.  However, when I created it, I assumed I would always know the total number of items in the result set ahead of time.  With DocumentDB this was no longer true, so I modified the directive to look like this:

.directive('tablePager', function () {
    return {
        scope: {
            data: '=',            // paging model: itemCount, pageSize, currentPage, hasMoreItems
            pageChanged: '&',     // called as pageChanged({e: $event, pageNumber: n})
            isIndeterminate: '='  // true when the total item count is unknown
        },
        link: {
            pre: function (scope, element, attrs) {
                scope.pages = [];
                scope.pageCount = 0;
                scope.pageStartIndex = 0;
                scope.pageEndIndex = 0;
                scope.hasMoreItems = false;
                scope.$watchGroup(["data.itemCount", "data.currentPage", "data.hasMoreItems"], function (newValue, oldValue) {
                    // Number of pages covered by the items downloaded so far.
                    var p = Math.ceil(scope.data.itemCount / scope.data.pageSize);
                    scope.hasMoreItems = scope.data.hasMoreItems;
                    // A continuation token means at least one more page exists.
                    if (scope.hasMoreItems)
                        p += 1;
                    scope.pageCount = p;
                    var pages = [];
                    if (p <= 5) {
                        for (var i = 0; i < p; i++) {
                            pages.push(
                                { index: i + 1 }
                            );
                        }
                    }
                    else {
                        // Five-page window that keeps the current page third when possible.
                        var start = Math.max(scope.data.currentPage - 3, 0);
                        for (var i = start; i < start + 5; i++) {
                            if (i >= p)
                                break;
                            pages.push(
                                { index: i + 1 }
                            );
                        }
                    }
                    // 1-based range of the items shown on the current page.
                    scope.pageStartIndex = scope.data.pageSize * (scope.data.currentPage - 1) + 1;
                    scope.pageEndIndex = Math.min(scope.data.itemCount, scope.data.pageSize * scope.data.currentPage);
                    scope.pages = pages;
                });
            }
        },
        templateUrl: '/scripts/templates/table-pager.html'
    };
})
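The watcher's arithmetic is easier to check outside Angular. Here is a hypothetical pure-function version of it (`computePager` is my name, not part of the app); its `paging` argument has the same fields as the directive's `data` binding.

```javascript
// Pure version of the tablePager watcher's page math (hypothetical helper).
// `paging` mirrors scope.data: { itemCount, pageSize, currentPage, hasMoreItems }.
function computePager(paging) {
  // Pages covered by the items downloaded so far...
  let pageCount = Math.ceil(paging.itemCount / paging.pageSize);
  // ...plus one placeholder page when a continuation token says there's more.
  if (paging.hasMoreItems) pageCount += 1;

  // At most five page links; past five, start two pages before the current one.
  const start = pageCount <= 5 ? 0 : Math.max(paging.currentPage - 3, 0);
  const pages = [];
  for (let i = start; i < start + 5 && i < pageCount; i++) {
    pages.push({ index: i + 1 });
  }
  return { pageCount, pages };
}
```

For example, with 20 items downloaded, a page size of 10, and a continuation token in hand, it reports three pages: the two already downloaded plus one that can still be fetched.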

The important piece here is that if hasMoreItems is true (which is the case whenever the continuation token is not null), the directive sets the page count to the number of pages already downloaded plus one.  This way my UI shows that there is at least one more page of results to be had.  The template for this directive looks like this:

<div ng-show="!isIndeterminate" class="m-l-1 m-t-1 font-size-12">{{pageStartIndex}}-{{pageEndIndex}} of {{data.itemCount}}</div>
<div ng-show="isIndeterminate" class="m-l-1 m-t-1 font-size-12">{{pageStartIndex}}-{{pageEndIndex}}</div>
<ul class="pagination" ng-show="pageCount > 1">
    <li ng-class="{'disabled': data.currentPage === 1}" class="hidden-xs">
        <a href="#" ng-click="pageChanged({e: $event, pageNumber: 1})"><i class="font-size-11 fa fa-angle-double-left"></i></a>
    </li>
    <li ng-class="{'disabled': data.currentPage === 1}"><a href="#" ng-click="pageChanged({e: $event, pageNumber: data.currentPage - 1})"><i class="font-size-11 fa fa-angle-left"></i></a></li>
    <li ng-repeat="page in pages" ng-class="{'active': data.currentPage === page.index }">
        <a href="#" ng-click="pageChanged({e: $event, pageNumber: page.index})">{{page.index}}</a>
    </li>
    <li ng-class="{'disabled': data.currentPage === pageCount}"><a href="#" ng-click="pageChanged({e: $event, pageNumber: data.currentPage + 1})"><i class="font-size-11 fa fa-angle-right"></i></a></li>
    <li ng-class="{'disabled': data.currentPage === pageCount}" class="hidden-xs">
        <a href="#" ng-click="pageChanged({e: $event, pageNumber: pageCount})"><i class="font-size-11 fa fa-angle-double-right"></i></a>
    </li>
</ul>
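The two label lines at the top of the template differ only in whether they append a total. As a hypothetical helper (`rangeLabel` is my name), the same logic looks like this; in indeterminate mode `itemCount` is the number of items downloaded so far, not a true total, which is why the "of N" part is dropped.

```javascript
// Hypothetical helper mirroring the template's two label <div>s.
// `paging` has the directive's fields: { pageSize, currentPage, itemCount }.
function rangeLabel(paging, isIndeterminate) {
  const start = paging.pageSize * (paging.currentPage - 1) + 1;
  const end = Math.min(paging.itemCount, paging.pageSize * paging.currentPage);
  // Without a known total (the DocumentDB case), show only the range.
  return isIndeterminate ? `${start}-${end}` : `${start}-${end} of ${paging.itemCount}`;
}
```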

For our DocumentDB pager, the isIndeterminate flag is set to true, so the label shows only the range (e.g. "11-20") without an "of N" total.  The rest of the template should be pretty self-explanatory; nothing too fancy as far as paging controls go.

Next comes the markup that uses the pager, followed by the controller code that drives the paging behavior.  The HTML looks like this:

<nav table-pager data="model.paging" 
page-changed="pagerChanged(e, pageNumber)" 
is-indeterminate="model.isFullSearch"></nav>
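The `model.paging` object bound here isn't shown in full, so here is one hypothetical way to build it and fold each page returned by the API into it. The response shape (`items`, `continuationToken`) and both function names are my assumptions; the property names on `model` and `paging` are the ones the directive and controller read.

```javascript
// Hypothetical initial model for a full (DocumentDB) search.
function createFullSearchModel(pageSize) {
  return {
    isFullSearch: true,      // bound to is-indeterminate on the pager
    data: [],                // every item downloaded so far, across pages
    continuationToken: null, // token for the next page, if any
    paging: { pageSize, currentPage: 1, itemCount: 0, hasMoreItems: false },
  };
}

// Hypothetical: merges one page from the API into the model the pager binds to.
function applyPage(model, response) {
  model.data = model.data.concat(response.items);       // keep earlier pages for "back"
  model.continuationToken = response.continuationToken; // null once exhausted
  model.paging.itemCount = model.data.length;           // downloaded count, not a total
  model.paging.hasMoreItems = Boolean(response.continuationToken);
  return model;
}
```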

And finally, in the controller:

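$scope.pagerChanged = function (e, pageNumber) {
    e.preventDefault();
    var paging = $scope.model.paging;
    var loaded = $scope.model.data.length;

    // A "disabled" class on the <li> doesn't by itself stop ng-click, so
    // ignore clicks before page 1 or past the last reachable page.
    if (pageNumber < 1 || ((pageNumber - 1) * paging.pageSize >= loaded && !paging.hasMoreItems))
        return;

    // Page not fully downloaded yet: fetch the next one with the saved token.
    if (pageNumber * paging.pageSize > loaded && paging.hasMoreItems) {
        $rootScope.$broadcast("fullsearch:getnextpage", $scope.model.continuationToken);
    }

    paging.currentPage = pageNumber;
};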

The important thing to note in the code above is that if the requested page would reach past the items we have already downloaded (that is, pageNumber * pageSize is bigger than the length of $scope.model.data), we broadcast an event to go get the next page of results using the continuationToken we saved from the previous query.  If we already have the page the user requested (e.g. they hit the back button on the pager), we simply change the page number and let the pager directive do its thing as it normally would.
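The decision in pagerChanged can also be written as a small pure function, which makes the edge cases easy to test. `planPageChange` and its three return values are hypothetical; it follows the same rule as the controller (fetch only when the requested page extends past the downloaded items and a continuation token exists) and spells out an "ignore" case for clicks before page 1 or past the last reachable page.

```javascript
// Hypothetical helper: decides how to handle a click on `pageNumber`.
// Returns "local" (already downloaded), "fetch" (follow the continuation
// token), or "ignore" (out of range).
function planPageChange(pageNumber, paging, loadedCount) {
  if (pageNumber < 1) return "ignore";
  // Every item on the requested page is already on the client.
  if (pageNumber * paging.pageSize <= loadedCount) return "local";
  // Part of the page is missing but the server has more: follow the token.
  if (paging.hasMoreItems) return "fetch";
  // No more results: show a partial last page, or ignore a click past the end.
  const firstIndex = (pageNumber - 1) * paging.pageSize; // 0-based
  return firstIndex < loadedCount ? "local" : "ignore";
}
```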

Putting it all together it looks like this:

As I freely admit, this solution has one major drawback: the pager only shows the next available page.  This could be solved by first running a query to get a count of the documents that match.  DocumentDB recently added aggregate functions, so getting that count could be relatively easy.  However, you take a performance hit doing it that way, since you have to run two separate queries, and depending on how many documents match, even the count could take a while to complete.  Since I support sorting in my application, I decided that skipping to an arbitrary page number was not a feature worth spending the development time on.
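For anyone who does want the count, here is a sketch of the query side. It reuses the repository's filter and the `a` alias from QueryAndContinue and relies on DocumentDB's `SELECT VALUE COUNT(1)` aggregate form; `buildCountQuery` and `totalPages` are hypothetical helpers, and the request-unit cost of the extra query isn't accounted for here.

```javascript
// Hypothetical helper: a count query that shares the page query's filter
// (and its `a` alias), so the pager could show a real total.
function buildCountQuery(collectionName, filter) {
  // ORDER BY doesn't matter for a count, so it's left off.
  return `SELECT VALUE COUNT(1) FROM ${collectionName} a WHERE ${filter}`;
}

// With a known total, the pager no longer needs the "+1" placeholder page.
function totalPages(totalCount, pageSize) {
  return Math.ceil(totalCount / pageSize);
}
```

Even with a total, jumping straight to page N would still mean walking the continuation tokens from page 1, since there is still no Skip.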

If you're interested in finding out how I can help you implement big data solutions like DocumentDB in your own products, please reach out to me!