Queries

Queries are actions against a collection of resources, which allow your users to filter, sort, and aggregate data in a way that is meaningful to them. This section of the contract defines the structure of these queries, including Filtering, Searching, Pagination, Sorting, and Aggregation.

1 - Pagination and Sorting

Requirements around paging, sorting, and handling stale result sets.

Pagination

There are two ways to paginate a result set: Next/Previous and Offset/Limit. We understand that most user interfaces strongly prefer the offset/limit style, as it is more intuitive to human users. For the resource server, however, this style presents a scaling challenge, as many data stores (most notably document-based stores) do not know how many documents match a given set of criteria until they have traversed the entire result set.

That’s not to say offset/limit pagination can’t be used. For our purposes, however, we cannot create requirements in this contract that will not themselves scale with the data. Falling back to use cases, it is (from experience) quite rare for a human to page deeply into a result set, while a script loading all results is quite common. As such, we will only support the Next/Previous style of pagination, optimizing for the most frequent use case.

Please note that a proper implementation of Aggregation queries can easily provide the necessary metadata to simulate offset/limit style pagination, if that is a requirement for your use case. It is the responsibility of the client to decide which method is most appropriate for their audience.

Pagination Example

Here, we are submitting a query that will return the first 100 results of a resource.

POST /v1/resource/query HTTP/1.1

{
    "start": "....",  // Optional ID of the first record to return.
    "limit": 100,     // The number of results to return, default 100

    // ... Sort and filter parameters, as appropriate for the query.
}

As there are more than 100 results, the response includes a Link header with a next link to the next page of results, as per RFC 5988. It also includes an ETag, calculated from the content, and a Last-Modified header giving the modification time of the most recently modified record in this set.

HTTP/1.1 200 OK
ETag: "dd2796ae-1a46-4be5-b446-7f8c7a0e8342"
Last-Modified: Wed, 21 Oct 2015 07:28:00 GMT
Link: <https://api.example.com/v1/resource/query?start=....&limit=100>; rel="next"

{
    ... results as per the query type
}

| Property | Relevance | Type | Description |
|----------|-----------|------|-------------|
| start | request | string | An optional start index from which the result should be read. This must be the ID of the first record of the result set. |
| limit | request | int | An optional number of results to return in the page, with a default of 100. |
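
A client that wants every page can follow the `next` link until none is returned. The following is a minimal sketch of that loop, not a reference client: `fetch` is a hypothetical caller-supplied function that performs the HTTP request and returns the response headers and decoded body, and the regular expression only handles the simple `<url>; rel="name"` form shown above.

```python
import re
from typing import Any, Callable, Dict, Iterator, Tuple

# Matches one `<url>; rel="name"` entry of a Link header.
_LINK_ENTRY = re.compile(r'<([^>]*)>\s*;\s*rel="?([^";,]+)"?')


def parse_link_header(value: str) -> Dict[str, str]:
    """Map each rel name found in a Link header to its target URL."""
    return {rel: url for url, rel in _LINK_ENTRY.findall(value)}


def iter_pages(
    first_url: str,
    fetch: Callable[[str], Tuple[Dict[str, str], Any]],
) -> Iterator[Any]:
    """Yield each page body, following rel="next" until no link remains.

    `fetch` is an assumption of this sketch: it takes a URL and returns
    a (headers, body) tuple.
    """
    url = first_url
    while url is not None:
        headers, body = fetch(url)
        yield body
        url = parse_link_header(headers.get("Link", "")).get("next")
```

Because the loop only depends on the presence of a `next` link, the client never needs to know the total size of the result set.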

Sorting

Every resource must choose a human-relevant, intuitive dimension to use as a default sort. For example, a Report service might choose to sort by name, while a Security Violations service may sort by severity or age. An API consumer may override this default with their own sort dimensions, which are expressed in order of precedence, as below.

Sorting Example

{
  "sort": [ // Note that the 'sort' field is optional.
    {
      "on": "age",         // The resource property to sort on.
      "order": "ASC|DESC"  // "ASC" or "DESC", representing ascending or descending sorts. Default is "ASC".
    },
    {                      // A second sort dimension, after the first is applied.
      "on": "name",
      "order": "ASC|DESC"
    }
  ]
}

Sorting inherently conflicts with searching: searching provides its own implicit ordering by relevance, which a sort would override. Therefore, any request that includes both a search and a sort must return a 400 response indicating that they are not compatible. For “search and sort” style operations, please use wildcards in a filter.
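
For services that sort in memory, the rules above can be sketched as follows. This is an illustration only: `apply_sort` and `validate_query` are hypothetical helper names, records are assumed to be flat dictionaries, and a real service would normally push the sort down to its data store.

```python
from typing import Any, Dict, List


def apply_sort(records: List[Dict[str, Any]], sort: List[Dict[str, str]]) -> List[Dict[str, Any]]:
    """Order records by the sort dimensions, the first dimension being primary."""
    ordered = list(records)
    # Python's sort is stable, so applying the dimensions from last to first
    # leaves the first dimension as the most significant ordering.
    for dimension in reversed(sort):
        descending = dimension.get("order", "ASC").upper() == "DESC"
        ordered.sort(key=lambda record, key=dimension["on"]: record[key], reverse=descending)
    return ordered


def validate_query(query: Dict[str, Any]) -> int:
    """Return 400 when a query combines search and sort, otherwise 200."""
    if query.get("search") and query.get("sort"):
        return 400
    return 200
```

Applying the dimensions in reverse keeps the code short; a comparator built from all dimensions at once would work equally well.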

2 - Searching and Filtering

How to search, and construct complex filters for any query.

Searching is an inherently inclusive operation, whereby the system will add all records whose values closely - but not exactly - match the search string. Filtering, by contrast, constrains the set of records to those that exactly match the provided values, though filters may support wildcards. The two can operate in tandem, with a filter constraining the set of records to which a search is applied, but care must be taken that the system can handle the complexity of the combined operation.

Searching

A search string is provided by a user when they are not entirely certain where the result they are seeking is expressed. For example, a word expressed in a search string may exist in a title, a description, or any other property of the resource.

POST /v1/resources/query HTTP/1.1

{
  "search": "...."
}

| Property | Relevance | Type | Description |
|----------|-----------|------|-------------|
| search | request | string | A string by which to search in the set of resources. |

It is left to each resource type to define which fields are included in the search index, and what tokenization method is used to decompose the resource instance and the incoming search expression. In all cases, the result should be a ‘best fit’ match, should be case-insensitive, and should be returned in the order of most relevant first.

Filter

A query may include a filter object, which expresses a tree-like structure of logical filters and their relevant operands. There are two basic types of filter objects: single and multiple. If a search string is also provided, it must only be applied to resources that also match the filters.

If a search string is provided and the query also includes sorting criteria, the service must return a 400 Bad Request stating that search and sort are not compatible. Searching already applies an implicit sort based on the relevance of each record, which an explicit sort expression would conflict with.

Single Value Operation

Filtering for a single value on a single field looks as follows:

POST /v1/resources/query HTTP/1.1

{
  "filters": {
    "op": "....",           // The logical operation that applies to a single value (see below).
    "key": "dot.notation",  // The key where the value should be found, using dot-notation for deep nesting.
    "value": "some-value"   // The input value for the logical operation, if appropriate.
  }
}

Multi-Value Operation

Filtering on multiple criteria would expand on the above.

POST /v1/resources/query HTTP/1.1

{
  "filters": {
        "op": "....",           // the logical operation that applies to multiple values.
        "values": [
            {
                // A list of single or multi-value operations.
            }
        ]
    }
}

Data Schema

| Key | Type | Relevance | Description |
|-----|------|-----------|-------------|
| op | string | All | The operation to perform. Case insensitive, see below for a full list of required operations. If not provided, the default value is assumed to be EQ for single values, and OR for multi values. |
| key | string | Single Value | The property on which to perform the operation, which may include dot-notation. |
| value | string | Single Value | A value to use for single-value operations. |
| values | Filter Array | Multi Value | A list of operations. If an empty array is included, no records should match. |

The value property is always a string, though its format may be type specific:

  • Dates must be formatted as RFC-3339.
  • String values may include the wildcard *, which represents zero or more of any character.
  • Large numbers (big.Int) must be expressed as base64 encoded strings.
  • Regular Expressions must not include leading and trailing slashes.

Valid Operations

| Operation | Key Name | Relevance | Notes |
|-----------|----------|-----------|-------|
| EQ | Strictly Equals | Single Value | |
| NEQ | Not Equals | Single Value | |
| GT | Greater Than | Single Value | |
| LT | Less Than | Single Value | |
| GE | Greater or Equal To | Single Value | |
| LE | Less than or Equal To | Single Value | |
| REGEX | Regular Expression | Single Value | |
| AND | And | Multi Value | All of the provided filters must be true. |
| OR | Or | Multi Value | Any of the provided filters must be true. |
| XOR | Exclusive or | Multi Value | Exactly one of the provided filters must be true. |
| XNOR | All or nothing | Multi Value | The provided filters must either all be true or all be false. |
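
To make the tree structure concrete, here is a minimal in-memory evaluator for the operations listed above. It is a sketch under several assumptions that the contract does not state: a missing key never matches, numeric record values are compared as floats, all other values are compared as strings (which orders RFC 3339 timestamps correctly only when they share the same UTC format), and wildcard handling is left out.

```python
import operator
import re
from typing import Any, Dict

_COMPARISONS = {
    "EQ": operator.eq, "NEQ": operator.ne, "GT": operator.gt,
    "LT": operator.lt, "GE": operator.ge, "LE": operator.le,
}


def lookup(record: Dict[str, Any], key: str) -> Any:
    """Resolve a dot-notation key such as "owner.age" against nested dicts."""
    current: Any = record
    for part in key.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def matches(record: Dict[str, Any], node: Dict[str, Any]) -> bool:
    """Evaluate one filter node (single or multi value) against a record."""
    if "values" in node:
        op = node.get("op", "OR").upper()  # OR is the multi-value default.
        results = [matches(record, child) for child in node["values"]]
        if not results:
            return False  # An empty array matches no records.
        hits = sum(results)
        if op == "AND":
            return hits == len(results)
        if op == "OR":
            return hits > 0
        if op == "XOR":
            return hits == 1
        if op == "XNOR":
            return hits in (0, len(results))
        raise ValueError(f"unknown multi-value operation: {op}")

    op = node.get("op", "EQ").upper()  # EQ is the single-value default.
    actual = lookup(record, node["key"])
    if actual is None:
        return False  # Assumption: a missing key never matches.
    expected: Any = node["value"]
    if op == "REGEX":
        return re.search(expected, str(actual)) is not None
    if op not in _COMPARISONS:
        raise ValueError(f"unknown single-value operation: {op}")
    if isinstance(actual, (int, float)):
        expected = float(expected)  # Values arrive as strings.
    else:
        actual = str(actual)
    return _COMPARISONS[op](actual, expected)
```

A production implementation would more likely translate the tree into the data store's native query language than evaluate it record by record.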

Wildcards

Wildcards may be used in string values, using simple glob matching. For more complex queries, use the REGEX operation.

| Wildcard Character | Operation |
|--------------------|-----------|
| * | Zero or more of any character. |
| ? | Any single character. |
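
One way to implement this matching is to translate the pattern into an anchored regular expression. The sketch below assumes that `*` matches zero or more characters, as stated in the list of value formats, and that every other character is literal; `glob_to_regex` and `glob_match` are hypothetical helper names.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a simple glob (only * and ?) into a compiled regex."""
    parts = []
    for char in pattern:
        if char == "*":
            parts.append(".*")   # Zero or more of any character.
        elif char == "?":
            parts.append(".")    # Exactly one character.
        else:
            parts.append(re.escape(char))  # Everything else is literal.
    return re.compile("".join(parts), re.DOTALL)


def glob_match(pattern: str, value: str) -> bool:
    """Return True when the whole value matches the glob pattern."""
    return glob_to_regex(pattern).fullmatch(value) is not None
```

Escaping every non-wildcard character keeps regular expression metacharacters in user input from changing the meaning of the pattern.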

Examples

String Equality

{
  "filters": {
    "op": "OR",
    "values": [
      // EQ is the default
      {
        "key": "name",
        "value": "some_value"
      },
      {
        "key": "name",
        "value": "some_other_value"
      }
    ]
  }
}

Date Range

{
  "filters": {
    "op": "AND",
    "values": [
      {
        "op": "GE",
        "key": "createdDate",
        "value": "1985-04-12T00:00:00Z"
      },
      {
        "op": "LE",
        "key": "createdDate",
        "value": "1985-04-12T23:59:59Z"
      }
    ]
  }
}

3 - Projection

How to selectively reduce the size of the data returned in a result set.

Projection allows a client to specify which fields it is interested in. This permits further optimization of client queries, but the feature should only be implemented if it is business critical. It is - in the strictest sense of the term - a premature optimization.

With that in mind, a client may add either an include or an exclude list to the query. If both are present, the server should respond with a 400 Bad Request error.

| Property | Relevance | Type | Description |
|----------|-----------|------|-------------|
| include | request | string array | An optional list of fields to include in response objects. |
| exclude | request | string array | An optional list of fields to exclude from response objects. |

POST /v1/resources/query HTTP/1.1

{
    // Optional list of fields to include or exclude from the result objects.
    "projection": {
        // Specify either "include" or "exclude", never both.
        // Alternatively: "exclude": ["fieldName3"]
        "include": ["fieldName1", "fieldName2"]
    }
}
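
On the server side, the projection rules could be applied to each result object roughly as follows. This is a sketch that assumes projections name top-level fields only; `ProjectionError` and `apply_projection` are hypothetical names, and the error is expected to be mapped to a 400 Bad Request by the caller.

```python
from typing import Any, Dict


class ProjectionError(ValueError):
    """Raised for an invalid projection; callers should answer 400 Bad Request."""


def apply_projection(record: Dict[str, Any], projection: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the record reduced by an include or exclude list."""
    include = projection.get("include")
    exclude = projection.get("exclude")
    if include is not None and exclude is not None:
        raise ProjectionError("include and exclude cannot be combined")
    if include is not None:
        return {key: value for key, value in record.items() if key in include}
    if exclude is not None:
        return {key: value for key, value in record.items() if key not in exclude}
    return dict(record)  # No projection requested: return every field.
```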

4 - List Queries

Listing and searching collections of resources.

In addition to basic CRUD operations, clients frequently list or search within resource sets. This breaks down into two different use cases: that of a UI, where a list with filters and a search box are offered to a user, and that of a machine client, which is usually only interested in a full list of resources.

There are also two RESTful philosophies around resource lists. The first is “Read all resources in a collection”, which is usually implemented as a GET request. The second is “Build a result set based on a query”, which is usually implemented as a POST request. Since we are prescribing a rich, full-featured query language, it becomes impractical to express all of these options in the URL of a GET request, forcing us to adopt the second philosophy.

The Query Path

Since performing a POST request on the root resource is already assigned to creating a resource of that type, we require a dedicated endpoint for querying resources. For generated result sets, we also require a subresource hierarchy to allow for pagination and sorting.

POST /v1/resources/query
GET /v1/resources/query/<result_set_id>
GET /v1/resources/query/<result_set_id>/<page_id>

Requests

Our query endpoints construct their requests using the following three components:

POST /v1/resources/query HTTP/1.1

{
    // Pagination and Sorting as per that spec.
    "start": "....",
    "limit": ....,
    "sort": ....,

    // As per our Searching and Filtering spec
    "search": "...",
    "filters": ....
}

Responses

List responses are a complex topic, as they can be quite large, require sophisticated pagination, and can be time-consuming to generate. As such, we require that all list responses - regardless of implementation - at least pretend to perform background processing to build the result set.

The response to a query request may be one of two types: a direct response, or a deferred response. The direct response is the simplest, and is returned when the result set is already available. It contains the result set as described below, but must also contain the Content-Location header to indicate the actual URL of the provided result set.

HTTP/1.1 200 OK
ETag: "dd2796ae-1a46-4be5-b446-7f8c7a0e8342"
Last-Modified: Wed, 21 Oct 2015 07:28:00 GMT
Content-Location: https://api.example.com/v1/resources/query/<result-set-identifier>/1

{
    "results": [
        ....
    ]
}

A deferred response is returned when the result set is not yet available, or if you simply want to pretend like it’s not available yet. In this case, the server should return a 201 Created response with a Location header pointing to the first page of the result set.

HTTP/1.1 201 Created
Location: https://api.example.com/v1/resources/query/<result-set-identifier>/1

Once the client follows the Location header, if the requested page is not yet ready, the server must return a 202 Accepted response with an appropriate Retry-After header, and an error response body that can assist in remediation.

HTTP/1.1 202 Accepted
Retry-After: 30
Cache-Control: no-store

{
    "error": "not_ready",
    "error_description": "A text description about how much longer it might take."
}
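
From the client's point of view, the deferred flow amounts to polling the page URL until it stops answering 202. The sketch below illustrates that loop under stated assumptions: `fetch` and `sleep` are hypothetical caller-supplied functions, and only the delta-seconds form of Retry-After is handled (the header may also carry an HTTP date).

```python
from typing import Any, Callable, Dict, Tuple

Response = Tuple[int, Dict[str, str], Any]  # (status, headers, decoded body)


def wait_for_page(
    url: str,
    fetch: Callable[[str], Response],
    sleep: Callable[[float], None],
    max_attempts: int = 10,
    default_delay: float = 1.0,
) -> Any:
    """Poll a result-set page until it is ready, honouring Retry-After."""
    for _ in range(max_attempts):
        status, headers, body = fetch(url)
        if status == 200:
            return body
        if status != 202:
            raise RuntimeError(f"unexpected status {status}")
        # Fall back to a default delay when the header is missing or is a date.
        raw = headers.get("Retry-After", "")
        sleep(float(raw) if raw.isdigit() else default_delay)
    raise TimeoutError(f"result set at {url} was not ready after {max_attempts} attempts")
```

Passing `time.sleep` as `sleep` gives the usual blocking behaviour, while a test can pass a recording stub instead.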

Once the result set is ready, the server should return a 200 OK response with the result set, as well as the following headers:

  • ETag - A hash of the result set, used for caching, as described in Entity Versioning.
  • Last-Modified - The last-modified date of the most recently modified resource in the result set.
  • Cache-Control with the max-age field, to communicate to the client when a result set will be considered stale.

HTTP/1.1 200 OK
ETag: "dd2796ae-1a46-4be5-b446-7f8c7a0e8342"
Last-Modified: Wed, 21 Oct 2015 07:28:00 GMT
Cache-Control: max-age=3600
Link: <https://api.example.com/v1/resources/query/<result-set-identifier>/2>; rel="next"

{
    "results": [
        ....
    ]
}

Empty results

For an empty result set, the server should answer the request for the first page with a 200 OK response containing an empty results array and no Link header.

HTTP/1.1 200 OK
ETag: "dd2796ae-1a46-4be5-b446-7f8c7a0e8342"
Last-Modified: Wed, 21 Oct 2015 07:28:00 GMT
Cache-Control: max-age=3600

{
    "results": []
}

Access Rights Violations

Access rights violations come in three types:

  1. A request has an invalid authorization token.
  2. A request has not been granted permission to read this resource type.
  3. A request is constrained to only a limited set of the available resources.

In the first case, the service should respond with a 401 response according to our common errors specification. The second, similarly, should return 403. For all other requests, the result set should be constrained to only those resources which the user is authorized to see. If a filter explicitly names a resource that the user cannot see, that resource must still be omitted, and the result set should be empty.

5 - Aggregation Queries

Aggregating data in a single query.

Aggregation queries are a powerful way by which aggregate data can be collected in a single query without a client having to iterate over the entire result set. For those of you familiar with ElasticSearch, this is a simplified version of the Bucket, Metrics, and Pipeline aggregation request format, which expresses the options of each while keeping the contract concise for the user (note that only Buckets and Metrics are supported).

Supporting these kinds of queries can be quite complex, and your API may not even need them, so it’s up to you to decide if they are necessary. Use cases which this might satisfy include:

  • Autocompleting tags already used in other documents.
  • Showing how many documents exist in a particular result set.
  • Gathering averages, sums, and other metrics from a set of resources.

Path

Much like List Queries, aggregation queries follow the “Build a result set based on a query” pattern, but this time using the /aggregate sub-path of the resource’s endpoint. Unlike list queries, however, there is no need to page the response.

POST /v1/resources/aggregate
GET  /v1/resources/aggregate/<result_set_id>

Request and Response Schema

Aggregation queries do not support the same filtering, searching, pagination, or sorting semantics as List Queries. Filtering is applied at the top level of the request and affects all aggregations and sub-aggregations. This ensures consistency and simplifies the query structure. Filters cannot be applied to individual aggregations within the query. Sorting can be applied directly to each aggregation bucket (if appropriate).

Common Fields

Every aggregation request contains the same two fields: filters, which is optional and follows our filtering rules, and aggregations, which is a map of the different aggregations requested by the client. Each aggregation has a type property to inform the server what form of aggregation is requested.

POST /v1/resources/aggregate

{
    "aggregations": {
        "<bucket_name>": {
            "type": "terms",
            ....
        },
        "<another_bucket_name>": {
            "type": "avg",
            ....
        }
    }
}

A response to an aggregation request returns the same map, replacing the query constraints with the results of the requested aggregation. For specific examples of requests and responses, please see the detailed examples below.

HTTP/1.1 200 OK

{
    "aggregations": {
        "<bucket_name>": {
            .... results
        },
        "<another_bucket_name>": {
            .... results
        }
    }
}

Aggregation Query: terms

A ’terms’ aggregation query sorts all documents into buckets defined by the values of the provided field, and returns the count of documents in each bucket. The number of terms to return should be provided.

POST /v1/resources/aggregate

{
    "aggregations": {
        "tags": {
            "sort": [],            // Optional sorting rules, as per the sort standard.
            "type": "terms",       // Always `terms`
            "field": "tags.name",  // The field name to aggregate.
            "count": 20            // The total number of terms to return.
        }
    }
}

The server then must respond with the buckets into which the documents were sorted, along with the count of documents in each bucket. Terms should be sorted according to the sorting rules.

HTTP/1.1 200 OK

{
    "aggregations": {
        "tags": {
            "tags.name": {
                "Anchovy": {        // The term name
                    "count": 3      // The number of documents which contain this term
                },
                "Sardine": {        // The term name
                    "count": 60     // The number of documents which contain this term
                }
            }
        }
    }
}
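
As an illustration of the semantics, a terms aggregation over in-memory documents might look like the sketch below. Several details are assumptions rather than part of the contract: dot-notation is taken to traverse lists of objects (so `tags.name` reads the `name` of every tag), each document is counted once per term, and the most frequent terms are returned when no sort is given.

```python
from collections import Counter
from typing import Any, Dict, List


def field_values(value: Any, parts: List[str]) -> List[Any]:
    """Collect every value found at a dot-notation path, descending into lists."""
    if isinstance(value, list):
        collected: List[Any] = []
        for item in value:
            collected.extend(field_values(item, parts))
        return collected
    if not parts:
        return [value]
    if isinstance(value, dict) and parts[0] in value:
        return field_values(value[parts[0]], parts[1:])
    return []


def terms_aggregation(documents: List[Dict[str, Any]], field: str, count: int) -> Dict[str, Any]:
    """Count the documents containing each term of `field`, keeping `count` terms."""
    counter: Counter = Counter()
    for document in documents:
        # set() ensures a document is counted once per term, even if repeated.
        counter.update(set(field_values(document, field.split("."))))
    return {field: {term: {"count": hits} for term, hits in counter.most_common(count)}}
```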

Aggregation Query: sum

A sum aggregation calculates the sum of a numeric field. While perhaps not the most useful on its own, it becomes quite powerful when used as a nested aggregation (see examples at the end).

POST /v1/resources/aggregate

{
    "filters": { .... },
    "aggregations": {
        "award_points": {
            "type": "sum",        // Always `sum`
            "field": "points"     // The field name to calculate the sum of.
        }
    }
}

The server must then respond with the number of documents included in the calculation, along with the sum of the requested field.

HTTP/1.1 200 OK

{
    "aggregations": {
        "award_points": {
            "count": 10332,
            "points": 234000
        }
    }
}

Aggregation Query: avg

An avg aggregation calculates the average of a numeric field.

POST /v1/resources/aggregate

{
    "filters": { .... },
    "aggregations": {
        "rating": {
            "type": "avg",       // Always `avg`
            "field": "stars"     // The field name to calculate the average of.
        }
    }
}

The server must then respond with the number of documents included in the calculation, along with the average of the requested field.

HTTP/1.1 200 OK

{
    "aggregations": {
        "rating": {
            "count": 10332,
            "stars": 4.23322555
        }
    }
}
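
The sum and avg aggregations share the same response shape, so a single helper can sketch both. The interpretation of `count` as the number of documents that actually carry the field is an assumption of this example, as is returning `None` for the average of an empty set; `metric_aggregation` is a hypothetical name.

```python
from typing import Any, Dict, List


def metric_aggregation(documents: List[Dict[str, Any]], agg_type: str, field: str) -> Dict[str, Any]:
    """Compute a `sum` or `avg` metric over a top-level numeric field."""
    # Assumption: documents without the field are ignored and not counted.
    values = [document[field] for document in documents if field in document]
    if agg_type == "sum":
        result: Any = sum(values)
    elif agg_type == "avg":
        result = sum(values) / len(values) if values else None
    else:
        raise ValueError(f"unsupported metric aggregation: {agg_type}")
    return {"count": len(values), field: result}
```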

Aggregation Query: range

A range aggregation collects documents into numeric ranges for a specific field. The ranges are defined by the from and to properties. If either is omitted, the range is open-ended.

POST /v1/resources/aggregate

{
    "filters": { .... },
    "aggregations": {
        "runners": {
            "type": "range",    // Always `range`
            "field": "pace",    // The field name to divide into buckets
            "ranges": {         // An object of pre-named bucket ranges
                "slow":   { "from": 0, "to": 7 },
                "normal": { "from": 7, "to": 9 },
                "fast":   { "from": 9 }
            }
        }
    }
}

The server then must respond with the buckets that were requested by the user.

HTTP/1.1 200 OK

{
    "aggregations": {
        "runners": {
            "count": 8700,
            "pace": {
                "slow": {
                    "count": 100
                },
                "normal": {
                    "count": 8000
                },
                "fast": {
                    "count": 600
                }
            }
        }
    }
}
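
A range aggregation can be sketched in memory as below. The contract does not say whether range bounds are inclusive; this example assumes `from` is inclusive and `to` is exclusive, so that adjacent ranges such as 0 to 7 and 7 to 9 do not overlap. `range_aggregation` is a hypothetical helper name.

```python
from typing import Any, Dict, List


def range_aggregation(
    documents: List[Dict[str, Any]],
    field: str,
    ranges: Dict[str, Dict[str, Any]],
) -> Dict[str, Any]:
    """Count documents per named range; a missing bound leaves that side open."""
    buckets = {name: {"count": 0} for name in ranges}
    total = 0
    for document in documents:
        value = document.get(field)
        if value is None:
            continue
        for name, bounds in ranges.items():
            lower, upper = bounds.get("from"), bounds.get("to")
            # Assumption: `from` is inclusive and `to` is exclusive.
            if (lower is None or value >= lower) and (upper is None or value < upper):
                buckets[name]["count"] += 1
                total += 1
                break
    return {"count": total, field: buckets}
```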

Other Aggregation Queries

The above is not an exhaustive list of aggregations which your system may support; your use cases may vary, and you can expand on what we’ve provided here at your leisure. We just ask that you let us know of specific use cases, so we can evaluate them for inclusion here.

If you’re looking for inspiration, the ElasticSearch Aggregations documentation can provide some.

Examples

I’ve included some examples below to help you understand how to structure your requests and what to expect in the response.

Items by price

POST /v1/resources/aggregate HTTP/1.1

{
    "aggregations": {
        "price_range": {
            "type": "range",
            "field": "price",
            "ranges": {
                "0-50":   { "from": 0, "to": 50 },
                "50-100": { "from": 50, "to": 100 },
                "100+":   { "from": 100 }
            }
        }
    }
}

HTTP/1.1 200 OK
ETag: "dd2796ae-1a46-4be5-b446-7f8c7a0e8342"

{
    "aggregations": {
        "price_range": {
            "count": 240,
            "price": {
                "0-50": {
                    "count": 100
                },
                "50-100": {
                    "count": 80
                },
                "100+": {
                    "count": 60
                }
            }
        }
    }
}

Autocompleting Tag Names


POST /v1/resources/aggregate HTTP/1.1

{
    "aggregations": {
        "tags": {
            "type": "terms",
            "field": "tags.name",
            "count": 5,
            "sort": [
              {
                "on": "tags.name",
                "order": "ASC" 
              }
            ]
        }
    }
}

HTTP/1.1 200 OK
ETag: "dd2796ae-1a46-4be5-b446-7f8c7a0e8342"

{
    "aggregations": {
        "tags": {
            "tags.name": {
                "Anchovy": {
                    "count": 3
                },
                "Branzini": {
                    "count": 80
                },
                "Cod": {
                    "count": 60
                }
            }
        }
    }
}

Total Sales by Month

POST /v1/resources/aggregate HTTP/1.1

{
    "aggregations": {
        "monthly_sales": {
            "type": "range",
            "field": "closed_date",
            "ranges": {
                "2019-01": { "from": "2019-01-01", "to": "2019-02-01" },
                "2019-02": { "from": "2019-02-01", "to": "2019-03-01" }
            },
            "aggregations": {
                "sales": {
                    "type": "sum",
                    "field": "price"
                }
            }
        }
    }
}

HTTP/1.1 200 OK
ETag: "dd2796ae-1a46-4be5-b446-7f8c7a0e8342"

{
    "aggregations": {
        "monthly_sales": {
            "count": 180,
            "closed_date": {
                "2019-01": {
                    "count": 100,
                    "sales": 10000
                },
                "2019-02": {
                    "count": 80,
                    "sales": 8000
                }
            }
        }
    }
}
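
The nested case above, a sum inside each range bucket, can be sketched in memory as follows. As before, this is an illustration rather than a prescribed implementation: `range_with_sum` is a hypothetical helper, `from` is assumed inclusive and `to` exclusive, and dates are assumed to be ISO 8601 strings of the same precision so that string comparison orders them correctly.

```python
from typing import Any, Dict, List


def range_with_sum(
    documents: List[Dict[str, Any]],
    field: str,
    ranges: Dict[str, Dict[str, Any]],
    sub_name: str,
    sub_field: str,
) -> Dict[str, Any]:
    """Bucket documents by range and sum `sub_field` within each bucket."""
    buckets = {name: {"count": 0, sub_name: 0} for name in ranges}
    total = 0
    for document in documents:
        value = document.get(field)
        if value is None:
            continue
        for name, bounds in ranges.items():
            lower, upper = bounds.get("from"), bounds.get("to")
            # Assumption: `from` is inclusive and `to` is exclusive.
            if (lower is None or value >= lower) and (upper is None or value < upper):
                buckets[name]["count"] += 1
                buckets[name][sub_name] += document.get(sub_field, 0)
                total += 1
                break
    return {"count": total, field: buckets}
```

A general implementation would recurse over arbitrary nested `aggregations` maps; the fixed two-level form here keeps the example short.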