GraphQL is really TreeQL and that’s OK

By: on January 2, 2018

Let’s have a look at GraphQL. It came out of Facebook as a replacement for REST-style requests for querying data. It was developed internally from 2012 and open-sourced in 2015. As Facebook’s main database is the “social graph”, it was naturally named GraphQL but, as we’ll see, that’s not a completely accurate name (though that doesn’t really matter in practice).

In GraphQL, object types are described in a type schema, and queries are specified as templates referring to those objects:

query BooksWritten {
  author {
    name
    books {
      title
      genre
    }
  }
}

Note that these templates are document templates, not graph templates. Document-like objects such as these are effectively trees: there’s only a parent-child relationship between items (in fact, a container-contained relationship) and no natural way to reference an item in another part of the document. Databases (“in-memory” or SQL) often represent full, possibly cyclic, object graphs – but GraphQL can only express tree-like queries. (Similarly, SQL only expresses table-like queries, even though its joins may be mapping over a cyclic graph, and you’d never expose raw SQL as an API.)
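The difference between the stored graph and the tree a query returns can be sketched in a few lines of Python. The `Author` and `Book` classes and the `to_tree` helper below are hypothetical, invented only for this illustration; they are not part of any GraphQL library.

```python
# Hypothetical in-memory object graph: authors and books reference each other.
class Author:
    def __init__(self, name):
        self.name = name
        self.books = []


class Book:
    def __init__(self, title, genre, author):
        self.title = title
        self.genre = genre
        self.author = author  # back-reference: this creates a cycle
        author.books.append(self)


def to_tree(author):
    """Project the graph onto the tree shape of the BooksWritten query."""
    return {
        "name": author.name,
        "books": [{"title": b.title, "genre": b.genre} for b in author.books],
    }


orwell = Author("George Orwell")
Book("1984", "dystopia", orwell)
Book("Animal Farm", "satire", orwell)

# The stored data is cyclic: author -> book -> author is the same object.
assert orwell.books[0].author is orwell

# The query result is a plain tree: the books carry no link back to the author.
tree = to_tree(orwell)
print(tree)
```

The nested dictionary contains everything the query asked for, but the cycle in the underlying objects has no counterpart in the result.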

A full graph query language would be able to specify cycles directly by naming nodes and referring to them elsewhere in the query. For example, in fraud detection it’s often useful to find loops in a graph to show where a group of people are acting suspiciously in concert. Graph query languages like Cypher do this very well, even if the size of the loop is not known in advance. Another example relating to unknown path size is the “7 degrees” question which, in Cypher, is a one-liner:

MATCH p=shortestPath(
  (bacon:Person {name:"Kevin Bacon"})-[*1..7]-(meg:Person {name:"Meg Ryan"})
)
RETURN p

In GraphQL you’d have to construct a “recursive tree” query, with all the problems of combinatorial explosion, and search through all the intermediate nodes and leaves.

{
  actor(name: "Kevin Bacon") {
    movie {
      actor {
        movie {
          actor {
            movie {
              actor {
                movie {
                  actor {
                    movie {
                      actor {
                        movie {
                          actor {
                            movie {
                              actor { name }
}}}}}}}}}}}}}}}

A query like that would probably return the entire database.
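To make the contrast concrete, here is a rough Python sketch under stated assumptions: the tiny actor/movie graph is made up, `unrolled_tree_size` imitates what a nested tree query returns (revisiting nodes freely), and `shortest_path` is an ordinary breadth-first search standing in for what a graph query engine might do. Neither function reflects how any particular GraphQL or Cypher implementation works internally.

```python
from collections import deque

# Hypothetical actor/movie graph as an adjacency map (made-up data).
GRAPH = {
    "Kevin Bacon": ["Movie A", "Movie B"],
    "Movie A": ["Kevin Bacon", "Actor X", "Actor Y"],
    "Movie B": ["Kevin Bacon", "Actor Y"],
    "Actor X": ["Movie A", "Movie C"],
    "Actor Y": ["Movie A", "Movie B", "Movie C"],
    "Movie C": ["Actor X", "Actor Y", "Meg Ryan"],
    "Meg Ryan": ["Movie C"],
}


def unrolled_tree_size(node, depth):
    """Count the nodes a nested tree query of the given depth would return.

    Nodes are revisited freely, as in the nested query above.
    """
    if depth == 0:
        return 1
    return 1 + sum(unrolled_tree_size(n, depth - 1) for n in GRAPH[node])


def shortest_path(start, goal):
    """Breadth-first search: each node is enqueued at most once."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in GRAPH[node]:
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


# The unrolled tree quickly outgrows the graph it was built from.
print(len(GRAPH), unrolled_tree_size("Kevin Bacon", 8))
print(shortest_path("Kevin Bacon", "Meg Ryan"))
```

Even on a seven-node graph the unrolled tree contains far more entries than there are nodes, because the same actors and movies are returned again and again along different branches.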

On the other hand, this may be a benefit to some degree: unconstrained full-graph queries can clearly bog down the database if not developed carefully, so it would be a mistake to expose an API which allowed that. By restricting queries to trees, the consumer has to think carefully about the queries they write, as combinatorial queries become obvious. In practice, most GraphQL implementations also support restrictions on query execution time.
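Because the nesting is visible in the query text, a server can also reject overly deep queries before running them. The sketch below is an assumption-laden illustration rather than any library's API: `query_depth`, `check_query` and `MAX_DEPTH` are hypothetical names, and the depth is estimated by counting curly braces instead of using a real GraphQL parser (so input-object literals in arguments would be counted too).

```python
def query_depth(query: str) -> int:
    """Return the deepest level of brace nesting in a query string.

    Simplification: counts curly braces and skips double-quoted strings;
    this is not a full GraphQL parser.
    """
    depth = deepest = 0
    in_string = False
    escaped = False
    for ch in query:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == "}":
            depth -= 1
    return deepest


MAX_DEPTH = 5  # hypothetical limit chosen for this example


def check_query(query: str) -> None:
    """Raise ValueError for queries nested more deeply than MAX_DEPTH."""
    depth = query_depth(query)
    if depth > MAX_DEPTH:
        raise ValueError(f"query depth {depth} exceeds limit {MAX_DEPTH}")


books_written = "query BooksWritten { author { name books { title genre } } }"
check_query(books_written)  # nesting depth 3, so this is accepted
```

With a limit of 5, the BooksWritten query passes, while the seven-movie query would be refused before touching the database.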

Once you’ve decided what your set of root query objects is going to be, and the schema that relates them to other objects, you don’t need any other endpoints to fetch data related to any of those roots (or data related to data related to that root, and so on). For a social graph a convenient root is the ‘user’; for a bookshop, root queries may include ‘book’ and ‘author’. The schema can be evolved by adding more ‘root’ queries. In this way a GraphQL endpoint can usually be extended over time by adding new items in a self-describing way, without needing versioning or new endpoints.

Evolving a GraphQL endpoint to remove items is slightly trickier. Fields (including root query fields) and enum values can be marked as “deprecated” so that IDEs and other tools subtly hide them from the developer and encourage existing client code to be refactored away from them. But they still need to exist for as long as there are clients that rely on them. Over time those clients diminish, as security and functional patches are adopted, until the items can finally be removed from the server. It may occasionally, but rarely, be necessary to rudely remove support for “long tail” clients – but this may be no bad thing, as it merely forces the user to upgrade (and pick up security patches).

GraphQL can also be used to update data via the same endpoint, but rather than relying on a limited set of HTTP verbs like POST, PUT or DELETE, the schema specifies the set of available mutations. These are more expressive than REST endpoint updates: they look more like function calls, and their arguments can be complex objects that fit the schema.
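As a rough illustration of the “function call” flavour, the sketch below builds the JSON body that is commonly sent to a GraphQL endpoint, with the operation text under `query` and its arguments under `variables`. The `addBook` mutation, the `BookInput` type and the `build_request_body` helper are all hypothetical; they stand in for whatever a real schema would define.

```python
import json

# Hypothetical mutation and input type, invented for this example.
ADD_BOOK = """
mutation AddBook($book: BookInput!) {
  addBook(book: $book) {
    title
    genre
  }
}
"""


def build_request_body(query: str, variables: dict) -> str:
    """Serialise an operation and its variables as a JSON request body."""
    return json.dumps({"query": query, "variables": variables})


# The argument is a structured object, not a flat set of URL parameters.
body = build_request_body(
    ADD_BOOK,
    {"book": {"title": "Emma", "genre": "romance", "authorName": "Jane Austen"}},
)
print(body)
```

The mutation reads like a call to `addBook` with one structured argument, and the selection set after it says which fields of the result should come back.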

Evolving mutations can also be reasonably simple: adding and removing mutations is dealt with in the same way as for queries, and parameters are objects from the schema, so they are self-describing and can be evolved in the usual way.

So GraphQL is an extremely useful framework with several major benefits:

  1. Removing the “N+1 requests” problem of REST, where you have to make one request to get an object and then further requests to enumerate its various linked objects. All data requested in the query template is returned in one round trip.
  2. Solving other common REST problems, such as fetching too much or too little data per request, because the client states exactly which fields it wants against a typed schema.
  3. Schema evolution, which avoids the need for versioned APIs.
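The first benefit can be illustrated with a small simulation. Everything here is hypothetical: `CountingClient` makes no network calls and simply counts how many round trips each style would need against made-up in-memory data.

```python
# Hypothetical in-memory data standing in for a bookshop backend.
AUTHORS = {1: {"name": "George Orwell", "book_ids": [10, 11]}}
BOOKS = {
    10: {"title": "1984", "genre": "dystopia"},
    11: {"title": "Animal Farm", "genre": "satire"},
}


class CountingClient:
    """Counts simulated round trips; no real network calls are made."""

    def __init__(self):
        self.round_trips = 0

    def rest_get_author(self, author_id):
        self.round_trips += 1
        return AUTHORS[author_id]

    def rest_get_book(self, book_id):
        self.round_trips += 1
        return BOOKS[book_id]

    def graphql_books_written(self, author_id):
        # One request; the server resolves the nested fields itself.
        self.round_trips += 1
        author = AUTHORS[author_id]
        return {
            "name": author["name"],
            "books": [BOOKS[b] for b in author["book_ids"]],
        }


# REST style: one request for the author, then one per linked book.
rest = CountingClient()
author = rest.rest_get_author(1)
rest_books = [rest.rest_get_book(b) for b in author["book_ids"]]

# GraphQL style: the whole tree comes back from a single request.
gql = CountingClient()
result = gql.graphql_books_written(1)

print(rest.round_trips, gql.round_trips)  # prints "3 1"
```

With N linked books the REST-style client makes N+1 requests, while the query-style client always makes one.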

It’s just that the query language itself should really be called TreeQL.
