Reasoning about GraphQL servers

By: on January 15, 2019

Lots of developers seem very excited about GraphQL, so it was useful to read Sacha Greif’s Five Common Problems in GraphQL Apps (And How to Fix Them). While I was learning GraphQL after recently working with gRPC, a few more questions came to mind; I discuss them below. Being new to GraphQL, I’m not trying to offer conclusions or recommendations at this point.

How would I handle scale testing of a system with a GraphQL server?

Five Common Problems in GraphQL Apps (And How to Fix Them) gives an example of a simplistic GraphQL server doing 71 database queries to answer one GraphQL query, and notes that arranging for batching and caching can be important. There’s a deeper concern, though. Because GraphQL is expressive and the queries come from the client, we can no longer predict how data will be accessed by looking at the server implementation alone. A new query we hadn’t thought of before could end up doing table scans. In contrast, with a less expressive REST or gRPC API design, all database queries are baked into the server, so we have a chance of ensuring those queries are appropriately optimised.
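To make the batching point concrete, here is a minimal Python sketch. It is not taken from the article mentioned above: `FakeDatabase`, `resolve_naive` and `resolve_batched` are hypothetical names, and the in-memory tables are an assumption standing in for a real database. It only counts how many queries each strategy issues when resolving the same nested request (posts together with their authors).

```python
from dataclasses import dataclass


@dataclass
class FakeDatabase:
    """Hypothetical in-memory stand-in for a database; counts queries issued."""
    posts: list
    users: dict
    query_count: int = 0

    def all_posts(self):
        self.query_count += 1
        return list(self.posts)

    def user_by_id(self, user_id):
        self.query_count += 1
        return self.users[user_id]

    def users_by_ids(self, user_ids):
        # One query that fetches many users at once.
        self.query_count += 1
        return {uid: self.users[uid] for uid in user_ids}


def make_db(n_posts=10, n_users=3):
    users = {i: {"id": i, "name": f"user{i}"} for i in range(n_users)}
    posts = [{"id": i, "author_id": i % n_users} for i in range(n_posts)]
    return FakeDatabase(posts=posts, users=users)


def resolve_naive(db):
    """Resolve each post's author with its own query (one per post)."""
    result = []
    for post in db.all_posts():
        author = db.user_by_id(post["author_id"])
        result.append({"id": post["id"], "author": author["name"]})
    return result


def resolve_batched(db):
    """Collect the author ids first, then fetch all authors in one query."""
    posts = db.all_posts()
    authors = db.users_by_ids({p["author_id"] for p in posts})
    return [
        {"id": p["id"], "author": authors[p["author_id"]]["name"]}
        for p in posts
    ]


naive_db, batched_db = make_db(), make_db()
assert resolve_naive(naive_db) == resolve_batched(batched_db)
print(naive_db.query_count, batched_db.query_count)  # prints "11 2"
```

Both strategies return the same data, but the naive one issues a query per post, so its cost grows with the size of the result while the batched one stays at two queries.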

Adopting GraphQL therefore increases the potential for a system to work well in production for months, and then for someone to come along with a new query that thrashes the database and causes performance crosstalk, slowing down unrelated requests. The new query needn’t be malicious; it is very common during development to be caught out by a freshly crafted query that is thousands of times more expensive than expected because of an innocuous detail.

I have no idea how to mitigate the impact of expensive GraphQL queries. The traditional approach of measuring or limiting query rate might work fine for a set of optimised queries with similar cost, but even a single query that causes a big load spike is potentially an operational problem. Limiting the query response size also seems unlikely to be sufficient, since small responses can still be expensive to generate.
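As a toy illustration of why response size is a poor proxy for cost, the Python sketch below compares two requests that return equally tiny responses. Everything in it is an assumption made for illustration: `ROWS`, `INDEX_BY_ID`, `lookup_by_id` and `count_by_status` are hypothetical, and "rows touched" is only a crude stand-in for database work.

```python
import json

# Hypothetical table: 100,000 rows, of which every 1000th is "open".
ROWS = [
    {"id": i, "status": "open" if i % 1000 == 0 else "closed"}
    for i in range(100_000)
]
INDEX_BY_ID = {row["id"]: row for row in ROWS}  # stands in for an index on id


def lookup_by_id(row_id):
    """Indexed lookup: touches a single row."""
    found = row_id in INDEX_BY_ID
    return {"rows_touched": 1, "response": {"count": 1 if found else 0}}


def count_by_status(status):
    """Unindexed filter: scans every row to produce one number."""
    touched = 0
    count = 0
    for row in ROWS:
        touched += 1
        if row["status"] == status:
            count += 1
    return {"rows_touched": touched, "response": {"count": count}}


for result in (lookup_by_id(42), count_by_status("open")):
    size = len(json.dumps(result["response"]))
    print(f"response bytes: {size}, rows touched: {result['rows_touched']}")
```

The two responses differ by a couple of bytes, yet the second request touches every row in the table, so a cap on response size would let it straight through.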

Even if we arrange for only client code we control to call the server, we often have little control over the lifecycle of that client code. Consider what happens if we push out an application update that makes GraphQL queries which thrash our production database in a way we can’t easily fix. Or we could make a server update that works fine with the latest clients but struggles with some old query, long forgotten, that old clients still send.

On the other hand, if clients use GraphQL to request only what they need, it is quite possible that in some cases systems using GraphQL will scale much better than APIs that need many more calls, some of which return data the client doesn’t need.

How much logic do I want in the client?

GraphQL gives you the option to simplify the server code sitting above the database. This might mean we can spend less time developing repetitive read operation code on the server side. In some cases it may even be a good choice to expose the database directly to clients, at which point you have very little server code at all. Moving code to the client does potentially introduce complexity:

  1. Often we trust the client less than the server, and therefore we still need to enforce policy on the server.
  2. Testing and debugging on the server alone can be much easier than testing and debugging a distributed client/server system.
  3. If we have long running or scheduled tasks we typically want them to happen whether or not a client is online. I still want my bank to pay me interest on a cash deposit even if I don’t log in to a rainy day account for years.

However, these days we often expect applications to work to some extent even when the server is offline or the network is unreliable, particularly now that much of our effort goes into mobile apps. That means it is no longer always good enough to say that business logic should live exclusively in the server and the client should focus only on presentation. We increasingly end up doing more on the client anyway, and in such cases the concern is moot and GraphQL is beneficial.

How often will we find security vulnerabilities in the GraphQL query parser?

One of Steve Gibson’s mantras on Security Now is that interpreters are a huge source of security problems. Chen et al. provide examples in Security bugs in embedded interpreters. With GraphQL, the client pushes a potentially complicated query to the server, which has to interpret it. This security risk exists any time software has to interpret external data, but it is much harder to deal with as the input language gets more complicated, and GraphQL has a ~30,000 word specification.

