The Architectural Principles Behind Vrbo’s GraphQL Implementation
At Vrbo, we’ve been using GraphQL for over a year. But there are some differences in how we’ve implemented and used GraphQL compared to some examples we’ve seen in the wild.
This is because in adopting GraphQL at scale, we wanted to ensure not only the success of individual teams, but the scalability and predictability of that success from team to team across Vrbo.
In addition, we’ve learned from past mistakes with other orchestration models and wanted our new model to work differently.
Previously, we relied on dedicated orchestration services. These proved to be not only an unnecessary moving part, but also a source of dependencies between teams that slowed them down and widened the blast radius when something went wrong.
High level goals
In developing a new front-tier API (orchestration, public APIs) architecture, we had a few goals in mind.
- Increase team velocity by reducing dependencies between teams and services
- Reduce blast radius by:
  * Insulating teams from change
  * Reducing points of failure
- Improve predictability of a team’s:
  * Compute needs
  * Cost expectations
- Simplify collaboration and sharing
- Improve portability across environments
Breaking up GraphQL schema into components
A GraphQL schema is made up of types (type schema), root types (API operations) and resolvers (business logic).
An example type:
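A minimal sketch (the fields on Author here are illustrative, not our actual schema):

```graphql
type Author {
  id: ID!
  firstName: String
  lastName: String
}
```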
This defines a new type Author, which is merely a type definition. To perform operations on Author, such as a query, we must define a root type:
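A sketch of such a root type, continuing the illustrative Author example:

```graphql
type Query {
  author(id: ID!): Author
}
```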
This adds a query operation called author to the API’s surface that clients can now interact with.
But we still need to execute some code for the query to do work, and this is where resolvers come in. In JavaScript, a resolver might look something like this:
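A minimal sketch; `authorService` is a hypothetical, injected service client rather than a specific Vrbo API:

```javascript
const resolvers = {
  Query: {
    // Resolve the `author` query defined on the root Query type.
    // `authorService` is a hypothetical client assumed to be injected via context.
    async author(_, { id }, context) {
      return context.authorService.getAuthorById(id);
    }
  }
};

module.exports = resolvers;
```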
As you can imagine, a schema can grow quite large as our API surface area grows. This is in part why we break down and componentize these elements of a schema into independently versionable modules that can be imported and merged into an API based on the requirements of the use case.
An application or API can aggregate as many components as it needs to fulfill its UI or user needs.
Our first iteration of this merged-schema approach, which has been the basis of our success so far, is called “partials”. It brought the convenience and simplicity of node modules to GraphQL.
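As a rough sketch of the partials idea (the module names here are hypothetical, and the merge uses graphql-tools’ makeExecutableSchema rather than our internal tooling):

```javascript
// Aggregate schema "partials" published as node modules into one executable schema.
const { makeExecutableSchema } = require('graphql-tools');
const property = require('@example/property-partial'); // hypothetical partial module
const reviews = require('@example/reviews-partial');   // hypothetical partial module

module.exports = makeExecutableSchema({
  typeDefs: [property.types, reviews.types],          // SDL strings merged together
  resolvers: [property.resolvers, reviews.resolvers]  // resolver maps merged together
});
```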
Our next major iteration for GraphQL includes a newer component model in which each component is its own fully independently executable schema, making both composition and aggregation easier.
Implementation design principles
In developing tools to create component-based schemas, we wanted the following characteristics:
- Composable types
  * Types can be made up of many shared types
  * Extend and compose types for a use case without impacting others
- Composable resolvers
  * Invoke without invoking a new service (no network hops)
  * Ability to reduce repetitive service calls within the same query tree
- Portable
  * Injectable upstream requirements (service clients, etc.)
- Easy to collaborate
  * Shared code, not services
  * Manage contributions, version as needed, simplify co-development
- Schema first
  * Easy to read, easy to update
  * Collaborate cross-platform on API design in a common language
Separate use cases, separate services
With a component model it is not necessary to provision, operationalize and share a service. Each application or API can provide for its own needs by specifying the components it needs.
Because components are composed and merged together without a centralized or shared service, teams can:
- Pick and choose their own orchestration needs
- Version independently of other teams
- Scale independently as their use case demands
- Perform better cost attribution
- Explicitly declare data requirements in UI components
This is a powerful model because it directly advances our goals of reducing dependencies between teams and shrinking the blast radius of changes to a shared service.
Composition
One of the great aspects of GraphQL is its ability to compose types and resolvers. We don’t want to force two teams to constantly iterate on the same definitions just to satisfy all use cases.
Instead, it is easy for teams to compose new types or extend existing types as separate components that can then be versioned and developed independently.
A component module A may be composed from an imported module and type T, while another component module B, with slightly different needs than A, may be composed from a modified T' component.
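In GraphQL schema language, that kind of extension can be as simple as the following sketch (T and its fields are placeholders):

```graphql
# Component A defines the shared base type T
type T {
  id: ID!
  name: String
}

# Component B extends T for its own use case without changing component A
extend type T {
  extraField: String
}
```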
This also allows components to take advantage of existing resolvers by simply invoking across a binding to the composed type. We can also cache results from these calls within a single execution of a query or mutation, reducing the number of times something must be resolved at all.
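One way to picture the caching piece (an illustrative helper, not our internal implementation) is to memoize delegated calls for the lifetime of a single request by keying on the GraphQL context:

```javascript
// Memoize a resolver-style function per request: results are cached against the
// context object, so repeated calls within one query execution resolve only once.
function memoizeForRequest(fn) {
  const caches = new WeakMap();
  return function (parent, args, context, info) {
    if (!caches.has(context)) {
      caches.set(context, new Map());
    }
    const cache = caches.get(context);
    const key = JSON.stringify(args);
    if (!cache.has(key)) {
      cache.set(key, fn(parent, args, context, info));
    }
    return cache.get(key);
  };
}

module.exports = memoizeForRequest;
```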
Components in code
So what might this look like from a code perspective? Let’s take a more real-world example for a property listing appearing on a site like Vrbo.
What is a Listing made up of? Let’s make it overly simple and assume:
- Property
- Reviews
Let’s start with the Property component.
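A sketch of what a Property component might look like, using a hypothetical GraphQLComponent-style helper (the exact API of our tooling differs; fields and service names are illustrative):

```javascript
const GraphQLComponent = require('graphql-component'); // hypothetical component helper

// The component's own types and root query, kept schema-first as SDL
const types = `
  type Property {
    id: ID!
    geo: [Float]
  }
  type Query {
    property(id: ID!): Property
  }
`;

const resolvers = {
  Query: {
    // `propertyService` is a hypothetical upstream client injected via context
    async property(_, { id }, context) {
      return context.propertyService.getPropertyById(id);
    }
  }
};

module.exports = new GraphQLComponent({ types, resolvers });
```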
Next, Reviews component:
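The Reviews component follows the same pattern (again, names are illustrative):

```javascript
const GraphQLComponent = require('graphql-component'); // hypothetical component helper

const types = `
  type Review {
    content: String
    rating: Int
  }
  type Query {
    reviewsByPropertyId(propertyId: ID!): [Review]
  }
`;

const resolvers = {
  Query: {
    // `reviewsService` is a hypothetical upstream client injected via context
    async reviewsByPropertyId(_, { propertyId }, context) {
      return context.reviewsService.getReviews(propertyId);
    }
  }
};

module.exports = new GraphQLComponent({ types, resolvers });
```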
Finally let’s compose these together into a Listing:
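A sketch of the Listing component; the `imports` option stands in for whatever mechanism the component library provides for pulling in the Property and Reviews components:

```javascript
const GraphQLComponent = require('graphql-component'); // hypothetical component helper
const property = require('./property-component');      // the Property component above
const reviews = require('./reviews-component');         // the Reviews component above
const resolvers = require('./resolvers');               // shown below

const types = `
  type Listing {
    id: ID!
    property: Property
    reviews: [Review]
  }
  type Query {
    listing(id: ID!): Listing
  }
`;

// Importing property and reviews makes their types and resolvers available to Listing
module.exports = new GraphQLComponent({ types, resolvers, imports: [property, reviews] });
```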
This is different because it also has a new declaration to import property and reviews. This enables listing to take advantage of both the types in these two components, as well as their resolvers.
In this last example, let’s take a look at the listing resolvers as well:
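A sketch of those resolvers; `execute` here stands in for whatever the component library provides to run a GraphQL document against an imported component:

```javascript
const property = require('./property-component');
const reviews = require('./reviews-component');

const resolvers = {
  Query: {
    async listing(_, { id }, context) {
      // Delegate to the imported components in parallel by executing GraphQL against
      // them, rather than calling their resolver functions directly.
      const [propertyResult, reviewsResult] = await Promise.all([
        property.execute(
          `query ($id: ID!) { property(id: $id) { id geo } }`,
          { variables: { id }, context }
        ),
        reviews.execute(
          `query ($id: ID!) { reviewsByPropertyId(propertyId: $id) { content rating } }`,
          { variables: { id }, context }
        )
      ]);
      // Shape the base Listing from the delegated results
      return { id, property: propertyResult, reviews: reviewsResult };
    }
  }
};

module.exports = resolvers;
```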
This makes the listing resolver delegate its primary query to property and review in parallel to form its base. The cool thing about this is that it does not simply invoke the resolver function, but rather executes through GraphQL. This lets both type validation and type resolvers continue to run normally.
The rest of the shaping for Listing type is done with type resolvers (not shown here).
Collaboration
GraphQL is both a query language and a type-based, schema-first API specification.
One of the problems (and sometimes benefits) of REST is that it is not naturally schema-first. Tools like the OpenAPI specification often rely on after-the-fact generation of a specification for documentation purposes only; there is no strict binding of the API schema to its implementation.
With GraphQL, the schema is the API, and that is a powerful thing.
Because we have relied on inner-source collaboration to develop many of our GraphQL components, the importance of being able to do so in a more accessible way has been critical.
Plain text and even separate .graphql files for type and root type definitions are much easier to read and collaborate on than code, not to mention agnostic to any particular language or platform.
Keeping in sync
When you start versioning modules independently, keeping teams moving off of unsupported versions requires additional tooling to keep track of dependencies and notify developers.
We keep track by building dependency graphs from applications, which we can query and report on to see who is using what.
Finally
In this journey, our evolution has been from monoliths, to miniliths and BFFs (back-ends-for-front-ends), to node apps and modules. But the journey isn’t complete. The industry is evolving toward serverless and static pages (JAMStack, etc.), and we have begun to as well. As a result, part of our design has also focused on both runtime and environment portability.
When it comes down to what it takes to develop a modern web application at Vrbo, developers spend their time in two areas: React development and GraphQL development (which is usually just reused). This raises the question: why are we deploying applications at all?
With the advent of new compute capabilities at the CDN edge, like Cloudflare Workers, Fly.io, and others, a serverless (and even containerless) orchestration layer makes a lot of sense.
We plan to experiment with pushing our model to the bleeding edge (pun intended), and it is made easier through a component model designed for flexibility and developer scale.
Further reading
While we were working on the next iteration of our partial schema / component model, a new open source project called GraphQL Modules was released. GraphQL Modules follows an almost identical paradigm and looks great.
I’ve been working on a similar project in the open, an iteration on our existing internal solution, and it is what the examples here are based on. Currently, some simple examples can be seen here.
You can also read more about some of the history in these links: