GraphQL Initiative at Remind

When Remind first started, it was a simple app that allowed teachers to send announcements to their students. Like many start-ups, it made a lot of sense at the time to build the API using Ruby on Rails. Rails provided many tools to prototype quickly and ship features fast. As Remind grew larger and larger, the simple Ruby on Rails API became a gigantic monolith. At that time, we also started to notice performance issues. One of the initiatives to improve performance was implementing GraphQL.

We decided to implement a GraphQL server in Node.js using Apollo’s GraphQL server library. We chose Apollo because it provided a simple interface and useful features such as query batching. We have also adopted Apollo’s GraphQL client library for the React web app. The client library lets us adopt GraphQL incrementally, so we can rewrite the client data stores as we convert the corresponding REST endpoints to GraphQL.

Before: Clients make several requests to API endpoints. All these requests are resolved by the API.

After: Leveraging the GraphQL server to resolve queries, the client only needs to be aware of the single endpoint for the server.

Why GraphQL?

Here are two big advantages GraphQL brings:

  1. Flexibility
  2. Efficiency

Flexibility

The GraphQL server resolves queries by sending requests to various services and combining their responses for the client. It acts as the service router layer for our backend, so incoming requests no longer need to hit the API. The query interface also abstracts the internal services away from the client: rather than hitting multiple REST endpoints for different pieces of information, the client specifies everything it needs in one query to the GraphQL server. This allows us to change our backend architecture without affecting the clients. If we create a new service to replace part of the API, we only need to implement that change in the server, and the old queries will still work.
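
To make the routing idea concrete, here is a minimal TypeScript sketch (the server runs on Node.js). It is not Apollo's API: `UserSource`, `resolveMe`, and the two in-memory sources are hypothetical stand-ins for the Rails API and a newly extracted service. Because the resolver depends only on an interface, swapping backends leaves the query, and the response clients see, unchanged.

```typescript
// Hypothetical shape of a user record as the schema exposes it.
interface User {
  uuid: string;
  first_name: string;
  last_name: string;
}

// Any backend that can load a user satisfies this interface.
interface UserSource {
  loadUser(id: string): Promise<User>;
}

// Stand-in for a call to the monolithic Rails API (hypothetical).
const legacyApiUserSource: UserSource = {
  async loadUser(id) {
    return { uuid: id, first_name: "Ada", last_name: "Lovelace" };
  },
};

// Stand-in for a newly extracted user service (hypothetical).
const userServiceSource: UserSource = {
  async loadUser(id) {
    return { uuid: id, first_name: "Ada", last_name: "Lovelace" };
  },
};

// The resolver for `me` only knows about the UserSource interface,
// so the backend behind it can change without clients noticing.
async function resolveMe(source: UserSource, viewerId: string): Promise<User> {
  return source.loadUser(viewerId);
}
```

Replacing part of the API then amounts to handing the resolver `userServiceSource` instead of `legacyApiUserSource`; the `me { uuid first_name last_name }` query clients send stays exactly the same.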

Another advantage of GraphQL is that the client specifies the shape of the response. Because each client asks for exactly the fields it needs, versioning is easier and the cost of maintaining old clients drops. Instead of creating a new endpoint, we can add new fields to the schema directly without breaking old clients: new clients include the new fields in their queries, while old clients keep getting the same response from their old queries. In both cases, the same schema is used. Similarly, clients can stop using a deprecated field simply by removing it from their queries.
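
To illustrate why additive changes are safe, here is a toy sketch of field selection (not how Apollo implements it). The server copies exactly the fields named in the query, so a field added to the schema, here a hypothetical `member_count`, only appears for clients that ask for it. `classRecord` and `selectFields` are illustrative names.

```typescript
// A hypothetical class object as the server resolves it. Suppose
// `member_count` was added to the schema after older clients shipped.
const classRecord = {
  uuid: "c1",
  class_name: "Biology",
  code: "bio101",
  member_count: 28, // the newly added field
};

// Copy only the requested fields, in the order the query lists them;
// a GraphQL response mirrors the shape of its query the same way.
function selectFields<T, K extends keyof T>(record: T, fields: K[]): Pick<T, K> {
  const out = {} as Pick<T, K>;
  for (const f of fields) {
    out[f] = record[f];
  }
  return out;
}

// An old client's query never mentions member_count...
const oldResponse = selectFields(classRecord, ["uuid", "class_name"]);
// ...while a new client opts in to it explicitly.
const newResponse = selectFields(classRecord, ["uuid", "class_name", "member_count"]);

console.log(JSON.stringify(oldResponse)); // {"uuid":"c1","class_name":"Biology"}
console.log(JSON.stringify(newResponse)); // {"uuid":"c1","class_name":"Biology","member_count":28}
```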

The ability to request only the data you need also helps reduce network traffic. One example is our experiments service, which allows us to conduct continuous rollouts and A/B testing. Clients queried a features object to determine whether a particular feature was turned on for the user. Previously, the features object contained hundreds of fields in the response. Here is an example of the features object, showing just eleven of more than 400 feature flags.

"features": {
    "14478821": false,
    "015.nov.dont_show_app_opened_notifications_pre_ask": false,
    "2014.dec.get_app.sent_via_rmd_me": true,
    "2014.nov.teachers_nux.reimagined_flow_class_code_blackout_3boxes": false,
    "2015.dec.5.13.android_blue_tabs": true,
    "2015.nov.dont_show_app_opened_notifications_pre_as": false,
    "2015.nov.dont_show_app_opened_notifications_pre_ask": true,
    "2015.nov.mobile_join_pdf": false,
    "2015.nov.new_group_invite_modal": true,
    "2015.nov.new_manual_invite_ui": true,
    "2015.nov.use_bell_icon_for_push_notifications_pre_ask": true
}

When an experiment finishes or a feature is fully rolled out, its flag becomes redundant because clients stop checking it. However, the flag remained in the response for backward compatibility with older clients. With GraphQL, clients query only the flags they need, which shrinks the payload and reduces our network traffic. For example, we can request just the feature flags we are interested in with the following query, using aliases to give each result a readable name:

query {
  me {
    new_group_invite_modal: has_feature(name: "2015.nov.new_group_invite_modal")
    new_manual_invite_ui: has_feature(name: "2015.nov.new_manual_invite_ui")
  }
}

and the response would only have two flags:

{
  "data": {
    "me": {
      "new_group_invite_modal": true,
      "new_manual_invite_ui": true
    }
  }
}
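
On the server, each aliased field in that query is a separate call to `has_feature`. Here is a minimal sketch of such a resolver, assuming the flags arrive as a plain name-to-boolean map. `userFlags`, `hasFeature`, and `resolveAliases` are hypothetical names, and treating an unknown flag as off is an assumption of the sketch, not documented behavior of our experiments service.

```typescript
// Hypothetical per-user flag map; in production it would come from the
// experiments service and hold 400+ entries.
const userFlags: Record<string, boolean> = {
  "2015.nov.new_group_invite_modal": true,
  "2015.nov.new_manual_invite_ui": true,
  "2015.nov.mobile_join_pdf": false,
  "2015.dec.5.13.android_blue_tabs": true,
};

// Resolver for `has_feature(name:)`. Unknown flags resolve to false
// (an assumption of this sketch).
function hasFeature(flags: Record<string, boolean>, name: string): boolean {
  return flags[name] ?? false;
}

// Each alias in the query maps to one has_feature call, so the response
// contains only the aliases the client listed.
function resolveAliases(
  flags: Record<string, boolean>,
  aliases: Record<string, string>,
): Record<string, boolean> {
  const result: Record<string, boolean> = {};
  for (const alias of Object.keys(aliases)) {
    result[alias] = hasFeature(flags, aliases[alias]);
  }
  return result;
}

const me = resolveAliases(userFlags, {
  new_group_invite_modal: "2015.nov.new_group_invite_modal",
  new_manual_invite_ui: "2015.nov.new_manual_invite_ui",
});
console.log(JSON.stringify({ data: { me } }));
// {"data":{"me":{"new_group_invite_modal":true,"new_manual_invite_ui":true}}}
```

However many flags the map holds, the payload grows only with the number of aliases in the query.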

Efficiency

When we took a closer look at how our REST endpoints were structured, we noticed that we were already traversing a graph. For example, we have a /user endpoint to get the user information, then a /classes endpoint to get the user's classes. To learn more about the members of a class, you call /classes/:id/members. With GraphQL, traversing the graph becomes more efficient because we can get all of this information with a single query. The following query, for example, fetches the user, all of the user's classes, and all of the members of each class:

query {
  me {
    uuid
    email
    first_name
    last_name
    classes {
      uuid
      class_name
      code
      members {
        id
        user {
          uuid
          first_name
          last_name
        }
      }
    }
  }
}

One query to the server can be resolved into multiple parallel requests to different services, whose responses are stitched back together. An equivalent Rails endpoint could retrieve all of this information as well, but since each Rails worker is single-threaded, the response would be built serially and take much longer than the parallel GraphQL query.
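
Node.js makes this fan-out cheap, since a single process can wait on many service calls at once. Here is a minimal sketch with `Promise.all`. The services are hypothetical in-memory stand-ins, the `inFlight`/`maxInFlight` counters exist only to show how many calls overlap, and nothing here is Apollo-specific.

```typescript
interface ClassInfo {
  uuid: string;
  class_name: string;
}
type ClassWithMembers = ClassInfo & { members: string[] };

// Bookkeeping that records how many simulated calls overlap in time.
let inFlight = 0;
let maxInFlight = 0;

// Hypothetical stand-in for an HTTP call to a backend service.
async function callService<T>(value: T): Promise<T> {
  inFlight++;
  maxInFlight = Math.max(maxInFlight, inFlight);
  await Promise.resolve(); // suspend here, as a real network call would
  inFlight--;
  return value;
}

// Hypothetical services: one lists a user's classes, another a class's members.
function fetchClasses(_userId: string): Promise<ClassInfo[]> {
  return callService([
    { uuid: "c1", class_name: "Biology" },
    { uuid: "c2", class_name: "Chemistry" },
    { uuid: "c3", class_name: "Physics" },
  ]);
}
function fetchMembers(classId: string): Promise<string[]> {
  return callService([`${classId}-m1`, `${classId}-m2`]);
}

// Resolve `classes { members }`: fetch the classes, then request every
// class's members at once and stitch each result onto its class.
async function resolveClassesParallel(userId: string): Promise<ClassWithMembers[]> {
  const classes = await fetchClasses(userId);
  return Promise.all(
    classes.map(async (c) => ({ ...c, members: await fetchMembers(c.uuid) })),
  );
}

// The same work one call at a time, like a worker making blocking calls.
async function resolveClassesSerial(userId: string): Promise<ClassWithMembers[]> {
  const classes = await fetchClasses(userId);
  const result: ClassWithMembers[] = [];
  for (const c of classes) {
    result.push({ ...c, members: await fetchMembers(c.uuid) });
  }
  return result;
}
```

Both functions return identical data, but `resolveClassesParallel` peaks at three overlapping member calls for the three sample classes, while `resolveClassesSerial` never has more than one call outstanding. With real network latency, that difference is what separates the parallel GraphQL resolution from a serially built response.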

To further reduce the number of requests, we took advantage of the query batching support that comes out of the box with the Apollo GraphQL client. Queries made within a short time window are batched and sent to the server together, which reduces network overhead and keeps queries from being blocked by the browser’s concurrent request limit. For example, we fetch a delivery summary for every announcement sent, which tells teachers which users have received and read the announcement. Without batching, these requests were often throttled because of the sheer number of announcements. Because Apollo batches requests automatically, we did not have to spend any time implementing batching by hand or managing the data flow. Here is a sample screenshot of the network requests with query batching turned off. Notice that many queries are throttled, since Chrome allows only six concurrent connections per host.

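Network example: delivery-summary requests with query batching turned off; many are throttled by Chrome's concurrent request limit.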

It took almost 2 seconds to fetch all the delivery summaries. By leveraging query batching, the GraphQL client bundled all of these queries in one network request while the server resolved them concurrently. As a result, the response time was reduced by 70% to 600ms.
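
Apollo's client does this for us, but the mechanism is small enough to sketch. The `QueryBatcher` class and its `transport` callback below are hypothetical, not Apollo's API. Apollo batches queries issued within a short time window; this sketch simplifies that to "everything issued in the same tick" by flushing on a microtask, and it assumes the server returns one result per query, in the order sent.

```typescript
// Sends a whole batch of queries in one network request (hypothetical).
type Transport<Q, R> = (batch: Q[]) => Promise<R[]>;

interface Pending<Q, R> {
  query: Q;
  resolve: (result: R) => void;
  reject: (error: unknown) => void;
}

// Collects queries issued in the same tick and sends them together.
class QueryBatcher<Q, R> {
  private queue: Pending<Q, R>[] = [];
  private scheduled = false;
  private readonly transport: Transport<Q, R>;

  constructor(transport: Transport<Q, R>) {
    this.transport = transport;
  }

  query(query: Q): Promise<R> {
    return new Promise<R>((resolve, reject) => {
      this.queue.push({ query, resolve, reject });
      if (!this.scheduled) {
        this.scheduled = true;
        // Flush once the current synchronous code has queued its queries.
        Promise.resolve().then(() => this.flush());
      }
    });
  }

  private async flush(): Promise<void> {
    const batch = this.queue;
    this.queue = [];
    this.scheduled = false;
    try {
      const results = await this.transport(batch.map((p) => p.query));
      // Results come back in the same order the queries were sent.
      batch.forEach((p, i) => p.resolve(results[i]));
    } catch (err) {
      batch.forEach((p) => p.reject(err));
    }
  }
}

// Usage: one hypothetical delivery-summary query per announcement.
let networkRequests = 0;
const batcher = new QueryBatcher<string, string>(async (ids) => {
  networkRequests++; // a single request carries the whole batch
  return ids.map((id) => `summary for ${id}`);
});
```

Calling `batcher.query(id)` once per announcement while rendering a list produces a single request for the whole list, so the browser's per-host connection limit no longer queues the summaries behind one another.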

Conclusion

The GraphQL initiative is a large-scale, long-term effort across multiple engineering teams. All of our client networking layers must be rewritten for GraphQL while still supporting legacy REST endpoints, and a new routing server must be built for our backend services. We believe GraphQL’s benefits will outweigh these costs and help us make Remind better.

We are actively working to deproxy some of our core infrastructure from the monolithic API. At the same time, we are investing in GraphQL infrastructure such as Autograph and rest-graphql. It will be exciting to see all these efforts come together and help us improve the user experience.