Did I mention we’ll also slay the dragon of real-time search while keeping Drupal ultra-secure from invading orcs?
Let the adventure begin!
One day, our heroine Alice was happily using Drupal as a backend API, feeding a Gatsby build via JSON:API. Pretty straightforward – Drupal was her source of truth for her national bank. But Alice supports a lot of internal clients – so she rebuilt all her static sites on a nightly basis. Throughout the day, client editors like Bob and Eve make changes to the content. When Gatsby runs the build, it fetches all the data it needs to build the static sites from Drupal via JSON:API. Right after that, Gatsby deploys each static site to the cloud. So far, so good… !
Alice is happy. The sites are extremely fast and performant. But Bob and Eve are unhappy, because they add changes, promotions and edits to content all day long… but none of their changes appear until the next day… because that’s when the Gatsby build happens. Customers of the bank see nothing from Bob or Eve until the next overnight build. Alice solves this by allowing builds to be triggered on demand, so Bob and Eve are happy again. Nevertheless, a scary dragon is lurking in the darkness, ready to attack: no matter what kind of build happens – cron job or on demand – Alice knows there will always be a window of time during which the content published in Drupal differs from what’s published on the static site, and that can end up causing problems.
Alice tells her people – Bob and Eve – not to worry. But then a new requirement rears its head: search. Yes, they want to add search. Due to the nature of Alice’s static sites, she will need to provide a search engine that serves results to them. But here is the problem: what if she decides to query the Drupal instance for that information? If so, Drupal faces a nasty challenge. It needs to exclude anything newer – new content, or even new versions of existing content – and return exactly the same information it delivered to Gatsby during the last build. Why is that? Because if we deliver information about a new promotion that was not part of the latest build, and Gatsby rehydration uses it to render a clickable card redirecting the user to the new promo, clicking it will produce a 404! Not so good for Alice or for her clients, right?
Solving this from the Drupal perspective requires a lot of effort and carefully synchronized processes. For example, Alice could expose a search API that only includes content from the last fetch, excluding anything that is published but not yet fetched. At the same time, for edits, Alice could force editors to create new revisions of the content, and whenever she receives a query for that content, return the revision she last served via JSON:API to the Gatsby consumer. Although all of this is doable, it is far too complicated, and it would still give Bob and Eve’s customers potentially inaccurate results. What if the Gatsby build finishes well, but publishing to the cloud fails? Drupal would consider the latest versions of the content already available on the web when that is not the case.
What a conundrum!
And yet, since Alice worked with our team at Darwoft, she had a clever way to solve this problem. The key we found is to stop thinking of the Drupal instance as the source of truth for any particular version of the static site at any time. Of course, Drupal will continue being the source of truth for many areas of business data, and it is the source of truth for each Gatsby build at the moment it happens – but nothing more than that. Note that in the previous sentence, almost unnoticed, the source of truth became associated with an instant in time.
Alice realized that we need to start considering different sources of truth, one per build. Each time Gatsby fetches the information to build the site, it creates a “mirror” of the Drupal instance, and we associate that mirror with the build from that moment on. This is what we at Darwoft call the “build’s source of truth”.
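As a rough sketch, a per-build source of truth can be as simple as freezing everything fetched from JSON:API at build time under a build ID, and answering all later runtime lookups from that frozen copy rather than from live Drupal. The names and shapes below (`ContentNode`, `createBuildSnapshot`, and so on) are illustrative, not Drupal’s actual JSON:API schema:

```typescript
// Hypothetical sketch of a per-build "source of truth".
// Field names are illustrative, not Drupal's real JSON:API shape.

interface ContentNode {
  id: string;
  path: string;  // URL the static site publishes for this node
  title: string;
}

interface BuildSnapshot {
  buildId: string;
  fetchedAt: string; // when Gatsby pulled this data from Drupal
  nodes: ReadonlyMap<string, ContentNode>;
}

// Freeze the data Gatsby fetched, so later queries answer from this
// build, never from the live (and possibly newer) Drupal instance.
function createBuildSnapshot(buildId: string, fetched: ContentNode[]): BuildSnapshot {
  return {
    buildId,
    fetchedAt: new Date().toISOString(),
    nodes: new Map(fetched.map((n) => [n.id, n])),
  };
}

// Runtime lookups (search, rehydration) resolve against the snapshot,
// so they can only return pages that exist on the deployed site.
function lookup(snapshot: BuildSnapshot, id: string): ContentNode | undefined {
  return snapshot.nodes.get(id);
}

// Usage: a promo published in Drupal *after* this build is simply not
// in the snapshot, so no card can ever link to a page that would 404.
const snapshot = createBuildSnapshot("build-2024-01-15", [
  { id: "promo-1", path: "/promos/1", title: "Spring savings" },
]);
lookup(snapshot, "promo-1"); // found: part of this build
lookup(snapshot, "promo-2"); // undefined: published after the build
```

The point of the sketch is the boundary it draws: everything after `createBuildSnapshot` runs without ever touching Drupal.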
Once we get this concept into our minds, the sky is the limit!
We use the “build’s source of truth” to feed third-party systems such as Apollo GraphQL servers and Elasticsearch clusters, which give us the functionality we need to provide our static sites with real-time content – improving the user experience – and to support search.
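To make the search side concrete, here is a minimal illustration of indexing only the build’s documents. A tiny in-memory inverted index stands in for Elasticsearch; a real pipeline would bulk-index the same documents into a cluster as a build step. All names here are hypothetical:

```typescript
// Illustrative sketch: index *only* the build snapshot's documents, so
// search can only surface pages the deployed static site actually has.
// (An in-memory inverted index stands in for Elasticsearch.)

interface Doc {
  id: string;
  path: string; // URL that exists on the deployed static site
  text: string;
}

// Build a term -> document-id index from the snapshot's documents.
function buildIndex(docs: Doc[]): Map<string, Set<string>> {
  const index = new Map<string, Set<string>>();
  for (const doc of docs) {
    for (const term of doc.text.toLowerCase().split(/\W+/).filter(Boolean)) {
      if (!index.has(term)) index.set(term, new Set());
      index.get(term)!.add(doc.id);
    }
  }
  return index;
}

// Look a single term up in the index.
function search(index: Map<string, Set<string>>, term: string): string[] {
  return [...(index.get(term.toLowerCase()) ?? [])];
}

// Usage: content published in Drupal after the build was never indexed,
// so it can never appear in results pointing at a missing page.
const idx = buildIndex([
  { id: "promo-1", path: "/promos/1", text: "Spring savings promotion" },
]);
search(idx, "promotion"); // matches the built page
search(idx, "mortgage");  // nothing: not in this build
```

Because the index is rebuilt from each snapshot, a failed cloud deploy simply means the old index keeps serving the old (still accurate) site.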
This approach solves Alice’s problem with some clever advantages:
- Alice can now run the Drupal instance behind a VPN, which drastically reduces the possibility of hacking. Only editors holding VPN certificates and the agents running Gatsby build processes need access to Drupal itself.
- Alice no longer needs to worry about versioning content in Drupal. She doesn’t even have to worry about newly published content. Drupal is not in charge of indexing content or serving information to users in real time; its responsibility ends the moment it delivers all of the information Gatsby asks for during a build.
- Take your time and breathe deeply for the next one: with this approach, Drupal can serve more than one Gatsby process (or any other framework). This trick would simply not be possible if Drupal had to keep track of which content it delivered to each consumer.
What happened in the end? Bob and Eve were happy. Alice got a promotion. And the bank that she works for conquered their entire industry and saved the world from bankruptcy and doom! Contact us for more great stories about plucky technical heroines and clever tricks.
At Darwoft, we tell stories about Drupal because we are the heroes who slay your dragons, fight for your Alices, and save your projects!
To find out more, contact our Managing Partner for the Pacific Northwest, Ned Hayes (ned.hayes@darwoft.io / 206-321-7981 - Seattle, Portland and beyond)