
Technology

Geo-distributing content at scale

By Nuno Fernandes Monteiro
Software engineer with full-stack development competencies. Experienced in the fields of eCommerce, fraud and revenue assurance. Nike fan.
Thiago Rebouças
Helping to design distributed systems with head above the clouds and Adidas Ultraboost on the feet.

Abstract

To deliver great experiences to our customers, we need to be supported by reliable information systems that provide the requested content consistently for everyone, everywhere, as fast as light. The most traditional approaches to fulfilling this kind of requirement come with plenty of challenges. This article gives an overview of how FARFETCH distributes one of its platforms across different geographical zones, taking advantage of a cloud computing environment, to optimize the delivery of inspirational content to its customers.

Introduction

Whenever you land on a FARFETCH page you receive a carefully designed experience that resonates with your fashion identity. Whether it's inspirational editorial content, stylish product descriptions or amazing photos, multiple pieces of content are designed, published and assembled for your pleasure. 

FARFETCH defined content management and delivery as one of its business pillars, and from there came the need for a Content Platform. An API interface provides access to the platform and enables content management tools to store and publish content. The platform then distributes it in an omnichannel way, delivering it to web, mobile or any other appropriate channel. Through this platform, FARFETCH can leverage its content creation and distribution in a scalable and reliable way.



The scale of FARFETCH is both a blessing and a curse. We all want to reach the maximum number of customers, but of course, that increases complexity in all technical designs. The FARFETCH platform is built on top of multiple data centers, and the Content Platform is no exception. High reliability and high performance are standards that we cannot abandon, and that is what brought us to this particular conundrum. How can we make a piece of content that was published by a tool in Europe reach our data center in China? How can we make this transition almost instant? Geo-distribution was a challenge, and here's how we tackled it.

Designing the Content Platform

The Content Platform is divided into two separate logical boundaries, Management and Delivery. The Management boundary is where all back-office operations are done, much like a Content Management System for editing and publishing content. Delivery is where the final product is assembled and delivered to customers all over the world. Since the two boundaries have different requirements and SLAs, it's natural that they are independent and have their own technologies and databases. When a content editor publishes content, that entity is migrated from the Management boundary to the Delivery boundary. This migration process is what we will describe next.
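Before diving in, it helps to picture what a publish notification might carry as it crosses the boundary. The sketch below is purely illustrative; the event name and fields are assumptions, not the platform's actual contract.

# Hypothetical shape of the notification emitted when an editor publishes content.
# Field names are illustrative, not the platform's real contract.
content_published_event = {
    "event": "ContentPublished",              # assumed event name
    "content_id": "editorial-spring-edit-01", # identifier of the published entity
    "version": 3,                             # published revision
    "published_at": "2021-03-15T10:24:00Z",
    "source_boundary": "management",          # where the entity is persisted (Cassandra)
    "target_boundary": "delivery",            # where it must end up (Elasticsearch)
}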


Where it all starts - Management

Whenever content is persisted in the Management layer, it is stored in a Cassandra database that guarantees automatic replication to all other data centers. So, when talking about Management, replication isn't an issue. An entity that is saved in a European data center will "instantly" be available in a data center in China thanks to the out-of-the-box mechanisms that the database offers. But looking at the architecture of the platform, it's easy to see that the same is not true when it comes to Delivery.
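As a rough sketch of where that behaviour comes from (the keyspace, node and data-center names below are assumptions, not our real topology): a keyspace created with NetworkTopologyStrategy tells Cassandra to keep replicas in every listed data center.

from cassandra.cluster import Cluster

# Connect to any contact point; the driver discovers the rest of the cluster.
cluster = Cluster(["cassandra-eu-node1"])  # hypothetical node address
session = cluster.connect()

# NetworkTopologyStrategy keeps the requested number of replicas in each listed
# data center, which is what makes a save in Europe quickly visible in China.
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS content_management
    WITH replication = {
        'class': 'NetworkTopologyStrategy',
        'europe-dc': 3,
        'china-dc': 3
    }
""")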

Getting the content to Delivery

The FARFETCH business requires lightning-fast content delivery as well as versatile querying. You can target content at specific countries, languages, customer tiers, geo-locations…the possibilities are endless. Not only do we want to give our customers the best experience, but we also want to give our editors the best tools possible. Ideally, when an editor publishes something it should instantly be reflected on the outgoing channels, meaning that we want a system with no cache.

In order to support Delivery, the platform relies on Elasticsearch, a technology that provides a distributed full-text search engine with an HTTP interface and works with schema-free JSON documents. Elasticsearch stores the data for lightning-fast search and fine-tuned relevancy, which has fit our needs perfectly so far. Unfortunately, at the time the platform was designed, the latest version of Elasticsearch did not offer an out-of-the-box replication level that covered our scenario. We evaluated other similar options available on the market but did not find one that better met the aforementioned requirements. So, we went back to the blueprints and designed an in-house replication mechanism for the read model to fill that gap.
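To make the versatile querying mentioned above a little more concrete, here is a minimal sketch of the kind of filtered search the read model answers. The host, index name and field names are assumptions for illustration, and the call uses the 7.x-style Python client.

from elasticsearch import Elasticsearch

es = Elasticsearch(["http://elasticsearch-delivery:9200"])  # hypothetical host

# Fetch published content targeted at a given country and language.
response = es.search(
    index="delivery-content",
    body={
        "query": {
            "bool": {
                "filter": [
                    {"term": {"targeting.country": "PT"}},
                    {"term": {"targeting.language": "en-GB"}},
                ]
            }
        }
    },
)

for hit in response["hits"]["hits"]:
    print(hit["_id"], hit["_source"].get("title"))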

In order to migrate content from the Management database into the Delivery Elasticsearch repository, we built two specific pieces of software: the Updater and the Dataloader. The Updater is a typical consumer application in a publisher/subscriber model that listens to events produced by the Management boundary. Whenever content is published, the Updater fetches the referenced content and performs the necessary data transformations so it can be stored in the Delivery boundary. The Dataloader is then responsible for the next step in the pipeline, as it injects the content into Elasticsearch. The message bus technology used for this purpose is Apache Kafka, a technology that is widely adopted within FARFETCH and that guarantees reliable message delivery.
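A minimal sketch of the Updater's consume-transform-produce loop is shown below, using the confluent-kafka Python client. The broker addresses, topic names, consumer group, and the fetch_from_management and transform_for_delivery placeholders are assumptions for illustration; the real service is considerably more elaborate.

import json
from confluent_kafka import Consumer, Producer


def fetch_from_management(content_id):
    # Placeholder: the real Updater reads the referenced entity from the
    # management (Cassandra) store.
    return {"id": content_id, "title": "Spring edit", "targeting": {"country": "PT"}}


def transform_for_delivery(entity):
    # Placeholder: reshape the management entity into the read-model document.
    return {"content_id": entity["id"], "title": entity["title"],
            "targeting": entity["targeting"]}


consumer = Consumer({
    "bootstrap.servers": "kafka-dc-a:9092",   # hypothetical broker address
    "group.id": "content-updater",
    "auto.offset.reset": "earliest",
})
producer = Producer({"bootstrap.servers": "kafka-dc-a:9092"})

consumer.subscribe(["content-published"])     # events coming from the management boundary

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue

    event = json.loads(msg.value())
    delivery_doc = transform_for_delivery(fetch_from_management(event["content_id"]))

    # Publish the ready-to-consume document to the topic that the Dataloader
    # consumes (and that, as described later, is replicated to other data centers).
    producer.produce("delivery-content-ready", key=event["content_id"],
                     value=json.dumps(delivery_doc))
    producer.flush()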


As you can see, this architecture covers the base requirements of migrating data from Management to Delivery, but it falls short when we think about a geo-distributed environment. When content is published in a European data center, it will be migrated to the Elasticsearch repository in that same data center. But how do we notify the other data centers that this content was published?

The Replication

The Content Platform runs in multiple distinct clusters that are geographically isolated, so we need to take the data composed by our Updater and make it available to every cluster.

To distribute this message, we write it to a Kafka topic that behaves differently from a regular topic. But before we dig into this, consider the introduction of a component called the Replicator, illustrated in the following diagram.
A high-level view of a Kafka Replicator in a multi-data-center deployment architecture

In general terms, the Replicator configuration requires setting a source topic from which the data stream will be copied, and a list of destination topics, a.k.a. replicas, to which the data will be written. To enable active-active replication, as mentioned, the destination topics must be set as sources as well.

With this in mind, we can return to the big picture. Our Updater message is published in Data Center A on a topic that the Replicator distributes to both Data Centers B and C, where each Data Loader is able to retrieve the ready-to-consume content message and store it in its own read model. Notice that the Data Loader should be almost completely decoupled from the other parts of the system. The described solution closely resembles the pipes-and-filters design pattern, with Kafka acting as a distributed repository of milestones, holding the state of every step of the multi-stage processing. The following picture shows the complete flow of the described process.

A high-level view of the complete flow
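To complement the flow above, here is a minimal sketch of the Data Loader side: it consumes the replicated topic from its local Kafka cluster and indexes the document into its local Elasticsearch. The hosts, topic and index names are assumptions, and the Elasticsearch call again uses the 7.x-style Python client.

import json
from confluent_kafka import Consumer
from elasticsearch import Elasticsearch

# Each data center runs its own Data Loader against its local Kafka cluster
# and its local Elasticsearch; the addresses below are hypothetical.
consumer = Consumer({
    "bootstrap.servers": "kafka-dc-b:9092",
    "group.id": "content-dataloader",
    "auto.offset.reset": "earliest",
})
es = Elasticsearch(["http://elasticsearch-dc-b:9200"])

consumer.subscribe(["delivery-content-ready"])   # topic kept in sync by the Replicator

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue

    doc = json.loads(msg.value())
    # Indexing by id keeps the write idempotent, so reprocessing a replicated
    # message simply overwrites the same document.
    es.index(index="delivery-content", id=doc["content_id"], body=doc)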

The Confluent Replicator

The Replicator in the picture above is a Confluent Replicator. It is a dedicated component that belongs to the Confluent Platform family, and its core functionality is to reliably replicate events and the related topic configuration between multiple Kafka clusters spread across multiple data centers. It is strongly focused on use cases such as active-active geo-localized architectures, which allow users to access a nearby data center in order to optimize the platform for low latency and high performance.

In the picture above, Data Center A shows all the component abstractions, while some of them are hidden in B and C for readability. In practice, all data centers have the same configuration as A.

The real-world scenario is quite complex, from the scale of each component in the picture above to several other elements that have been hidden for simplicity, such as caching, CDNs, proxies, load balancers, other services and third-party integrations, each with its own importance and relevance to the quality of our delivery.
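To make the Replicator configuration described above a little more concrete, the sketch below registers one Replicator instance (deployed as a Kafka Connect connector) that copies our topic from Data Center A into Data Center B; an analogous instance would exist for every other source/destination pair. The Connect endpoint, cluster addresses and topic names are assumptions for illustration, and the authoritative option set is in Confluent's documentation.

import json
import requests

# Hypothetical Kafka Connect worker running in Data Center B.
connect_url = "http://connect-dc-b:8083/connectors"

replicator_config = {
    "name": "replicator-dc-a-to-dc-b",
    "config": {
        "connector.class": "io.confluent.connect.replicator.ReplicatorSourceConnector",
        "src.kafka.bootstrap.servers": "kafka-dc-a:9092",   # cluster we copy from
        "dest.kafka.bootstrap.servers": "kafka-dc-b:9092",  # cluster we copy into
        "topic.whitelist": "delivery-content-ready",
        # Keep the same topic name at the destination so the Data Loader
        # configuration is identical in every data center.
        "topic.rename.format": "${topic}",
        # In an active-active setup, provenance headers stop a record from
        # being copied back to the cluster it originally came from.
        "provenance.header.enable": "true",
    },
}

response = requests.post(connect_url, data=json.dumps(replicator_config),
                         headers={"Content-Type": "application/json"})
response.raise_for_status()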

Key Learnings

Geo-distributing information across various regions can be a challenge, but when done properly can greatly benefit a platform. By building a system with this architecture we were able to learn some lessons:
  • Having geo-distributed content can greatly improve our delivery time by region and is another stepping stone towards high availability;
  • Apache Kafka is a viable tool to geo-distribute information across data centers and allows all workflows to be executed asynchronously;
  • Using Elasticsearch as a read model allows us to greatly increase the speed of our delivery;
  • A geo-distributed delivery greatly increases the platform's complexity and cost, since extra effort is needed to keep our information up to date across different data centers and the infrastructure costs are higher;
  • The added complexity takes its toll when troubleshooting issues, so invest in tools and processes that give you maximum observability.








