

Making Order Out Of Chaos In a Monorepo

By Francisco Tavares and Paulo Barbosa

Francisco Tavares - Plugged in 2018 at FARFETCH for delivering fast and reliable software in New Balance low-top sneakers.
Paulo Barbosa - At FARFETCH since 2018, making the impossible become possible wearing Timberland boots.
The front-end is a key part of the Farfetch.com customer experience. As we continue to deliver the best user experience, we must also contend with evolving business needs and occasional structural changes to improve code quality and manageability. Today we're bringing you the story of our stormy but successful transition to a monorepo, in which we migrated the source code of all of our front-end applications to a single repository. We will focus on its implications for our testing architecture, through the lens of a developer.

Who We Are: Eyewitnesses of a Monorepo Birth

We are software engineers dedicated to delivering the best services and experiences to our luxury fashion customers. As such, we were on the front line for all of the setbacks and successes of our large-scale transition to a monorepo architecture.

For a few years, FARFETCH operated each front-end application separately, which gave teams a lot of autonomy. But it also brought problems, such as duplicated code, an inconsistent user experience, and uneven performance between pages.

To overcome these problems, we began gradually migrating each application into a monorepo. This moved a lot of our code, and basically every single visual module you see, into the same code repository.

Enforcing Quality on Monorepo Contributions

A monorepo containing that much code has to handle a large number of frequent contributions. From the start, we built in a rigorous continuous integration pipeline to ensure the quality of the code.

We required every contribution to pass various quality criteria, ranging from basic unit tests to complex functional and visual tests. For instance, if a developer built a feature specific to our product details page (or PDP), we required it to pass every PDP test. For broader features spanning multiple pages, we verified the tests of all the affected pages.
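As a rough illustration of this scope-based selection, here is a minimal TypeScript sketch; the package paths and scope names are hypothetical, not FARFETCH's actual layout.

```typescript
// Hypothetical mapping from page scopes to the source paths that belong to them.
const scopeRoots: Record<string, string[]> = {
  pdp: ["packages/product-details/"],
  listing: ["packages/product-listing/"],
  checkout: ["packages/checkout/"],
  shared: ["packages/ui-kit/", "packages/utils/"],
};

// Resolve which scopes a set of changed files touches.
function affectedScopes(changedFiles: string[]): Set<string> {
  const scopes = new Set<string>();
  for (const file of changedFiles) {
    for (const [scope, roots] of Object.entries(scopeRoots)) {
      if (roots.some((root) => file.startsWith(root))) {
        scopes.add(scope);
      }
    }
  }
  // A change to shared code affects every page, so every scope's tests run.
  if (scopes.has("shared")) {
    return new Set(Object.keys(scopeRoots));
  }
  return scopes;
}

// A PDP-only change triggers only the PDP test suites.
console.log(affectedScopes(["packages/product-details/src/Gallery.tsx"])); // Set { "pdp" }
```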

Overall, we ensured quality through four main layers of testing: unit, functional, visual, and staging. Image 1 depicts how we organized these layers in a testing pyramid.


Image 1 - The testing pyramid

The unit tests focus on white-box testing, asserting the correct behaviour of every individual component. On the other hand, the functional, visual, and staging tests use a black-box testing approach to check product requirements.

We've built our end-to-end tests -- i.e., the functional and visual tests -- on the same infrastructure, with each differing only in its expectations. While the functional tests check UI behaviour, such as whether sliding through product images works, the visual tests check the layout of the page.
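The article doesn't name the underlying tooling, so the sketch below uses Playwright purely as a stand-in to show how the two layers can share the same setup while asserting different things; the URL, selectors, and test id are hypothetical.

```typescript
import { test, expect } from "@playwright/test";

// Placeholder URL: a stand-in for a real product details page.
const PDP_URL = "https://www.example.com/product-details";

// Functional test: behavioural expectation (the image slider advances).
test("sliding product images advances to the next image", async ({ page }) => {
  await page.goto(PDP_URL);
  await page.getByRole("button", { name: "Next image" }).click();
  await expect(page.getByTestId("active-image")).toHaveAttribute("data-index", "1");
});

// Visual test: same setup, but the expectation is a screenshot comparison
// against an approved baseline of the page layout.
test("product details page layout matches the baseline", async ({ page }) => {
  await page.goto(PDP_URL);
  await expect(page).toHaveScreenshot("pdp.png");
});
```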

Finally, the staging tests are the last testing layer: a limited set of tests focused on the Farfetch.com purchase flow, from the content page to the checkout.

Image 2 - Staging Test interactions

Houston, We Have a Problem: An Explosion of Tests

Over time, the monorepo grew considerably, to more than 30 applications, which meant much more time spent on builds and test execution. A year ago, opening a pull request meant waiting at least half an hour, mostly for the assets to build. But we had another major problem: flaky tests. Not only did our pipelines take that long, but flaky tests frequently forced us to retry the whole pipeline. The problem started in a few builds and spread until it had infected the entire monorepo. We reached a point where the pipeline took two hours to run all the continuous integration (CI) steps.

In a fast-growing project like this monorepo, a constantly increasing number of tests means an ever higher probability of failure. If each pull request fails, on average, three times in a row, it triples the number of "necessary" builds, which significantly affects the performance of the infrastructure.

Consider this example: in a pipeline where each test has a flakiness rate of 0.5%, you only need 150 tests to have more failing pipelines than successful ones. That number of tests might seem excessive for a single application, but for a monorepo it's quite normal.


Image 3 - Probability of a pipeline failing by number of tests
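To make the arithmetic behind Image 3 concrete, here is a small TypeScript sketch that assumes independent tests and a flat 0.5% flakiness rate per test:

```typescript
// Probability that at least one flaky test fails, given n independent tests
// that each spuriously fail 0.5% of the time.
const flakinessRate = 0.005;

function pipelineFailureProbability(testCount: number): number {
  // The pipeline only survives if every single test passes its flaky roll.
  return 1 - Math.pow(1 - flakinessRate, testCount);
}

for (const n of [50, 100, 150, 300]) {
  console.log(`${n} tests -> ${(pipelineFailureProbability(n) * 100).toFixed(1)}% failing pipelines`);
}
// 50 tests -> 22.2% failing pipelines
// 100 tests -> 39.4% failing pipelines
// 150 tests -> 52.9% failing pipelines
// 300 tests -> 77.8% failing pipelines
```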

Feedback is key to a developer's productivity. The better and faster a developer gets feedback from the pipelines, the sooner they can fix whatever is broken.

Taking Action With Observability

Once we realized something was wrong, it was time to dive deep into the problem. Initially, we knew that some builds had been failing, but why and where was still blurry, because some builds failed as expected while others didn't.

As such, we opened a merge request with no-op code changes that would force the pipeline to run all of the monorepo tests. This way, we knew any build failure would be a false positive, because our changes were harmless.

We updated the MR hourly for a week, manually monitoring and analyzing each failed build to understand what was causing all the trouble. That meant going through dozens of builds a day, each with thousands of log lines, assessing whether a failure was a false positive. If so, we investigated why it had failed.

Image 4 - Percentage of Success Of Visual Tests

By analyzing each failed build, we observed some patterns of failure, namely certain visual and functional tests with a considerable tendency to fail. We made other surprising discoveries, such as the pipeline running more tests than necessary. Even worse, some of those tests belonged to scopes entirely unrelated to the changes under review.


Image 5 - Frequency of Failure By Testing Scenario

Needless to say, this analysis took a long, long time. But it paid off. We gathered enough information to fix most of our issues. More importantly, our analysis showed that our monorepo pipelines had to be monitored daily. Observability became a key property even for a non-production system.

In short, we badly needed an automated monitoring system to help us find and prevent the issues we were having. Manual monitoring and assessment are unsustainable. Our lesson? It's worth building in automated monitoring from the beginning. As time passes and the monorepo grows, retrofitting it becomes harder and harder, and you will be swamped by scale.

With a monitoring system that lets you see all executed builds -- and filter their executed tests by failure, testing layer, test id, and duration -- you can identify the bottlenecks in the monorepo developers' productivity within minutes. The time saved can then be spent on what's more essential: code quality.
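As an illustration of the kind of query such a system enables, here is a sketch in TypeScript; the record shape and field names are hypothetical, not those of our actual tooling.

```typescript
// Hypothetical shape of a stored test execution record.
interface TestExecution {
  buildId: string;
  testId: string;
  layer: "unit" | "functional" | "visual" | "staging";
  passed: boolean;
  durationMs: number;
}

// Rank the flakiest end-to-end tests by how often they fail across builds.
function flakiestTests(executions: TestExecution[], top = 10) {
  const stats = new Map<string, { runs: number; failures: number }>();
  for (const e of executions) {
    if (e.layer !== "functional" && e.layer !== "visual") continue;
    const s = stats.get(e.testId) ?? { runs: 0, failures: 0 };
    s.runs += 1;
    if (!e.passed) s.failures += 1;
    stats.set(e.testId, s);
  }
  return [...stats.entries()]
    .map(([testId, s]) => ({ testId, runs: s.runs, failureRate: s.failures / s.runs }))
    .sort((a, b) => b.failureRate - a.failureRate)
    .slice(0, top);
}
```

A report like this, refreshed daily, is usually enough to spot the handful of tests responsible for most retried pipelines.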

Mitigating with Automated Test Retries

One of the main sources of flakiness was the visual and functional testing layers. These automated tests are tricky: they failed for various reasons and were not easy to get right. And even if they work well initially, there is no guarantee they will keep doing so as the application changes.

Confronted with flaky automated tests, we discussed retrying failed tests up to two more times. If a test failed all three times, we presumed it was failing because of the changes introduced. Keep in mind that while retrying failed automated tests takes little effort, it doesn't really fix the problem. In fact, it might hide a problem or postpone one. That's why this is a very debatable topic: while some preach it, others hate it.
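A minimal sketch of that retry policy, assuming a test can be expressed as an async function (most test runners ship an equivalent built-in option):

```typescript
// Run a flaky end-to-end test up to three times and only report failure
// if every attempt fails, which we then treat as a genuine regression.
async function runWithRetries(
  run: () => Promise<void>,
  maxAttempts = 3,
): Promise<{ passed: boolean; attempts: number }> {
  let attempts = 0;
  while (attempts < maxAttempts) {
    attempts += 1;
    try {
      await run();
      return { passed: true, attempts };
    } catch {
      // Assume flakiness and try again, unless we are out of attempts.
    }
  }
  return { passed: false, attempts };
}
```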

In our case, we found that auto-retrying tests brought significant improvements: we executed fewer builds, because tests that initially failed succeeded on a later attempt. Although retries slightly increased the execution time of a single run, that extra time was much smaller than the time needed to run the entire pipeline again from the beginning.

The Paradox of Group Behaviour

So the more people working on the same project rather than multiple, separate projects, the better … right? Wrong! In fact, a monorepo -- or any other project with many contributors -- is a perfect breeding ground for accumulating technical debt and creating a tragedy of the commons.

Teams are frequently changing their scope and people are taking on new positions or moving on to other projects. Who will be the maintainers of the code? And how do we assign scope?

Psychology describes this phenomenon as a diffusion of responsibility. As a crowd grows in size, each individual unconsciously feels less responsible. People begin to "go with the flow" even in emergency situations where action is needed.

Our CI pipeline had no ownership whatsoever, as the team that built it was no longer at the company. Initially, it worked perfectly, but as time passed and more applications were added, it began to decay. In a textbook case of the bystander effect, every single contributor to the monorepo watched it decay, but no one took the step toward fixing it.

Our solution? More than merely assigning responsibility, we assigned the authority to prevent all this trouble from happening, giving one team the confidence to take the lead on improvements, see them through, and finally move those nagging issues into the "DONE" column. And leave no blurred boundaries behind: ownership must be based less on blame and more on trust.

Distinguishing Between Should-not-fail and Must-not-fail Tests

Automated tests excel at safely ensuring each new feature keeps working as expected, but adding them without planning or guidelines can easily accumulate trouble.

The key is prioritizing automated tests by importance. Not every test brings the same value, but all of them add cost to our workflow. One clear distinction is between should-not-fail and must-not-fail tests. For instance, the buy-flow scenario is clearly a must-not-fail test, since its impact is directly correlated with our revenue. Must-not-fail tests that fail should therefore block our pipelines, meaning a developer cannot deploy a change that breaks one of them.

On the other hand, the should-not-fail tests don't necessarily need to be placed in the pipelines. Instead, they can be executed against the production environment, as they are not critical enough to block the workflow.
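A sketch of how this split can be expressed, with hypothetical suite names and a simple blocking rule:

```typescript
// Suites are tagged by criticality; only must-not-fail failures gate the pipeline.
type Criticality = "must-not-fail" | "should-not-fail";

interface SuiteResult {
  suite: string;
  criticality: Criticality;
  passed: boolean;
}

function shouldBlockPipeline(results: SuiteResult[]): boolean {
  return results.some((r) => r.criticality === "must-not-fail" && !r.passed);
}

const results: SuiteResult[] = [
  { suite: "buy-flow", criticality: "must-not-fail", passed: false },
  { suite: "wishlist-tooltip", criticality: "should-not-fail", passed: false },
];

// The broken buy-flow blocks deployment; the tooltip failure only notifies its owners.
console.log(shouldBlockPipeline(results)); // true
```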

Through production testing, we continuously look for potential problems. Every hour, or after every release, we execute a batch of tests that check the behaviour of the webpage on our clients' most used devices and browsers. If one of them fails, the appropriate team is notified, but their pipelines keep working without blocking.

This might seem like a reactive approach, but we find it ideal for low-impact tests that are still worth tracking. Although we don't place every automated test in the pipeline, we don't want to sacrifice additional tests that still provide value. The critical part is where we place them.

Making Tests Our Friends and Not Enemies

As developers, it's easy for us to fall into the trap of segregating production code from testing code. Production code directly impacts our customers, so it easily steals our attention away from maintaining testing code. When was the last time a colleague asked you to refactor the testing code?

This steers us towards making tests as invisible and lightweight as possible. Any friction they cause is magnified in a continuous development loop, but this creates a problem. Our tests no longer assist with the biggest task we have: maintaining our production code.

Like a mirror, our testing code reflects our production code. Usually, we view automated tests only as statements of correctness, i.e., if the code behaves as expected, then the test passes. But this overlooks the other benefits of investing in our testing code.

For example, these tests can serve as documentation, empowering developers to more easily understand each feature's behaviour. Instead of diving straight into the codebase, an unfamiliar developer can start by examining the tests. This value isn't limited to developers either, as it can aid product managers, designers, and support teams.

Large-scale applications make it harder to track what's going on. As we integrate more and more applications, components, and modules, it becomes harder to stay on top of things. Test evolution can be a rich source of continuous code comprehension.

Conclusion

As witnesses of this monorepo's birth, we faced challenges that directly affected our productivity and raised doubts about the effectiveness of our quality process. Our knowledge-sharing culture (we call it #todos-juntos, or "all together") enabled us to take action and realize the benefits of these major changes. And in that spirit of knowledge sharing, we'd like to offer the following suggestions from our experience migrating to a monorepo:

  • Enable quick feedback on your pipeline status, and define an observability plan to automatically monitor your issues and take control of them as soon as possible;
  • Implementing retries can alleviate short-term chaos, but it won't fix your long-term problems;
  • Ownership without authority will lead to diffusion of responsibility and technical debt;
  • Prioritizing tests as should-not-fail or must-not-fail will reduce the probability of failing pipelines and focus the team on what really impacts the business; and
  • Build a mindset that testing code is as important as production code, since, with periodic revision, it can also serve as support material or documentation.







