

Establishing a Knowledge Network with Academic Institutions in Machine Learning Research


A case study of co-organization in the VISUM Summer School


In 2021 and 2022, our team partnered with the VISion Understanding and Machine intelligence (VISUM) summer school to organize a project, define a small contest, and provide mentoring. Why do we want to be part of it?

We want to help shape R&D directions so that they meet the talent needs of high-tech AI and data-driven businesses. We also want to take an active part in exchanging knowledge and building knowledge networks that serve the company's longer-term ambitions, and to mentor and advise early-career researchers and developers.

Why VISUM? This summer school offers an exciting learning opportunity for students, young researchers, and professionals interested in Computer Vision and Machine Learning. With ten years of experience, this academic-industry collaboration makes various significant contributions to the transmission of research and technology and the economic and social advancement of knowledge (see Reference [3] below).


In 2021 we participated in their five-day project-based learning component, starting with a pitch presentation about the project. In 2022 we also proposed a quick challenge about "multimodal product retrieval: attention to fashion details," led by Pedro M. Ferreira (illustrated in Figure 1, on the left).

In 2022, we also supported the mentorship programme (Figure 1 on the right with Ricardo Sousa, Karolina Romanowska, and Pedro M. Costa). This programme drew inspiration from the work presented at the previous edition of VISUM. To give a bit of context, we were focused on iFetch, a conversational commerce platform governed by at least 10 microservices, with half being ML-oriented. We needed to identify mechanisms that:

  1. guarantee solution quality on each development cycle;

  2. enable automatic assessment of end-to-end system tests; and,

  3. follow a lean architecture principle for ML CI/CD (see Figure 2).


Figure 1: Pedro Miguel Ferreira presents the pitch (left); Ricardo Sousa and Pedro M. Costa, mentors of Karolina Romanowska (right), at VISUM 2022.

These challenges allowed students to apply their knowledge to a real-world fashion problem while exploring new concepts. For example, take our project-based learning challenge, "complementary outfit retrieval" (VISUM 2021).

This project is part of a broader ambition to enable a conversational AI agent (iFetch) to answer opinion questions such as "what goes well with this" or "is it suited for an occasion." Framing it this way lets students grasp the objective and the set of challenges within the project scope. Learn more about iFetch from our most recent tech blog.

Participants could learn from, exchange with, and compete with one another to build complementary fashion outfits through product retrieval, in a friendly and colorful scientific setting.

Figure 2: Pedro M. Costa and Pedro Ferreira clarify the VISUM project to participants, showcasing iFetch, a Multimodal Conversational AI Agent.

Our team provided a baseline and an original negative sampling process for triplet mining to accelerate participants' onboarding to the challenge. To induce the model to learn a complementary embedding space, the implemented baseline model comprises three main modules or sub-networks (see Figure 3 below): an image encoder, a text encoder, and a multimodal encoder (see our blog post to learn more).

Figure 3: Baseline Model Architecture.
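
To make the idea of triplet mining a bit more concrete, here is a minimal Python sketch of one simple way to sample negatives. It is not the original sampling process our team provided to participants. It assumes a plain random strategy in which the negative shares the positive's category but does not belong to the anchor's outfit. The function name `sample_triplets`, the toy catalog, and the product ids are all hypothetical.

```python
import random


def sample_triplets(outfits, catalog, seed=0):
    """Build (anchor, positive, negative) triplets of product ids.

    outfits: list of lists of product ids that were styled together.
    catalog: dict mapping product id -> category name.
    Assumption: negatives are drawn at random from products of the same
    category as the positive that are not part of the anchor's outfit.
    """
    rng = random.Random(seed)
    triplets = []
    for outfit in outfits:
        members = set(outfit)
        for anchor in outfit:
            for positive in outfit:
                if positive == anchor:
                    continue
                # Same category as the positive, but outside this outfit.
                candidates = sorted(
                    pid
                    for pid, category in catalog.items()
                    if category == catalog[positive] and pid not in members
                )
                if not candidates:
                    continue  # no valid negative for this pair
                triplets.append((anchor, positive, rng.choice(candidates)))
    return triplets


# Hypothetical toy data, for illustration only.
catalog = {
    "shirt_1": "top",
    "shirt_2": "top",
    "jeans_1": "bottom",
    "jeans_2": "bottom",
    "boots_1": "shoes",
}
outfits = [["shirt_1", "jeans_1", "boots_1"], ["shirt_2", "jeans_2"]]
triplets = sample_triplets(outfits, catalog)
```

Keeping the negative in the same category as the positive is a common design choice: it forces the model to tell compatible items from incompatible ones of the same type, rather than simply separating categories.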


During the five-day competition, 52 people participated, forming 18 teams. Of those teams, 16 turned in at least one valid submission and took part in two brainstorming sessions with specialists in computer vision. These specialists included people who work in automatic complementary product retrieval, including ourselves.

In terms of strategy, every group that outperformed the baseline's accuracy built at least part of its solution on that baseline. In most cases, groups relied on deep multimodal product encoders that generate embeddings in which complementary items sit close together and non-complementary items sit far apart.
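
The "close together, far apart" behaviour is commonly trained with a triplet margin loss. The sketch below illustrates that general technique in plain Python. It is not necessarily the loss used by the baseline or by any team; the Euclidean distance, the margin of 0.2, the function names, and the 2-D embeddings are illustrative assumptions.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(anchor, positive, negative, margin=0.2):
    """Zero once the negative is at least `margin` farther from the anchor
    than the positive is; positive (and growing) otherwise."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def rank_candidates(query, candidates):
    """Return candidate names sorted from nearest to farthest from the query."""
    return sorted(candidates, key=lambda name: euclidean(query, candidates[name]))


# Hypothetical 2-D embeddings, for illustration only.
shirt = [0.0, 0.0]
jeans = [0.3, 0.4]  # complementary item, close to the shirt
scarf = [3.0, 4.0]  # non-complementary item, far from the shirt

satisfied = triplet_margin_loss(shirt, jeans, scarf)  # constraint already met
violated = triplet_margin_loss(shirt, scarf, jeans)   # roles swapped: large loss
ranking = rank_candidates(shirt, {"scarf": scarf, "jeans": jeans})
```

At retrieval time, the same distance is used to rank candidate products for a query item, which is what `rank_candidates` does here.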

The teams experimented with encoder architecture, modality fusion, ensembling, different loss functions, and training conditions. This helped them improve their results compared to our baseline. Notably, among the literature, References [1] and [2] (see below) played the most significant role, since they were discussed during the brainstorming sessions.

The significant improvements over the baseline energized the teams to push even harder in the final hours. The presentation at the end of the challenge and the subsequent debate created a lively environment for sharing knowledge and contacts for future reference. One of the participants is now a Ph.D. student under the CMU Portugal framework for iFetch.


FARFETCH believes that partnerships and engagement with academic institutions contribute to advancing our industry knowledge in applying AI and ML to high-end fashion. It's a win-win scenario: FARFETCH benefits from collaboration with academic researchers, and educational institutions benefit from increased direct exposure to real-world problems made possible by the vast amount of data available. Drawing on the knowledge of both sides was essential to developing an exciting problem that reflects what the community faces in the real world.

In general, such partnerships diversify the subject matter addressed in scientific challenges and offer richer venues for participants to engage with and learn from one another. Last but not least, initiatives like this one help bridge the gap between basic and applied research by fostering the development of new communication and knowledge networks that make it simpler to disseminate research findings. The results of this joint work will be published in the journal Machine Vision and Applications [3].


[1] Vasileva, M.I., Plummer, B.A., Dusad, K., Rajpal, S., Kumar, R., Forsyth, D.: Learning type-aware embeddings for fashion compatibility. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) Computer Vision – ECCV 2018, pp. 405–421. Springer, Cham (2018)

[2] Lin, Y.-L., Tran, S., Davis, L.S.: Fashion outfit complementary item retrieval. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3311–3319 (2020)

[3] Castro, E., et al.: Fill in the blank for fashion complementary outfit product retrieval: VISUM summer school competition. Machine Vision and Applications (2022). Accepted
