Today, we are excited to announce that the MELLODDY project (Machine Learning Ledger Orchestration for Drug Discovery) has met its year one objective - the deployment of the world’s first secure platform for multi-task federated learning for drug discovery. After a very intense year of collaborative effort, the platform is fully functional, audited, and tested at scale and has successfully trained a unique predictive model across ten pharma companies. What makes us particularly proud is the fact that our Kubermatic Kubernetes Platform was used to build the scalable Kubernetes infrastructure for each pharmaceutical partner.
The MELLODDY project is an Innovative Medicines Initiative-funded consortium of 10 pharmaceutical partners, including Bayer, Boehringer Ingelheim, GSK, Servier, and Novartis, and seven technical partners, among them KU Leuven, Owkin, Substra Foundation, and Kubermatic. It has the potential to solve the challenge of cooperatively training machine learning models while protecting sensitive data, with the goal of accelerating drug discovery. With today’s announcement, the partners have come a good step closer to this objective.
Project MELLODDY - A New Way of “Coopetition”
Federated Learning (FL) is a Machine Learning (ML) technique that enables researchers to train artificial intelligence (AI) models on distributed data, at scale, across multiple institutions - without centralizing the data. The MELLODDY project is creating a solution that will facilitate a new form of ‘coopetition’ where competitors have a mutual interest in building predictive models that benefit from a parallel effort - while still protecting their private research, data, information, and models.
MELLODDY’s successful deployment of the platform is the world’s first FL experiment in drug discovery performed at this scale and between competing industrial partners. With this milestone, 10 pharmaceutical companies that are otherwise in competition with one another have simultaneously trained their predictive models to learn from all the data submitted by each pharmaceutical partner.
The aim of the MELLODDY consortium is to develop a cutting-edge FL platform that enables the generation and enhancement of predictive ML models using distributed pharmaceutical data, without exposing or revealing any individual company’s proprietary data and models. This type of collaborative, yet still protected, use of data has the potential to solve the challenges of data sharing within pharmaceutical research while also significantly advancing drug discovery and development opportunities.
Here Is How the Collaborative Model Works
Partners securely register their proprietary datasets in their own local instance of the distributed platform, which allows the private models to learn from the aggregated knowledge of all partners, without sharing private data.
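To make the idea concrete, here is a minimal sketch of one common federated learning scheme, federated averaging. It is an illustration only, not the MELLODDY implementation: it does not use Owkin Connect or SparseChem, a tiny linear model stands in for the real multi-task models, and the security measures of the actual platform are not represented. All function names and the toy data are hypothetical. Each partner trains on its own data and shares only model weights, which are then averaged.

```python
from typing import List, Tuple

# Hypothetical toy types: a dataset is a list of (features, target) pairs.
Dataset = List[Tuple[List[float], float]]


def local_update(weights: List[float], data: Dataset,
                 lr: float = 0.1, epochs: int = 5) -> List[float]:
    """Train on one partner's private data; only the weights leave the partner."""
    w = list(weights)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            # Prediction error of the linear model on this sample.
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j, xj in enumerate(x):
                grad[j] += 2.0 * err * xj / len(data)
        # Plain gradient descent step on the mean squared error.
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w


def federated_average(updates: List[List[float]], sizes: List[int]) -> List[float]:
    """Combine partner weights, weighting each partner by its dataset size."""
    total = sum(sizes)
    dim = len(updates[0])
    return [sum(u[j] * n for u, n in zip(updates, sizes)) / total
            for j in range(dim)]


def train_federated(partner_data: List[Dataset], dim: int,
                    rounds: int = 100) -> List[float]:
    """Run several rounds: local training at each partner, then averaging."""
    global_w = [0.0] * dim
    for _ in range(rounds):
        updates = [local_update(global_w, data) for data in partner_data]
        sizes = [len(data) for data in partner_data]
        global_w = federated_average(updates, sizes)
    return global_w


if __name__ == "__main__":
    # Three hypothetical partners; their raw data is never pooled.
    partners = [
        [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0)],
        [([1.0, 1.0], 1.0), ([2.0, 1.0], 3.0)],
        [([1.0, 2.0], 0.0)],
    ]
    print(train_federated(partners, dim=2))
```

In this sketch the aggregation step sees only weight vectors, never the (features, target) pairs themselves, which is the property the paragraph above describes. A production system would add further safeguards on top of this basic loop.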
The development of the MELLODDY platform and the execution of the first federated run were a key milestone and a technical triumph, achieved by the MELLODDY consortium as the first-year objective of this three-year project. The design, implementation, and operation of the secure platform for multi-task FL were built on the previous work and expertise of the seven technical partners. The research partners (BME, Iktos, and NVIDIA) focused on implementing ML for drug discovery, ensuring privacy, and optimizing training speed on NVIDIA GPUs, while the operational partners (Owkin, Kubermatic, KU Leuven, and Substra Foundation) developed and provided the code for the platform. Owkin provided Owkin Connect, its novel privacy-preserving framework that enables multi-task FL. KU Leuven provided SparseChem, an open-source library for training ML models specific to drug discovery, and we deployed our Kubermatic Kubernetes Platform to build the scalable infrastructure for each pharmaceutical partner. Finally, Substra Foundation managed the technical operations, monitored the executions of the platform, and hosted the open-source code that is part of Owkin Connect.
The platform passed extensive and rigorous security audits by an external company and by the IT teams of each pharmaceutical partner. Ensuring data privacy and protection was an absolute prerequisite for the deployment and the project’s major challenge in the first year.
Meanwhile, the pharmaceutical partners have already begun an extensive scientific and business case assessment of the results of the first cycle of modelling runs. The outcome, de-identified and aggregated across all partners, is being considered for publication. Over the next two years, the MELLODDY project will focus on improving the performance of the common predictive model by exposing it to an increasing amount of data.