Researchers at the Centre for Genomic Regulation (CRG) have begun efforts to build ATHENA, a generative artificial intelligence which can design proteins with custom properties. The project, led by Dr. Noelia Ferruz, has been announced today with the backing of a 1.5 million euro Starting Grant from the European Research Council.
Proteins have widespread scientific, medical and industrial applications. These include enzymes which shorten chemical reaction times from years to milliseconds, antibodies which recognise and neutralise pathogens, and therapeutic proteins which target and treat disease. These proteins are the result of many millions of years of evolution.
ATHENA will help design new proteins which do not currently exist, with properties that can go beyond those found in nature. For instance, it could be used to create enzymes which sequester carbon dioxide from the atmosphere. Another designer protein could bind to BPA molecules, helping detect and remove the harmful pollutant from the environment.
“Though nature's toolkit is vast and astounding, it doesn't always provide the precise solutions we need. Proteins to tackle pressing challenges like climate change or environmental pollution either remain undiscovered or simply do not exist. We want to build tools that can make these proteins a reality, providing completely new ways of tackling long elusive problems,” explains Dr. Noelia Ferruz, Group Leader at the Centre for Genomic Regulation and coordinator of the ATHENA project.
ATHENA is a generative artificial intelligence tool. The most famous example of this fast-growing, disruptive technology is ChatGPT, which can process and generate human language in written form. Large language models like ChatGPT are trained using text-based datasets and have the ability to learn, improving over time.
ATHENA will be trained in a similar way, but using the ‘language’ of proteins. However, rather than just text, researchers will use multiple types of data from proteins, including their sequence (the order of amino acids), three dimensional structures (how proteins are shaped), dynamics (how they move), and functional information (what they do).
“This is like building an AI with text, images and videos all at once. The different types of data will help ATHENA understand and work with proteins in a way that isn’t possible right now, making it much more versatile and powerful in designing new proteins with specific properties,” says Dr. Ferruz.
The research team will use a technique called reinforcement learning to build ATHENA. This approach closely mirrors how humans learn from experience, allowing the model to use feedback from laboratory experiments to iteratively refine its protein designs, making each subsequent design more likely to succeed.
For instance, ATHENA could design proteins with an enhanced ability to capture carbon dioxide. The proteins are then synthesized and tested in a laboratory setting. If a protein performs well, the AI is “rewarded”, while failures lead to adjustments that make the model less likely to repeat the same mistakes.
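The reward-and-adjust loop described above can be illustrated with a toy sketch. This is not ATHENA's method or code: it is a minimal policy-gradient (REINFORCE-style) example written under stated assumptions. The names `HIDDEN_TARGET`, `mock_assay`, `train` and `best_design` are all hypothetical, and the "laboratory test" is simulated by simply counting how many positions of a designed sequence match a made-up target sequence.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
HIDDEN_TARGET = "MKTAYI"  # hypothetical "ideal" sequence, standing in for real lab results


def mock_assay(sequence):
    """Hypothetical stand-in for a lab test: fraction of positions matching the target."""
    matches = sum(a == b for a, b in zip(sequence, HIDDEN_TARGET))
    return matches / len(HIDDEN_TARGET)


def softmax(logits):
    """Turn a row of preference scores into probabilities."""
    top = max(logits)
    weights = [math.exp(x - top) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]


def train(rounds=6000, learning_rate=0.5, seed=0):
    """Learn per-position amino acid preferences from simulated assay feedback."""
    rng = random.Random(seed)
    # One row of preferences per sequence position, initially uniform.
    logits = [[0.0] * len(AMINO_ACIDS) for _ in HIDDEN_TARGET]
    baseline = 0.0  # running average of past rewards
    for _ in range(rounds):
        probs = [softmax(row) for row in logits]
        # Sample one design from the current preferences.
        choices = [rng.choices(range(len(AMINO_ACIDS)), weights=p)[0] for p in probs]
        design = "".join(AMINO_ACIDS[i] for i in choices)
        reward = mock_assay(design)
        # Better-than-average designs reinforce their choices; worse ones discourage them.
        advantage = reward - baseline
        for row, p, chosen in zip(logits, probs, choices):
            for i in range(len(row)):
                indicator = 1.0 if i == chosen else 0.0
                row[i] += learning_rate * advantage * (indicator - p[i])
        baseline = 0.9 * baseline + 0.1 * reward
    return logits


def best_design(logits):
    """Pick the currently most preferred amino acid at each position."""
    return "".join(AMINO_ACIDS[row.index(max(row))] for row in logits)


if __name__ == "__main__":
    design = best_design(train())
    print(design, mock_assay(design))
```

In this sketch the model never sees the target sequence directly. It only receives a score for each design, much as a real system would only receive experimental measurements, and shifts its preferences towards the choices that scored above its running average.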
One of the challenges in AI is that models usually operate as black boxes, meaning we don’t know exactly how they make decisions. One of the unique traits of ATHENA is that the research team will design it using ‘explainable AI’, a process which makes the system more transparent and understandable to humans.
“We want to be able to see inside the AI model to understand how it makes decisions, rather than just accepting its outputs. This is important because it allows people to trust the technology, learn from it, and ensure it is making decisions for the right reasons. This will be one of the biggest challenges in the project,” concludes Dr. Ferruz.
The ATHENA project is a five-year initiative backed by a 1.5 million euro Starting Grant from the European Research Council (ERC), the premier research funding initiative set up by the European Union. The ERC announced 494 projects in its latest funding round today, 33 of which are going to researchers based at institutions in Spain.
Image: Dr. Noelia Ferruz at the Centre for Genomic Regulation (CRG) has begun work to build ATHENA, a new AI model which can create proteins with custom properties that do not exist in nature.