‘A revolution for life sciences’ - DeepMind and EMBL-EBI release database of human protein structures
It has been described as the most important dataset since the human genome was mapped.
DeepMind, the Google-owned artificial intelligence lab, has teamed up with the EMBL’s European Bioinformatics Institute (EMBL-EBI) to create an accurate database of protein structures covering the human proteome, the full set of proteins in the human body.
EMBL director general Edith Heard described this as “truly a revolution for the life sciences, just as genomics was several decades ago”.
DeepMind founder and CEO Demis Hassabis described it as “the most significant contribution artificial intelligence has made to advancing scientific knowledge”.
A better understanding of protein structures will be pivotal in developing new drugs to treat anything from dementia to cancer to Covid-19. Proteins are the building blocks of life, underpinning every biological process in every living thing.
The breakthrough was achieved using DeepMind’s AI program, called AlphaFold, which a year ago was able to crack one of the biggest challenges facing biologists in the last 50 years - namely how proteins fold into 3D shapes. It is able to predict these structures with confidence from their amino acid sequences.
It builds on the work of generations of scientists who have used protein imaging and crystallography to solve these structures.
The freely available new database features 20,000 human protein structures, more than doubling the number previously available to researchers.
“Our goal at DeepMind has always been to build AI and then use it as a tool to help accelerate the pace of scientific discovery itself, thereby advancing our understanding of the world around us,” said Dr Hassabis. “We used AlphaFold to generate the most complete and accurate picture of the human proteome. We believe this represents the most significant contribution AI has made to advancing scientific knowledge to date, and is a great illustration of the sorts of benefits AI can bring to society.”
The team also released around 350,000 structures from 20 additional organisms important for biological research, such as E. coli, fruit fly, mouse, zebrafish, malaria parasite and tuberculosis bacteria.
AlphaFold has already demonstrated its value, with the Drugs for Neglected Diseases Initiative (DNDi) using it for research into life-saving cures for diseases that disproportionately affect the poorer parts of the world - namely Chagas disease and leishmaniasis, which are caused by parasites.
The Centre for Enzyme Innovation (CEI) is using AlphaFold to engineer faster enzymes for recycling some of our most polluting single-use plastics.
And a team at the University of Colorado Boulder is using the technology to study antibiotic resistance, while a group at the University of California San Francisco has aided our understanding of the biology of the Covid-19 virus, SARS-CoV-2, using AlphaFold.
Ewan Birney, deputy director general of the EMBL and director of the Wellcome Genome Campus-based EMBL-EBI, said: “Making AlphaFold predictions accessible to the international scientific community opens up so many new research avenues, from neglected diseases to new enzymes for biotechnology and everything in between.
“This is a great new scientific tool, which complements existing technologies, and will allow us to push the boundaries of our understanding of the world.”
The database will be updated over time, with the aim of covering more than 100 million protein structures.
Sundar Pichai, CEO, Google and Alphabet, said: “The AlphaFold database shows the potential for AI to profoundly accelerate scientific progress. Not only has DeepMind’s machine learning system greatly expanded our accumulated knowledge of protein structures and the human proteome overnight, its deep insights into the building blocks of life hold extraordinary promise for the future of scientific discovery.”
The MRC Laboratory of Molecular Biology’s Venki Ramakrishnan, the 2009 Nobel laureate in chemistry, said: “This computational work represents a stunning advance on the protein-folding problem, a 50-year-old grand challenge in biology. It has occurred long before many people in the field would have predicted. It will be exciting to see the many ways in which it will fundamentally change biological research.”
The database can be found at https://www.alphafold.ebi.ac.uk/ and the research was published in Nature.