Pangenomics to enter the quantum era with $3.5m project involving University of Cambridge, Sanger Institute and EMBL-EBI
An ambitious project that brings together a new domain in science, pangenomics, with the emerging power of quantum computing has been awarded $3.5 million.
The researchers involved say it is an “incredibly exciting” prospect but admit that it is so cutting-edge they cannot know exactly where it will lead.
Potential benefits, however, could include advances in personalised medicine and better tracking and management of pathogen outbreaks.
Pangenomes are new representations of DNA sequences that capture genetic diversity in a population. Each pangenome is a collection of many different genome sequences.
They could be produced for all species, including pathogens such as SARS-CoV-2, the virus behind Covid-19.
But building, augmenting and analysing pangenomic datasets for large population samples demands huge levels of computational power.
So the team of researchers at the University of Cambridge, the Wellcome Sanger Institute and EMBL’s European Bioinformatics Institute (EMBL-EBI) plan to develop quantum computing algorithms with the potential to speed up the production and analysis of pangenomes.
Dr David Yuan, project lead at EMBL-EBI, said: “On the one hand, we’re starting from scratch because we don’t even know yet how to represent a pangenome in a quantum computing environment. If you compare it to the first moon landings, this project is the equivalent of designing a rocket and training the astronauts.
“On the other hand, we’ve got solid foundations, building on decades of systematically annotated genomic data generated by researchers worldwide and made available by EMBL-EBI. The fact that we’re using this knowledge to develop the next generation of tools for the life sciences is a testament to the importance of open data and collaborative science.”
It was on 21 April, 2003, that the International Human Genome Consortium announced that the human genome had been sequenced. Since then, genomics has revolutionised medicine, aiding diagnosis and informing treatments.
Our DNA code contains 6.4 billion letters and less than one per cent of them differ from one human to the next, but it is these differences that make us unique.
However, the reference human genome, against which most subsequent human DNA is compared, is based on DNA from only a few people, so it does not represent human diversity.
Scientists have been working for more than a decade to address the problem and last year the first human pangenome reference was produced in a global collaboration that involved the Wellcome Sanger Institute and EMBL-EBI. It featured the genome sequences of 47 people, and the aim is to increase that number to 350 by mid-2024.
The existing human reference genome structure is linear but pangenome data needs to be represented and analysed as a network known as a sequence graph, which stores the shared structure of genetic relationships between many genomes.
Comparing an individual genome to the pangenome, which will give far greater insight into its unique composition, involves mapping a route for the sequence through the graph.
The team hopes to develop quantum computing techniques to speed up the process of mapping data to graph nodes and finding good routes through the graph.
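To make the idea of "finding a route through the graph" concrete, here is a minimal classical sketch in Python. It is an illustration only, not the project's method: the four-node graph, its labels and the helper names (`paths`, `spell`, `edit_distance`, `best_route`) are all hypothetical. It lists every route through a tiny sequence graph and picks the one whose spelled-out sequence is closest to a query sequence.

```python
# Toy sequence graph (hypothetical data): node id -> DNA label.
# Nodes 2 and 3 form a "bubble", i.e. two alternative variants.
NODES = {1: "ACG", 2: "T", 3: "G", 4: "TTA"}
EDGES = {1: [2, 3], 2: [4], 3: [4], 4: []}


def paths(node, sink):
    """Yield every path (list of node ids) from node to sink."""
    if node == sink:
        yield [node]
        return
    for nxt in EDGES[node]:
        for rest in paths(nxt, sink):
            yield [node] + rest


def spell(path):
    """Concatenate the node labels along a path."""
    return "".join(NODES[n] for n in path)


def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # match or substitution
        prev = cur
    return prev[-1]


def best_route(query, source=1, sink=4):
    """Brute force: return the path whose spelled sequence is closest to query."""
    return min(paths(source, sink),
               key=lambda p: edit_distance(spell(p), query))


route = best_route("ACGGTTA")
print(route, spell(route))
```

Listing every route only works for a toy graph like this one. The number of routes grows very quickly as variant sites are added, which is part of why faster approaches are being sought.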
Dr Sergii Strelchuk, principal investigator of the project from the Department of Applied Mathematics and Theoretical Physics at the University of Cambridge, said: “The structure of many challenging problems in computational genomics and pangenomics in particular make them suitable candidates for speed-ups promised by quantum computing. We are on a thrilling journey to develop and deploy quantum algorithms tailored to genomic data to gain new insights, which are unattainable using classical algorithms.”
Classical computing stores information in bits, which are binary: either 0 or 1. A quantum computer, by contrast, works with particles that can be in a superposition of different states simultaneously. Information is represented in quantum bits, or qubits, which can be 0, 1 or a superposition of both, a property that could make some problems tractable that are impractical to solve on classical computers.
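As a rough illustration of superposition, the Python sketch below simulates a single qubit on an ordinary computer. It is a textbook-style toy, not code from the project, and the helper names (`apply_gate`, `probabilities`) are assumptions made for this example. The qubit is stored as two amplitudes, and a standard Hadamard gate turns the definite state 0 into an equal superposition of 0 and 1.

```python
import math

# A single-qubit state is a pair of amplitudes [amp_0, amp_1].
ZERO = [1.0, 0.0]  # the definite state "0"

# Hadamard gate: maps a definite 0 or 1 to an equal superposition.
S = 1 / math.sqrt(2)
H = [[S, S],
     [S, -S]]


def apply_gate(gate, state):
    """Multiply a 2x2 gate matrix by a 2-element state vector."""
    return [sum(gate[r][c] * state[c] for c in range(2)) for r in range(2)]


def probabilities(state):
    """Probability of measuring 0 or 1: squared magnitude of each amplitude."""
    return [abs(a) ** 2 for a in state]


plus = apply_gate(H, ZERO)
print(probabilities(plus))  # roughly equal chance of reading 0 or 1
```

Simulating n qubits this way needs 2^n amplitudes, so classical simulation becomes costly very quickly as qubits are added.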
The problem is that existing quantum computer hardware is sensitive to noise and decoherence so scaling is an immense technological challenge. So far, quantum computers are limited in size and computational power, but significant advances are expected in the next few years.
The team will build on computational genomics methods to develop, simulate and implement new quantum algorithms using real data. They will test and refine their algorithms and methods using existing, powerful High Performance Compute (HPC) environments as simulations of forthcoming quantum computing hardware, beginning with small stretches of a DNA sequence, before processing relatively small genome sequences like SARS-CoV-2, and ultimately moving on to the much larger human genome.
David Holland, principal systems administrator at the Wellcome Sanger Institute, who is working to create the High Performance Compute environment to simulate a quantum computer, said: “We’ve only just scratched the surface of both quantum computing and pangenomics. So to bring these two worlds together is incredibly exciting. We don’t know exactly what’s coming, but we see great opportunities for major new advances. We are doing things today that we hope will make tomorrow better.”
The project is one of 12 selected worldwide for the Wellcome Leap Quantum for Bio (Q4Bio) Supported Challenge Program, which was devised on the premise that new computational methods advance best with the co-development of software and hardware.