Kao Data discusses the growth of high performance computing in the life sciences
Adam Nethersole is director of Kao Data, an award-winning, hyperscale-inspired specialist data centre located in Harlow, engineered specifically to cater for advanced supercomputing and high performance computing (HPC) deployments, as well as intensive, high-density forms of AI and machine learning. He discusses how the pandemic has highlighted the growing importance of HPC within the life sciences sector.
Little more than nine months ago, the novel coronavirus (formally known as SARS-CoV-2) was unknown. To date, Covid-19, the disease caused by the novel coronavirus, has claimed the lives of more than 900,000 people and, having spread to almost every country in the world, has changed the lives of billions.
In the same time period, the global scientific and life sciences communities have learned an extraordinary amount about the virus: how it is spreading and how we might combat it. The extent of our knowledge, and the phenomenal speed with which it has been acquired, simply would not have been possible without high performance computing (HPC).
Given the colossal global numbers to analyse (29.7 million cases, 7.25 million active patients and 22.5 million cases with a recorded outcome, across 213 countries, with more than 4.5 billion people living under social distancing measures, and most of these numbers changing by the minute), there has never been a time when the world needed HPC more.
HPC gives us the potential to analyse vast data sets (including almost 30,000 documents made available by the White House in March alone), to plot the path of outbreaks and pandemics like Covid-19, to predict their evolution and to simulate potential drug and vaccine candidates, all at remarkable speed.
It is often hard to quantify 'remarkable speed' when it comes to scientific discovery, so, to provide some form of comparison, consider this. After AIDS was first identified in 1981, it took almost three decades to genetically decode HIV-1, the main cause of the disease in humans. By contrast, the coronavirus behind the 2003 SARS outbreak was decoded within three months. The genome behind Covid-19 was decoded and published globally within days.
The speed at which science, scientists and their supercomputers are now working is truly staggering, especially when you remember that they are not all focused on the same part of the puzzle at the same time. Yet HPC is playing a vital role at every stage, from diagnosing new cases and predicting how and where the virus is spreading to searching for treatments and testing drug candidates.
Spotting a pandemic
The initial challenge for clinicians and researchers in China was correctly diagnosing which patients had the disease, at a time when test kits were limited and not always reliable. Tianhe-1, China's first petascale computer, housed at the National Supercomputing Centre in Tianjin, was used to distinguish between CT scans of patients with Covid-19 pneumonia and those with other forms of pneumonia. Because Covid-19 is highly contagious, the distinction was vitally important. Researchers reported almost 80 per cent accuracy, better than both early test kits and human radiologists.
And, as the virus took hold, it was artificial intelligence running on HPC platforms that first identified its spread. BlueDot, a Toronto-based company that tracks infectious diseases, warned of a problem in late December, a full week before the US Centers for Disease Control and Prevention and the World Health Organization.
A crucial factor in the spread of Covid-19 is that patients can be infectious for several days before developing symptoms. BlueDot uses official and unofficial sources, including global airline ticketing data, to predict where infected people might be travelling. This helped health authorities plan their resources and gave governments a sense of where lockdown and social isolation measures might be necessary.
Breaking down the virus
Two months after the first US case of Covid-19 was confirmed on 21st January, the White House set up the Covid-19 High Performance Computing Consortium. This brought together federal government, industry and academic leaders to provide access to 16 of the world's most powerful HPC resources. Combined, these armed the fight against Covid-19 with 424 petaflops, 3,800,000 CPU cores and 41,000 GPUs (graphics processing units), all accessible and focused on one aim: bringing the Covid-19 pandemic to an end, and quickly.
Collectively, that's the computing power of almost 500,000 laptops: the scientific computing equivalent of absolute brute force, a digital, heavy-iron bulldozer smashing its way through mountains of big data.
Another issue was making sure the data to be analysed was, as far as possible, in the same place. In Europe, the Covid-19 Data Portal was set up by EMBL-EBI and partners in April, bringing together relevant datasets for sharing and analysis in an effort to accelerate coronavirus research. It enables researchers to upload, access and analyse Covid-19-related reference data and specialist datasets as part of the wider European Covid-19 Data Platform.
NVIDIA has also given researchers free access to Parabricks, its GPU-accelerated genome analysis toolkit, which it says can accelerate data analysis by up to 50 times. Meanwhile, Intel has partnered with Lenovo and BGI Genomics to investigate Covid-19 transmission patterns. Tracking the virus's mutations is fundamental to developing a vaccine or treatments.
Understandably, the focus on a vaccine is intense, and assessing drug candidates is one area in which HPC particularly excels. In Europe, CINECA, which operates one of the 20 most powerful supercomputers in the world, is part of a consortium of researchers, including the Barcelona Supercomputing Center and the Jülich Supercomputing Centre, that is working on a vaccine using code and workflows developed to research the Zika virus.
In the US, Summit, based at Oak Ridge National Laboratory and at the time the world's most powerful supercomputer, was used to simulate 8,000 possible drug compounds. It identified 77 promising candidates in just a few days, something that would have taken years by hand and months on a 'normal' computer.
Nine months on, much remains uncertain as the world continues to battle and cope with the virus. What is certain, however, is that high performance computing is giving us the analysis, statistics and insights to make better-informed, quicker decisions that will, ultimately, save lives.
Finally, it's worth remembering that HPC doesn't run itself. The army of mission-critical technicians, engineers and support staff who work quietly behind the scenes, keeping the lights flashing and the servers whirring in data centres, laboratories and research institutes around the world, deserve our considerable thanks and appreciation.
Building on history to deliver best-in-class data centres
Kao Data is located in Harlow, on the site where Nobel Prize-winning physicist Sir Charles Kao carried out his pioneering work on fibre optic communications more than 50 years ago.
The firm is creating an industry blueprint for best-in-class high performance computing (HPC) data centre design. HPC is the domain of high-powered, incredibly fast servers, often working in parallel to solve complex mathematical problems, analyse massive datasets or simulate the flow of fluids such as air and water.
From sequencing genomes to algorithmic financial trading, through to aircraft and chassis design, HPC underpins innovation and industry.
The firm is sponsoring the Start-up of the Year Award at the Cambridge Independent 2020 Science & Technology Awards.
Spencer Lamb, Kao Data’s vice president, says: “As businesses have become more and more reliant on technology - predominantly because of the need to have systems with 100 per cent uptime, 24/7/365 - the demand for more robust environments to house their systems has grown, and led to them outsourcing to organisations like Kao Data.
“The HPC world is no longer the sole domain of the research institutions and large universities. There is now a burgeoning demand from the artificial intelligence world, and science and technology companies are also taking advantage of doing things better and faster.
“We can be of huge benefit to the science, tech and AI start-ups born in and around Cambridge which, historically, have been somewhat compromised regarding access to technology resources and facilities such as ours. We want to be part of their journey and success too.”