Into the Unknome: Scientists at MRC LMB in Cambridge create database ranking human proteins by how little we know about them
Scientists at the MRC Laboratory of Molecular Biology (LMB) in Cambridge have taken a leap into the unknown – or rather the Unknome – to show just how little we know about thousands of proteins that make up our cells and tissues.
Despite extensive research since the release of the first draft of the human genome sequence in 2000, the functions of many of these proteins remain a mystery.
They could be involved in areas of biological function that we have yet to explore – but research tends to focus on proteins that are well understood already.
Researchers in Sean Munro’s group in the LMB’s Cell Biology Division have now created the Unknome database, which covers all 20,000 or so human proteins that our DNA contains instructions to make, and ranks them by how little is known about them.
It was built by Tim Stevens, a senior investigator scientist working with Sean, and statistical analysis was performed by Rajen Shah from the University of Cambridge.
Beginning with a list of all human proteins, they collected the information that is available about their function, or about the function of closely related proteins from model organisms such as mice, flies or yeast.
They assigned each protein a ‘knownness’ score, depending on the quantity of available knowledge, and have made the resulting database both publicly available – at http://unknome.org – and customisable.
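The ranking idea described above can be illustrated with a short Python sketch. It is a toy under stated assumptions, not the Unknome's actual scoring method: the protein names, the annotation counts, and the rule of scoring each protein by its best-characterised relative are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Protein:
    name: str
    # Hypothetical input: number of functional annotations recorded for this
    # protein and for its close relatives, keyed by organism.
    annotations: dict[str, int] = field(default_factory=dict)


def knownness(protein: Protein) -> int:
    """Assumed scoring rule: take the best-characterised relative in any organism."""
    return max(protein.annotations.values(), default=0)


def rank_by_ignorance(proteins: list[Protein]) -> list[Protein]:
    """Order proteins from least known to most known."""
    return sorted(proteins, key=knownness)


# Made-up example data, for illustration only.
proteins = [
    Protein("PROT_A", {"human": 40, "mouse": 55}),
    Protein("PROT_B", {"human": 0, "fly": 3}),
    Protein("PROT_C", {}),
]

ranked = rank_by_ignorance(proteins)
for p in ranked:
    print(p.name, knownness(p))
```

In this sketch a protein with no annotations in any organism gets a score of zero and sorts to the top of the list, which mirrors the idea of ranking proteins by how little is known about them.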
The value of the database was then assessed by João Rocha, Satish Arcot Jayaram and Nadine Muschalik, working with Matthew Freeman – a former LMB group leader who is now head of the Sir William Dunn School of Pathology at the University of Oxford – by performing functional screens on a subset of proteins in the database.
They selected 260 genes in humans for which there is almost no knowledge of function, but which have comparable genes in fruit flies.
Using RNA interference to reduce production of the corresponding proteins in fruit flies, they found that more than a quarter are essential for the flies to survive.
Further screens showed that a significant fraction of the remaining proteins contribute to important functions, such as fertility, development, tissue growth, protein quality control or stress resistance.
The work suggests that significant and unexplored biology is encoded in neglected parts of the proteome – our set of proteins. LMB researchers hope the Unknome database offers a way to focus attention on these neglected proteins and a valuable resource for guiding biological studies.
Reporting their work in PLOS Biology, they say: “Analysis of publication trends has revealed that research efforts continue to focus on genes and proteins of known function, with similar trends seen in gene and protein annotation databases.
“This is despite clear evidence from studies of gene expression and genetic variation that many of the poorly characterised proteins are linked to disease, including those that are eminently druggable.
“Indeed, it has long been argued that ignorance can drive scientific advance.”
They offer a number of reasons why research has typically focused on proteins for which there is existing knowledge.
“Clearly, funding and peer-review systems are more likely to support research on proteins with prior evidence for functional or clinical importance, and individual perception of project risk seems likely to also contribute,” the researchers say.
“In addition, scientific factors have been proposed, including a lack of specific reagents like antibodies or small molecule inhibitors, and a tendency to focus on proteins that are abundant and widely expressed and so likely to be present in cell lines and model organisms. Finally, some genes may have roles that are not relevant to laboratory conditions.”
And they warn: “This inadvertent neglect of the unknown is clear and does not appear to be diminishing. This has led to concern that important fundamental or clinical insight, as well as potential for therapeutic intervention, is being missed.”
Noting that unlike most databases, this one should shrink over time as we finally learn more about human proteins, they conclude: “We find that accurately evaluating ignorance about gene function provides a valuable resource for guiding biological studies and may even be important for determining strategies to efficiently fund science.”