One millionth structure enters database at the Cambridge Crystallographic Data Centre
Safely stored in a building in Cambridge’s Union Road is a repository of data of global significance.
Containing vital information used in the creation of new materials, drugs and agrochemicals, the Cambridge Structural Database (CSD) is drawn upon by scientists in 70 countries around the world.
They use this highly curated library of organic and metal-organic crystal structures to determine how molecules behave and interact in three dimensions.
Compiled and managed by the Cambridge Crystallographic Data Centre (CCDC), the database has just reached a major milestone: its one millionth structure.
It is an achievement 54 years in the making and symbolises the growing importance of big data.
The centre’s chief executive officer, Dr Jürgen Harter, said: “This is truly an important milestone not only for CCDC but also for the wider scientific community.
“In addition to the value that lies in large sets of data like this to help scientists inform their research and decision making, we also pride ourselves on the high quality of the data, a result of the hard work of our expert in-house database team.
“Maintaining a policy of strict data interrogation ensures the value of the plentiful insights that can be drawn from the CSD, avoiding misinformation that can lead to wasted time, resources and ultimately cost.”
The centre is self-administering, but retains close links with the University of Cambridge as one of its partner institutions.
It grew out of the activities of the crystallography group led by Dr Olga Kennard at the Department of Organic, Inorganic and Theoretical Chemistry.
In 1965, the group began collecting published bibliographic, chemical and crystal structure data for small molecules studied by X-ray or neutron diffraction.
As computing began to develop, the group encoded this information in electronic form, and the CSD was born, becoming one of the first numerical scientific databases anywhere in the world.
With government-backed grants, the database and the software to interrogate it were developed, and the first releases to the US, Italy and Japan followed in the 1970s.
By the following decade, the system was being distributed to 30 countries, with growing interest – and income – from the pharmaceutical and agrochemical industries enabling the centre to become an independent company in 1987, with the status of a non-profit charitable institution.
Today, machine learning and artificial intelligence are enabling pharma and other industries to automate more of their processes, but the results are only as useful as the data on which they rely.
The CSD promises extensive validation and cross-checking before any of these experimentally determined structures is allowed into the database.
The structures can be viewed as three-dimensional models and are enriched with bibliographic, chemical and physical property information.
Advanced search, 3-D data mining, analysis and visualisation software from the centre enables scientists to extract insights and predict new outcomes.
Pharma companies are using these resources to help them predict drug morphology – the shape of a particle has a significant impact on its effectiveness and how it can be used. Spherical particles, for example, are ideal for use in inhalers, providing they are within a certain size range.
The database is also becoming increasingly valuable as we seek more sustainable and environmentally friendly solutions for everyday materials.
Researchers working on new materials for batteries, paints, pigments and dyes, and particularly those working on gas storage and tailored catalysts, are turning to the database.
The CSD allows researchers to publish data that would otherwise have remained hidden from the world, meaning it contains some unique information.
In 2018, the highest number of deposits into the database came from crystallographers in China, with the UK in sixth place, reflecting a trend the centre has seen in recent years.
“It is an exciting time for life science and materials development research with markets such as China leading the way in scientific discovery. We are excited to see what insights we obtain from this market going forward,” Dr Harter said.
And it was from China that the database’s one millionth structure came.
1-(7,9-diacetyl-11-methyl-6H-azepino[1,2-a]indol-6-yl)propan-2-one – we’ll test you on this later – is an N-heterocycle produced by a chalcogen bonding catalyst that activates multiple reaction steps sequentially.
The structure was determined by Yao Wang and co-authors from Shandong University in China and published in the Journal of the American Chemical Society (JACS).
Should you wish to explore it, enter the reference code XOPCAJ in the database at ccdc.cam.ac.uk/structures/.
Suzanna Ward, head of the CSD, said: “We’d like to congratulate Yao Wang and all of his co-authors for publishing the millionth structure, and we are so grateful to the 350,000-plus scientists from around the world that have contributed their data, enabling us to reach this milestone and placing CSD as the go-to resource for structural information within the scientific community.”
Dr Wang said: “We are delighted to hear that our structure is the millionth to enter the CSD. We have used the CSD for over 10 years because it is an excellent platform to report new crystal structures and an outstanding database to find inspirable chemical structures.
“It is a valuable resource to us and to many other scientists around the world so we are very proud to be associated with this milestone for the community.”
Peter Stang, editor-in-chief of JACS, said: “We know our readers value the CSD as a trusted repository of structural data and some of our authors have demonstrated how this rich resource can accelerate scientific research.
“Our continued collaboration with the CCDC helps make this wealth of data more accessible to the community as well as helping us ensure the integrity of data published in our journals, and we are proud to be associated with such a significant milestone in structural chemistry.”
Now, what was that chemical again?