Talk #D3.02

23.05.2024, 09:45 – 10:30





Auto-generated Materials Databases and Language Models

Jacqueline M. Cole



Data-driven materials discovery is coming of age, given the rise of 'big data' and machine-learning (ML) methods. However, the most sophisticated ML methods need large amounts of training data. Such data may be custom materials databases that comprise chemical names and their cognate properties for a given functional application, or a large corpus of text used to train a language model. This talk showcases our home-grown open-source software tools that have been developed to auto-generate custom materials databases for a given application. The presentation will also demonstrate how domain-specific language models can now be used as interactive engines for data-driven materials science. The talk concludes with a forecast of how this 'paradigm shift' away from the use of static databases will likely shape next-generation materials science.






Prof. Jacqueline M. Cole



  •   Cavendish Laboratory, Department of Physics, University of Cambridge, CB3 0HE, UK
  •   ISIS Neutron and Muon Source, STFC Rutherford Appleton Laboratory, OX14 0QX, UK