The past few years have seen a rapid increase in the application of machine learning (ML) approaches in chemistry and materials science, in particular for predicting the physical and chemical properties of existing and novel compounds. Databases of experimental structures, especially crystalline structures, continue to grow at a steady pace and are complemented by ever larger databases of physical and chemical properties. We present here several examples of a multi-scale computational methodology for this problem, which combines the existing tools of theoretical chemistry (i.e., quantum chemical calculations and classical molecular simulations) with statistical learning approaches.[1]
We show how these tools have been integrated in our group and how they allow not only the prediction of properties, but also a deeper understanding of the structure/property relationships, which can provide chemical insight.[2] We also discuss the typical limitations of these approaches, in terms of the quality and size of datasets as well as the accuracy and reproducibility of the methods. We illustrate these points on three different types of properties of nanoporous materials:
 François-Xavier Coudert