Poster #P41




Quantum-mechanical exploration of conformers and solvent effects in complex molecular systems

Leonardo Medrano Sandonas, Li Chen, Alessio Fallani, Mirela Puleva, Mathias Hilfiker, Dries Van Rompaey, Alexandre Tkatchenko, Gianaurelio Cuniberti



In pharmaceutical research and development, computational chemistry can play an integral role in moving candidate drugs into the clinic more quickly. In particular, quantum-mechanical (QM) methods have been used to describe covalent and non-covalent interatomic interactions and to estimate diverse physicochemical properties of molecular systems. However, the computational cost and the difficulty of running QM calculations at scale limit their widespread use in drug discovery pipelines. Accelerated QM methods (e.g., semi-empirical methods and machine learning (ML) models) have emerged in recent years as promising solutions, offering a balance between accuracy and computational efficiency. The resulting acceleration enables researchers to include QM-based knowledge in their workflows. In this context, we previously introduced the QM7-X dataset, a QM dataset of small organic molecules with up to seven heavy atoms, which has assisted the development of ML-based approaches for the fast and accurate estimation of diverse physicochemical properties as well as the targeted design of organic molecules [1-5]. However, QM7-X molecules are considerably smaller than those commonly encountered in modern medicinal chemistry, which limits the exploration of the chemical space corresponding to large drug-like molecular complexes. To relax these limitations, we have recently introduced the Aquamarine (AQM) dataset, which contains the structural and electronic information of 59,783 low- and high-energy conformers of 1,653 molecules with a total number of atoms ranging from 2 to 92 and up to 54 non-hydrogen atoms [6]. Two AQM subsets were generated, AQM-gas and AQM-sol, which contain the QM structural (DFTB3+MBD) and property (PBE0+MBD) data of molecules in the gas phase and in implicit water, respectively.
Additionally, we have explored the chemical space of non-covalent systems composed of 118 large molecular receptors and 22 small odorants for the detection of body odor volatilomes (BOVs). We expect these datasets to serve as challenging benchmarks for state-of-the-art ML methods for property modelling and de novo generation of large (solvated) molecular systems of pharmaceutical and biological relevance.


  1. J. Hoja, L. Medrano Sandonas et al., Sci. Data 8, 43 (2021).
  2. L. Medrano Sandonas et al., Chem. Sci. 14, 10702–10717 (2023).
  3. S. Góger, L. Medrano Sandonas, C. Müller, and A. Tkatchenko, Phys. Chem. Chem. Phys. 25, 22211–22222 (2023).
  4. M. Stöhr, L. Medrano Sandonas, and A. Tkatchenko, J. Phys. Chem. Lett. 11, 6835–6843 (2020).
  5. A. Fallani, L. Medrano Sandonas, and A. Tkatchenko, arXiv (2023), doi:10.48550/arXiv.2309.00506.
  6. L. Medrano Sandonas et al., ChemRxiv (2024), doi:10.26434/chemrxiv-2024-685qb.





 Leonardo Medrano Sandonas

  •   Institute for Materials Science and Max Bergmann Center of Biomaterials