Poster #P47

Spying On Molecules: Constructing Novel Compound Spaces via Molecular Triangulation

Stiv Llenga, Ganna Gryn’ova

Since the introduction of machine learning to chemistry, the notion of chemical space has evolved from encompassing almost everything that chemistry touches into a relatively well-defined concept: an infinite-dimensional space populated by an infinite number of compounds. The molecular representation defines not only the dimensionality and shape of this compound space but also the relationships between the compounds within it. Two main philosophies exist for exploring chemical space: interpolating from specific datasets within confined regions, or extrapolating across the entire space. The latter approach, by definition, remains elusive.

Recently, our group developed a novel technique for constructing compound spaces using triangulation and trilateration, the principles at the heart of technologies such as GPS and location tracking. In this presentation, we demonstrate the superiority of our method, the matrix of reference similarity (MRS), for machine learning on large datasets, both newly developed and existing, achieving high accuracy in predicting several common electronic properties of chemical compounds. We exemplify the use of MRS both as a dimensionality reduction technique and as an input for supervised and unsupervised machine learning models, not only in chemistry but also for other array-like inputs. We also discuss the computational benefits of this technique in terms of time, memory, and power consumption.
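The GPS-style principle mentioned above can be illustrated in two dimensions: a point whose distances to three non-collinear reference points are known can be located by solving a small linear system. The sketch below is a generic illustration of trilateration only, not the MRS algorithm itself; the function name `trilaterate_2d` and the example coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def trilaterate_2d(anchors: Sequence[Point], distances: Sequence[float]) -> Point:
    """Locate a point in 2D from its distances to three known reference points.

    Hypothetical helper for illustration. Subtracting the first circle equation
    (x - x1)^2 + (y - y1)^2 = d1^2 from the other two cancels the quadratic
    terms and leaves a 2x2 linear system in x and y.
    """
    if len(anchors) != 3 or len(distances) != 3:
        raise ValueError("exactly three anchors and three distances are required")
    (x1, y1), (x2, y2), (x3, y3) = anchors
    d1, d2, d3 = distances
    # Coefficients of the linearised system A @ [x, y] = b
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = d1**2 - d2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = d1**2 - d3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21
    if math.isclose(det, 0.0, abs_tol=1e-12):
        # Collinear anchors cannot distinguish a point from its mirror image.
        raise ValueError("anchors are collinear; position is not uniquely determined")
    # Solve the 2x2 system with Cramer's rule
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - b1 * a21) / det
    return x, y


if __name__ == "__main__":
    anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    target = (3.0, 4.0)
    dists = [math.dist(a, target) for a in anchors]
    print(trilaterate_2d(anchors, dists))
```

With the anchors (0, 0), (10, 0) and (0, 10) and distances measured from (3, 4), the function recovers (3, 4) up to floating-point rounding. The analogy to a compound space is that a fixed set of reference points, together with a measure of how far each object lies from them, is enough to pin down where that object sits.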


Figure 1. The chemical space of the compounds in the N-HPC-1X dataset is partitioned into a fragment constellation and a chemical universe. Straightforward and elegant mathematical operations on the similarities between compounds of the chemical universe and those of the fragment constellation then generate relative compound spaces of lower dimensionality than the original space.
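The general idea of describing each compound by its similarity to a fixed set of reference compounds can be sketched as follows, assuming binary fingerprints stored as sets of on-bit indices and Tanimoto similarity. Neither the representation nor the similarity measure is taken from the MRS work; `tanimoto`, `reference_similarity_matrix`, and the toy data are hypothetical.

```python
from typing import FrozenSet, List, Sequence

# Assumption: a binary fingerprint is represented by the set of its "on" bit indices.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity of two binary fingerprints given as on-bit sets."""
    union = len(a | b)
    # Two empty fingerprints are treated as identical.
    return len(a & b) / union if union else 1.0


def reference_similarity_matrix(
    compounds: Sequence[Fingerprint], references: Sequence[Fingerprint]
) -> List[List[float]]:
    """Hypothetical reference-similarity featurisation.

    Row i holds the similarities of compound i to each reference, so every
    compound is described by len(references) numbers, regardless of the
    length of the original fingerprint.
    """
    return [[tanimoto(c, r) for r in references] for c in compounds]


if __name__ == "__main__":
    # Toy stand-ins for a "fragment constellation" (references) and a
    # "chemical universe" (compounds); bit indices are hypothetical
    # substructure keys of a 1024-bit fingerprint.
    references = [frozenset({1, 5, 9}), frozenset({2, 5, 700}), frozenset({300, 301})]
    compounds = [frozenset({1, 5, 9, 12}), frozenset({2, 300, 301}), frozenset({5, 700, 1000})]
    for row in reference_similarity_matrix(compounds, references):
        print([round(s, 2) for s in row])
```

With K references, each compound becomes a K-dimensional vector, so choosing K much smaller than the fingerprint length yields a lower-dimensional space. The resulting matrix can then serve as input to a supervised or unsupervised model.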

 Stiv Llenga

  •   Heidelberg Institute for Theoretical Studies (HITS gGmbH), 69118 Heidelberg, Germany
  •   Interdisciplinary Center for Scientific Computing, Heidelberg University, 69120 Heidelberg, Germany