Talk #D2.09

22.05.2024, 16:30 – 17:00

Data is a Girl’s Best Friend: From High-Throughput Computations to Generative Deep Learning

Renana Gershoni-Poranne

Polycyclic aromatic systems (PASs) are among the most prevalent and impactful classes of compounds in the natural and man-made worlds. Although aromatic systems have fascinated chemists for almost two centuries, a general conceptual framework for understanding and predicting the structure-property relationships of polycyclic systems remains elusive. We address this gap using a combination of computational chemistry and data science tools. We established the COMPAS Project (COMputational database of Polycyclic Aromatic Systems), which already contains over 500k molecules in three datasets: cata-condensed polybenzenoid hydrocarbons (COMPAS-1),[1] cata-condensed hetero-PASs (COMPAS-2),[2] and peri-condensed polybenzenoid hydrocarbons (COMPAS-3).[3] With COMPAS in hand, we demonstrate the first cases of interpretable learning models in the chemical space of PASs. To this end, we developed two types of molecular representation, a) a text-based representation[4] and b) a graph-based representation,[5] which not only achieve higher predictive ability with less data but are also amenable to interpretation, thus allowing chemical insight to be extracted from the models. Using the COMPAS database and our dedicated representations, we implemented GaUDI,[6] the first guided diffusion-based model for the inverse design of PASs. Our model generates new PASs with defined target properties. In addition to its flexible target function and high validity scores, GaUDI can also design molecules with properties beyond the distribution of the training data.
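The abstract does not spell out how "guidance" steers a diffusion model toward a target property. As a rough illustration only, the sketch below shows generic gradient (classifier-style) guidance on a toy problem; it is not GaUDI's implementation. The 2-D Gaussian "molecules", the linear property predictor g(x) = x[0] + x[1], the squared-error target function, the noise schedule, and the function name `sample` are all hypothetical assumptions chosen so the example runs without a trained network.

```python
import numpy as np


def sample(n, target=None, scale=0.0, T=100, seed=0):
    """Toy DDPM ancestral sampling with optional gradient guidance.

    Hypothetical setup, for illustration only (not GaUDI's code):
    - "data" are 2-D vectors drawn from N(0, I), so the ideal noise
      predictor has a closed form: eps_hat = sqrt(1 - abar_t) * x_t;
    - the "property" is a made-up linear predictor g(x) = x[0] + x[1];
    - the target function is the squared error (g(x) - target) ** 2.
    """
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, T)        # assumed linear noise schedule
    alphas = 1.0 - betas
    abar = np.cumprod(alphas)
    x = rng.standard_normal((n, 2))           # start from pure noise, x_T ~ N(0, I)
    for t in range(T - 1, -1, -1):
        # Closed-form noise prediction, exact only for N(0, I) data.
        eps_hat = np.sqrt(1.0 - abar[t]) * x
        # Standard DDPM posterior mean computed from the noise prediction.
        mean = (x - betas[t] / np.sqrt(1.0 - abar[t]) * eps_hat) / np.sqrt(alphas[t])
        if target is not None:
            g = x.sum(axis=1, keepdims=True)                   # predicted property, shape (n, 1)
            grad = 2.0 * (g - target) * np.ones((1, 2))        # d/dx of (g - target) ** 2
            mean = mean - scale * betas[t] * grad              # shift mean toward lower loss
        noise = rng.standard_normal(x.shape) if t > 0 else 0.0  # no noise on the final step
        x = mean + np.sqrt(betas[t]) * noise
    return x


unguided = sample(2000).sum(axis=1)
guided = sample(2000, target=3.0, scale=2.0).sum(axis=1)
print(round(unguided.mean(), 2), round(guided.mean(), 2))
```

With these settings the unguided property values stay centred near 0 (the "training distribution"), while the guided ones are pulled close to, but slightly below, the target of 3, because the guidance term competes with the prior encoded in the denoiser. Swapping the squared-error loss for any other differentiable function of the predicted property is what a "flexible target function" would amount to in this toy picture.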


  1. Wahab, A.; Pfuderer, L.; Paenurk, E.; Gershoni-Poranne, R. The COMPAS Project: A Computational Database of Polycyclic Aromatic Systems. Phase 1: Cata-Condensed Polybenzenoid Hydrocarbons, J. Chem. Inf. Model. 2022, 62 (16), 3704.
  2. Mayo Yanes, E.; Chakraborty, S.; Gershoni-Poranne, R. COMPAS-2: A Dataset of Cata-Condensed Hetero-Polycyclic Aromatic Systems, Sci. Data 2024, 11 (1), 97.
  3. Wahab, A.; Gershoni-Poranne, R. COMPAS-3: A Data Set of Peri-Condensed Polybenzenoid Hydrocarbons, ChemRxiv, February 26, 2024.
  4. Fite, S.; Wahab, A.; Paenurk, E.; Gross, Z.; Gershoni-Poranne, R. Text-Based Representations with Interpretable Machine Learning Reveal Structure-Property Relationships of Polybenzenoid Hydrocarbons, J. Phys. Org. Chem. 2022, e4458.
  5. Weiss, T.; Wahab, A.; Bronstein, A. M.; Gershoni-Poranne, R. Interpretable Deep-Learning Unveils Structure–Property Relationships in Polybenzenoid Hydrocarbons, J. Org. Chem. 2023, 88 (14), 9645–9656.
  6. Weiss, T.; Mayo Yanes, E.; Chakraborty, S.; Cosmo, L.; Bronstein, A. M.; Gershoni-Poranne, R. Guided Diffusion for Inverse Molecular Design, Nat. Comput. Sci. 2023, 3 (10), 873–882.

Renana Gershoni-Poranne


  •   Israel Institute of Technology