CCSC 2024

Poster #P56

Disparity of verbal labels for molecular olfactory properties

Thomas Gorges, Teresa Scholz, Vincent Christlein

Despite the importance of the sense of smell, the structure-odor relationship is not well understood [1,2]. Machine learning methods offer an opportunity to better understand how the structure of a molecule influences its odor. However, the relevant datasets are typically too small to train such models on their own, so it is desirable to combine them [3]. This is challenging because odor perception is subjective, which leads to varying verbal descriptions of the olfactory properties of one and the same substance [4].

In this work, we investigate the disparity of verbal olfactory descriptions across different data sources. We combine two odor datasets and analyze the annotations of the molecules that occur in both. Using a pretrained natural language processing (NLP) model, we transform the annotations into an embedding space. We then examine the similarity of these embeddings across both datasets and correlate them with the corresponding molecular descriptors.
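The comparison described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the molecule identifiers, labels, and descriptor values are invented placeholders, the random `word_vectors` merely stand in for a pretrained NLP model, and cosine similarity and Pearson correlation are assumed choices that the abstract does not specify.

```python
import numpy as np

# Invented annotations from two hypothetical sources, keyed by molecule identifier.
dataset_a = {
    "mol_1": ["sweet", "vanilla"],
    "mol_2": ["citrus", "fresh"],
    "mol_3": ["green", "grassy"],
    "mol_4": ["spicy", "woody"],
    "mol_5": ["fruity"],
}
dataset_b = {
    "mol_1": ["vanilla", "sweet"],  # same labels as in dataset_a, different order
    "mol_2": ["lemon", "citrus", "sweet"],
    "mol_3": ["fatty", "green"],
    "mol_4": ["floral"],
    "mol_6": ["musty"],
}

# Assumption: fixed random vectors stand in for embeddings from a pretrained model.
rng = np.random.default_rng(0)
vocab = sorted({w for d in (dataset_a, dataset_b) for labels in d.values() for w in labels})
word_vectors = {w: rng.normal(size=16) for w in vocab}


def embed(labels):
    """Embed one annotation as the mean of its label vectors."""
    return np.mean([word_vectors[w] for w in labels], axis=0)


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


# Only molecules annotated in both sources can be compared.
overlap = sorted(dataset_a.keys() & dataset_b.keys())
similarity = {m: cosine(embed(dataset_a[m]), embed(dataset_b[m])) for m in overlap}

# Hypothetical scalar molecular descriptor per molecule (invented values).
descriptor = {"mol_1": 1.2, "mol_2": 3.4, "mol_3": 2.1, "mol_4": 0.7}
x = np.array([descriptor[m] for m in overlap])
y = np.array([similarity[m] for m in overlap])
pearson_r = float(np.corrcoef(x, y)[0, 1])

for m in overlap:
    print(f"{m}: cosine similarity = {similarity[m]:.3f}")
print(f"Pearson r (descriptor vs. similarity) = {pearson_r:.3f}")
```

In this sketch, a similarity close to 1 means both sources describe a molecule with nearly the same wording, while lower values indicate disparity between the annotations. Because the vectors are random, the printed numbers carry no chemical meaning.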

[1] K. J. Rossiter, Chemical Reviews 1996, 96(8), 3201–3240.
[2] A. Keller, R. C. Gerkin, Y. Guan, A. Dhurandhar, G. Turu, B. Szalai, J. D. Mainland, Y. Ihara, C. W. Yu, R. Wolfinger, et al., Science 2017, 355(6327), 820–826.
[3] A. Sharma, R. Kumar, S. Ranjta, P. K. Varadwaj, Journal of Chemical Information and Modeling 2021, 61(2), 676–688.
[4] R. S. Herz, J. von Clef, Perception 2001, 30(3), 381–391.

Thomas Gorges

Pattern Recognition Lab, Department of Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg, Martensstr. 3, 91058 Erlangen, Germany