Fragrance Space With Generative Models

Fragrance Space with Graph Generative Models and Odor Prediction

- We introduce a suite of generative modelling techniques for efficiently navigating the complex landscape of odor and the broader chemical space. Unlike traditional approaches, we not only generate molecules but also predict their odor likeliness and classify their probable odor labels. We show that odor likeliness is a function of physicochemical features.

Cheminformatics

Cheminformatics pipeline with statistical analysis.

Models

Advanced, open-source generative models.

Dataset

Trained on a curated dataset of 5000 molecules.

About

The whole process involves four key stages: molecule generation, stringent sanitization checks for molecular validity, fragrance likeliness screening and odor prediction of the generated molecules.

  • The development of an integrated framework that incorporates molecule generation, stringent molecular validation, odor likeliness screening and odor prediction.
  • The construction and interpretability analysis of the odor likeliness equation, along with comparison with other fragrance criteria.
  • An extensive analysis of the embedding space of the generated molecules and of the odor labels predicted for them.

Features

Leveraging generative models and graph neural networks to efficiently navigate the fragrance space.

Odor Likeliness

The odor-likeliness equation is derived from the physicochemical properties of the molecules in the curated dataset.

  • Synthetic Minority Over-sampling Technique (SMOTE) to address class imbalance.
  • SHAP (SHapley Additive exPlanations) to identify important features.
  • VIF (Variance Inflation Factor) and correlation coefficients to remove highly correlated features.
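
The correlation-based part of the feature filtering listed above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names (`pearson`, `drop_correlated`), the 0.9 threshold, and the toy descriptor values are all assumptions, and the VIF step is not covered here.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: treated as uncorrelated in this sketch
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(features, threshold=0.9):
    """Greedily keep features whose |r| with every kept feature is <= threshold.

    `features` maps a feature name to its column of values.
    The threshold of 0.9 is an illustrative assumption.
    """
    kept = []
    for name, column in features.items():
        if all(abs(pearson(column, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Toy, made-up descriptor columns (not taken from the curated dataset).
toy = {
    "mol_weight": [120.0, 150.0, 180.0, 210.0, 240.0],
    "heavy_atoms": [8.0, 10.0, 12.0, 14.0, 16.0],  # tracks mol_weight exactly
    "logp": [1.2, 3.1, 0.5, 2.8, 1.9],
}
selected = drop_correlated(toy, threshold=0.9)
print(selected)
```

The filter is order-dependent: when two features are highly correlated, the one listed first is kept. Ranking features by importance (for example by SHAP value) before filtering would be one way to make that choice deliberate.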

Benchmarks

The models were evaluated with the benchmark from the MOSES paper. The odor-likeliness equation was also compared with other fragrance-likeliness criteria, such as the GDB-17 criteria, the Rule of Three, and the Fragrance-Like Property. The reported benchmark metrics are validity, uniqueness, novelty, diversity score, scaffold similarity, and similarity to a nearest neighbor.

  • Validity (adherence to chemical rules).
  • Uniqueness (absence of duplication among the generated SMILES).
  • Novelty (proportion of generated molecules absent from the training set).
  • Scaffold similarity (Scaff).
  • Similarity to a nearest neighbor (SNN).
  • Comparison of the odor-likeliness equation with other fragrance-likeliness criteria.
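
As a rough illustration of the first three metrics, the sketch below computes validity, uniqueness, and novelty in the way they are commonly defined for generative-model benchmarks. It is not the MOSES implementation: `generation_metrics` and `balanced_parentheses` are hypothetical names, the validity check is a deliberately crude placeholder (a real pipeline would parse each SMILES with a cheminformatics toolkit), and the strings are assumed to be canonical SMILES so that set membership is meaningful.

```python
def generation_metrics(generated, training, is_valid):
    """Return validity, uniqueness and novelty as fractions in [0, 1].

    validity   = valid molecules / all generated molecules
    uniqueness = distinct valid molecules / valid molecules
    novelty    = distinct valid molecules not in the training set
                 / distinct valid molecules
    """
    valid = [s for s in generated if is_valid(s)]
    unique = set(valid)
    novel = unique - set(training)
    return {
        "validity": len(valid) / len(generated) if generated else 0.0,
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }


def balanced_parentheses(smiles):
    """Placeholder validity check (assumption): non-empty, balanced brackets."""
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0 and bool(smiles)


# Made-up example strings, for illustration only.
generated = ["CCO", "CCO", "CC(=O)OCC", "CC(C", "c1ccccc1O"]
training = ["CCO", "CCN"]
metrics = generation_metrics(generated, training, balanced_parentheses)
print(metrics)
```

Here four of the five strings pass the placeholder check, three of those four are distinct, and two of the three distinct strings are absent from the toy training set.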

Statistical Analysis

Several statistical tests were performed to compare the generated molecules with the original molecules. They confirm that the two sets differ in molecular weight, LogP, fraction of sp2-hybridized atoms, and FCFP4 count.

  • Kolmogorov-Smirnov (KS) statistic along with its p-value.
  • Jensen-Shannon divergence.
  • Violin plots.
  • PCA of fingerprints (MACCS keys, Morgan fingerprints, atom-pair fingerprints).
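
To make the first two comparisons concrete, here is a small standard-library sketch of the two-sample KS statistic and the Jensen-Shannon divergence between two property distributions. It is an illustration under stated assumptions rather than the analysis code used for the project: the function names, the bin settings, and the molecular-weight values are made up, and the p-value is not computed (SciPy's `scipy.stats.ks_2samp` returns both the statistic and a p-value).

```python
import math
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    # The empirical CDFs only change at data points, so checking those suffices.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def histogram(values, lo, hi, bins):
    """Normalised histogram over `bins` equal-width bins on [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = max(0, min(int((v - lo) / width), bins - 1))  # clamp to range
        counts[idx] += 1
    return [c / len(values) for c in counts]


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits, so the result lies in [0, 1]."""
    def kl(u, v):
        return sum(ui * math.log2(ui / vi) for ui, vi in zip(u, v) if ui > 0)

    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Made-up molecular weights, for illustration only.
original_mw = [120.0, 135.0, 150.0, 165.0, 180.0]
generated_mw = [150.0, 170.0, 190.0, 210.0, 230.0]

ks = ks_statistic(original_mw, generated_mw)
jsd = js_divergence(
    histogram(original_mw, 100.0, 250.0, bins=5),
    histogram(generated_mw, 100.0, 250.0, bins=5),
)
print(f"KS statistic: {ks:.3f}, JS divergence: {jsd:.3f} bits")
```

Both quantities are 0 for identical samples and grow as the distributions separate. The Jensen-Shannon value depends on the binning, so the same bin edges should be used for every pair of distributions being compared.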

Odor Prediction

The structure-odor relationship is complex. Odor prediction for the generated molecules was performed using a graph neural network.

  • Analysis of the odor category distribution.
  • Overview of the odor labels predicted for molecules from the different generative models.
  • Discussion of the relationship between the benchmark results and the predicted odor labels.

Contributors

CSIR-Central Scientific Instruments Organisation (CSIO)

Mrityunjay Sharma

CSIR-CSIO, Chandigarh, India

Sarabeshwar Balaji

Indian Institute of Science Education and Research Bhopal (IISERB), India

Pinaki Saha

University of Hertfordshire, UH Biocomputation Group, United Kingdom

Ritesh Kumar

CSIR-CSIO, Chandigarh, India