Fragrance Space with Graph Generative Models and Odor Prediction
- We introduce a suite of generative modelling techniques to efficiently navigate and explore the complex landscapes of odor and the broader chemical space. Unlike traditional approaches, we not only generate molecules but also predict the odor likeliness and classify probable odor labels. We show that odor likeliness is a function of physicochemical features.

Cheminformatics
Cheminformatic pipeline and statistical analysis.
Models
Advanced open-source generative models.
Dataset
Trained on a curated dataset of 5000 molecules.
About
The whole process involves four key stages: molecule generation, stringent sanitization checks for molecular validity, fragrance likeliness screening and odor prediction of the generated molecules.
- The development of an integrated framework that incorporates molecule generation, stringent molecular validation, odor likeliness screening and odor prediction.
- The construction and interpretability analysis of the odor likeliness equation, along with comparison with other fragrance criteria.
- An extensive analysis of the embedding space of the generated molecules and of the odor labels predicted for them.
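The four stages described above can be sketched as a simple filter chain. This is only an illustrative skeleton: `run_pipeline`, its four callables, and the `threshold` value are hypothetical names and parameters, not the project's actual API.

```python
from typing import Callable, Dict, Iterable, List


def run_pipeline(
    generate: Callable[[int], Iterable[str]],
    is_valid: Callable[[str], bool],
    odor_likeliness: Callable[[str], float],
    predict_odors: Callable[[str], List[str]],
    n_samples: int,
    threshold: float = 0.5,  # assumed cut-off, not taken from the source
) -> Dict[str, List[str]]:
    """Chain the four stages; every callable is a hypothetical stand-in."""
    results: Dict[str, List[str]] = {}
    for smiles in generate(n_samples):            # stage 1: molecule generation
        if not is_valid(smiles):                  # stage 2: sanitization check
            continue
        if odor_likeliness(smiles) < threshold:   # stage 3: likeliness screening
            continue
        results[smiles] = predict_odors(smiles)   # stage 4: odor prediction
    return results
```

Passing the stages in as callables keeps the sketch independent of any particular generative model, validity checker, or odor predictor.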

Features
Leveraging generative models and graph neural networks to efficiently navigate the fragrance space.
Odor Likeliness
The odor-likeliness equation is derived from the physicochemical properties of the molecules in the curated dataset.
- Synthetic Minority Over-sampling Technique (SMOTE) to address class imbalance.
- SHAP (SHapley Additive exPlanations) to identify important features.
- VIF (Variance Inflation Factor) and correlation coefficients to remove highly correlated features.
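As a minimal sketch of the correlation-based filtering step, the snippet below greedily keeps a feature only if its absolute Pearson correlation with every feature already kept is below a threshold. The function names, the 0.9 threshold, and the toy feature names are assumptions for illustration. It covers pairwise correlation only; a VIF check would also need a regression of each feature on all the others.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def drop_correlated(
    features: Dict[str, List[float]], threshold: float = 0.9
) -> List[str]:
    """Return feature names to keep, scanning in insertion order."""
    kept: List[str] = []
    for name, column in features.items():
        # Keep the feature only if it is not highly correlated with a kept one.
        if all(abs(pearson(column, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy descriptors (made-up values): the second column tracks the first closely.
toy = {
    "mol_weight": [1.0, 2.0, 3.0, 4.0],
    "heavy_atoms": [2.0, 4.0, 6.0, 8.1],
    "logp": [1.0, -1.0, 1.0, -1.0],
}
kept_features = drop_correlated(toy)
```

Because the scan is greedy, the result depends on feature order: the first member of a correlated pair is the one retained.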

Benchmarks
The benchmark from the MOSES paper was used to evaluate the models. The odor-likeliness equation was also compared with other fragrance-likeliness criteria, such as the GDB-17 criteria, the Rule of Three, and the Fragrance-Like Property. The benchmark metrics are validity, uniqueness, novelty, diversity score, scaffold similarity, and similarity to a nearest neighbor.
- Validity (adherence to chemical rules).
- Uniqueness (absence of duplication among the generated SMILES).
- Novelty (proportion of generated molecules absent from the training set).
- Scaffold similarity (Scaff).
- Similarity to a nearest neighbor (SNN).
- Comparison of the odor-likeliness equation with other fragrance-likeliness criteria.
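The metrics listed above can be illustrated with a small, dependency-free sketch. It assumes the SMILES strings are already validated and canonicalized, and that fingerprints are given as sets of "on" bit indices; both are simplifying assumptions, and the function names are illustrative rather than the MOSES API.

```python
from typing import Sequence, Set


def uniqueness(valid_smiles: Sequence[str]) -> float:
    """Fraction of valid generated SMILES that are distinct."""
    if not valid_smiles:
        return 0.0
    return len(set(valid_smiles)) / len(valid_smiles)


def novelty(valid_smiles: Sequence[str], training_smiles: Sequence[str]) -> float:
    """Fraction of distinct generated SMILES absent from the training set."""
    unique = set(valid_smiles)
    if not unique:
        return 0.0
    return len(unique - set(training_smiles)) / len(unique)


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity between two fingerprints stored as bit-index sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def snn(generated_fps: Sequence[Set[int]], reference_fps: Sequence[Set[int]]) -> float:
    """Mean similarity of each generated molecule to its nearest reference."""
    nearest = [max(tanimoto(g, r) for r in reference_fps) for g in generated_fps]
    return sum(nearest) / len(nearest)


generated = ["CCO", "CCO", "CCN", "c1ccccc1"]
training = ["CCO"]
u = uniqueness(generated)          # 3 distinct out of 4
n = novelty(generated, training)   # 2 of the 3 distinct are new
s = snn([{1, 2}, {3}], [{1, 2}, {3, 4}])
```

In a real run the validity filter and fingerprints would come from a cheminformatics toolkit; plain string comparison is only meaningful once SMILES are canonical.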

Statistical Analysis
Several statistical tests were performed. They confirm that the generated molecules and the original molecules differ in molecular weight, LogP, fraction of sp2-hybridized atoms, and FCFP4 count.
- Kolmogorov-Smirnov (KS) statistic along with its p-value.
- Jensen-Shannon divergence.
- Violin plots.
- PCA of fingerprints (MACCS keys, Morgan fingerprints, atom-pair fingerprints).
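To make the first two tests concrete, here is a minimal pure-Python sketch of the two-sample KS statistic and the Jensen-Shannon divergence between two histograms. The function names and toy values are illustrative assumptions. The sketch omits the p-value; a statistics library such as SciPy (`scipy.stats.ks_2samp`) would normally supply it.

```python
import bisect
import math
from typing import List, Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest gap between the empirical CDFs of two samples."""
    sa, sb = sorted(a), sorted(b)
    d = 0.0
    for x in sa + sb:
        fa = bisect.bisect_right(sa, x) / len(sa)  # ECDF of a at x
        fb = bisect.bisect_right(sb, x) / len(sb)  # ECDF of b at x
        d = max(d, abs(fa - fb))
    return d


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence (log base 2) of two same-length histograms."""
    sp, sq = sum(p), sum(q)
    pn = [x / sp for x in p]
    qn = [x / sq for x in q]
    m = [(x + y) / 2 for x, y in zip(pn, qn)]

    def kl(u: List[float], v: List[float]) -> float:
        # Terms with u_i == 0 contribute nothing; v_i > 0 wherever u_i > 0.
        return sum(ui * math.log2(ui / vi) for ui, vi in zip(u, v) if ui > 0)

    return 0.5 * kl(pn, m) + 0.5 * kl(qn, m)


# Toy molecular-weight samples (made-up values) for two molecule sets.
ks = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])
js_disjoint = js_divergence([1, 0], [0, 1])
js_same = js_divergence([2, 3, 5], [2, 3, 5])
```

With base-2 logarithms the divergence lies between 0 (identical histograms) and 1 (no overlap), which makes it easy to compare across descriptors.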

Odor Prediction
The structure-odor relationship is complex. Odor prediction for the generated molecules was performed using a graph neural network.
- Analysis of odor category distribution.
- Highlights of the odor labels predicted for molecules from the different generative models.
- Discussion of the relationship between the benchmark metrics and the predicted odor labels.
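The odor category distribution can be tallied along the following lines. The sketch assumes the predictor returns a per-label probability for each molecule and that labels at or above a 0.5 threshold count as predicted. The output format, the threshold, and the label names are all assumptions for illustration.

```python
from collections import Counter
from typing import Dict, List


def labels_above_threshold(probs: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Labels whose predicted probability meets the (assumed) threshold."""
    return sorted(label for label, p in probs.items() if p >= threshold)


def label_distribution(
    predictions: List[Dict[str, float]], threshold: float = 0.5
) -> Counter:
    """Count how many molecules receive each odor label."""
    counts: Counter = Counter()
    for probs in predictions:
        counts.update(labels_above_threshold(probs, threshold))
    return counts


# Hypothetical predictions for two generated molecules.
preds = [
    {"fruity": 0.9, "woody": 0.2},
    {"fruity": 0.6, "floral": 0.7},
]
dist = label_distribution(preds)
```

Computing one such counter per generative model gives a direct way to compare which odor categories each model tends to produce.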

Contributors
CSIR-Central Scientific Instruments Organisation (CSIO)