It seems that some of the original stereochemistry was lost when the test data were converted/processed. For example, compound **H89** (_O=S(C2=CC=CC1=CN=CC=C12)(NCCNC/C=C/C3=CC=C(Br)C=C3)=O_) is listed with InChIKey **ZKZXNDJNWUTGDK-NSCUHMNNSA-N**. However, this is wrong, as this SMILES string corresponds to **ZKZXNDJNWUTGDK-UHFFFAOYSA-N**. The same goes for compound **GW445017X** (_CNC(=O)c1cncc(\C=C\c2ccccc2Cl)c1_), etc. I wonder if this stereochemistry could be restored, rather than googling for the correct SMILES by hand :)

Created by Olexandr Isayev olexandr
Hi olexandr, it would be great if you could mention in the thread how you resolved the question and whether the answer provided above is right, as other participants might face a similar question. Guru
Please never mind, we figured it out. Thanks for answering. This topic can be closed/deleted.
Hi Olexandr,

To address this issue, it would be great if you could let me know how the InChIKey "**ZKZXNDJNWUTGDK-UHFFFAOYSA-N**" and the original stereochemistry of the compound were obtained and elucidated.

Firstly, all the compound SMILES provided in the round 1 dataset are canonical SMILES that embed the original stereochemistry of the compound. This was done so that participants can use "One Standard" SMILES notation when moving across different datasets (DTC, ChEMBL, IUPHAR, ChemSpider etc.).

Secondly, the compound InChIKeys were generated from the SMILES strings (provided in the round 1 data) using the standard Rational Discovery cheminformatics Kit (RDKit). Since only canonical SMILES were used for the process, all cheminformatics packages (RDKit, CDK, Open Babel etc.) should produce the same InChIKeys as those provided in the round 1 dataset.

Answering the specific examples that you have highlighted:

**Compound H89:**

| Version | SMILES | InChIKey |
|---|---|---|
| With original stereochemistry | `O=S(C2=CC=CC1=CN=CC=C12)(NCCNC/C=C/C3=CC=C(Br)C=C3)=O` | ZKZXNDJNWUTGDK-NSCUHMNNSA-N |
| Without original stereochemistry | `O=S(C2=CC=CC1=CN=CC=C12)(NCCNCC=CC3=CC=C(Br)C=C3)=O` | ZKZXNDJNWUTGDK-UHFFFAOYSA-N |

The InChIKey provided in the round 1 dataset for H89 also corresponds to the information in various databases, i.e. ChEMBL DB: [CHEMBL104264](https://www.ebi.ac.uk/chembl/compound/inspect/CHEMBL104264) & PubChem DB: [449241](https://pubchem.ncbi.nlm.nih.gov/compound/449241#section=Top)

**Compound GW445017X:**

| Version | SMILES | InChIKey |
|---|---|---|
| With original stereochemistry | `CNC(=O)c1cncc(\C=C\c2ccccc2Cl)c1` | HVTWLTGGYKBXHX-VOTSOKGWSA-N |
| Without original stereochemistry | `CNC(=O)c1cncc(C=Cc2ccccc2Cl)c1` | HVTWLTGGYKBXHX-UHFFFAOYSA-N |

The InChIKey provided in the round 1 dataset for GW445017X also corresponds to the information in various databases, i.e. ChEMBL DB: [CHEMBL218970](https://www.ebi.ac.uk/chembl/compound/inspect/CHEMBL218970) & PubChem DB: [44418556](https://pubchem.ncbi.nlm.nih.gov/compound/44418556#section=Top)

The **`/`** and **`\`** characters in the SMILES strings (for all compounds in the round 1 dataset) denote the inherent/original double-bond stereochemistry of the compounds. It seems that the original stereochemistry of compound **H89** was ignored when you performed the analysis, which is why you obtained "**ZKZXNDJNWUTGDK-UHFFFAOYSA-N**" instead of "**ZKZXNDJNWUTGDK-NSCUHMNNSA-N**"; the same applies to compound **GW445017X**. Therefore, the SMILES provided in the round 1 dataset are the correct SMILES, with the stereochemistry embedded.

Hope this helps. Please let me know if you have difficulties.
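For anyone who runs into the same mismatch, the comparison above can be roughly sketched in plain Python, without a cheminformatics toolkit. This is only an illustration, not the pipeline used to build the dataset (that used RDKit): the function names are hypothetical, and the string-based stripping assumes the only stereo markers present are the `/` and `\` double-bond symbols (it does not handle `@` tetrahedral centres). It relies on the fact that a standard InChIKey whose second block is `UHFFFAOYSA` carries no stereochemical or isotopic information.

```python
import re

# Second block of a standard InChIKey when the underlying InChI has
# no stereochemistry or isotope layers.
NO_STEREO_BLOCK = "UHFFFAOYSA"

# 14-letter connectivity block, 10-letter second block, 1-letter protonation flag.
INCHIKEY_RE = re.compile(r"^([A-Z]{14})-([A-Z]{10})-([A-Z])$")


def split_inchikey(key):
    """Split an InChIKey into (connectivity, second block, protonation flag)."""
    match = INCHIKEY_RE.match(key.strip())
    if match is None:
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    return match.groups()


def compare_inchikeys(key_a, key_b):
    """Describe how two InChIKeys differ (hypothetical helper for illustration)."""
    skeleton_a, layers_a, _ = split_inchikey(key_a)
    skeleton_b, layers_b, _ = split_inchikey(key_b)
    if skeleton_a != skeleton_b:
        return "different connectivity"
    if layers_a == layers_b:
        return "identical"
    if NO_STEREO_BLOCK in (layers_a, layers_b):
        return "same connectivity, one key has no stereo layer"
    return "same connectivity, different stereo layers"


def strip_double_bond_stereo(smiles):
    """Naively drop '/' and '\\' bond markers (assumption: no '@' centres matter)."""
    return smiles.replace("/", "").replace("\\", "")


# Example with the two compounds discussed in this thread.
h89 = "O=S(C2=CC=CC1=CN=CC=C12)(NCCNC/C=C/C3=CC=C(Br)C=C3)=O"
gw445017x = r"CNC(=O)c1cncc(\C=C\c2ccccc2Cl)c1"

print(strip_double_bond_stereo(h89))
print(strip_double_bond_stereo(gw445017x))
print(compare_inchikeys("ZKZXNDJNWUTGDK-NSCUHMNNSA-N",
                        "ZKZXNDJNWUTGDK-UHFFFAOYSA-N"))
```

If two keys share the first block and one of them ends in `UHFFFAOYSA`, the likely cause is that stereo markers were dropped somewhere between the SMILES and the key, which is the situation described above.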

Missing stereochemistry in Round1 data