Hello.
I have a question regarding data.
**I understood CID.csv to be the list of CIDs used for all mixtures.**
**However, the Snitz 2 Dataset and the Bushdid Dataset contain CIDs that are not in CID.csv.** While looking into this, we found errors in Mixtures 9, 10, and 13 of the Snitz 2 Dataset, and we confirmed that **81035281168** among the CIDs of Mixture 9 is actually **two CIDs, 8103 and 5281168**, fused together (see the photo file linked below).
So I would like to confirm that the dataset is correct.
_1. Are the issues described above errors in the Snitz 2 Dataset?_
_2. If so, can the CIDs in the Bushdid Dataset that are not in CID.csv also be considered errors? If there are errors, please republish the dataset (I checked on GitHub, and it is the same as the publicly available dataset)._
Additionally, the CIDs that are not in CID.csv do correspond to molecules that exist. We therefore request confirmation of the data.
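For reference, the check described above can be sketched roughly as follows. This is only a minimal illustration, not the organizers' parsing code: the function names `find_unknown_cids` and `split_fused_cid` are hypothetical, and the small `known` set stands in for the contents of CID.csv, assuming that 8103 and 5281168 are listed there.

```python
def find_unknown_cids(mixture_cids, known_cids):
    """Return the CIDs of a mixture that are absent from the reference list."""
    known = set(known_cids)
    return [cid for cid in mixture_cids if cid not in known]


def split_fused_cid(cid, known_cids):
    """List every way to cut the digits of `cid` into two known CIDs."""
    known = set(known_cids)
    digits = str(cid)
    splits = []
    for i in range(1, len(digits)):
        left, right = digits[:i], digits[i:]
        if right.startswith("0"):
            continue  # a CID has no leading zero
        if int(left) in known and int(right) in known:
            splits.append((int(left), int(right)))
    return splits


# Stand-in for CID.csv (assumption: both CIDs appear in the real file).
known = {8103, 5281168}
# The fused value reported for Mixture 9 of the Snitz 2 Dataset.
mixture_9 = [81035281168]

for cid in find_unknown_cids(mixture_9, known):
    print(cid, "->", split_fused_cid(cid, known))
# prints: 81035281168 -> [(8103, 5281168)]
```

An unknown CID for which no split is found would need a different explanation, for example a molecule that is simply missing from CID.csv.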
Thank you.
Snitz 2_Dataset_in_paper - https://drive.google.com/file/d/1PztxbkcALNazdsRb5aas_9ZdbqKSV5l6/view?usp=sharing
Snitz 2_Dataset_in_csv - https://drive.google.com/file/d/1KO2_lp1Tf8MeV_n-1G06gyUtalmPIX_U/view?usp=sharing
Created by MyungJin Lee jinny Thank You :)
I just downloaded the data again.
And are there any errors with the Bushdid Dataset?
The CIDs are 25137858, 19789253, 78605, 66328.
Hi, you are correct: there was a parsing error, and 81035281168 among the CIDs of Mixture 9 is in fact CIDs 8103 and 5281168.
The training set was updated yesterday to reflect this change.
Thanks
Pablo