I don't see a column-wise description for the data in DTC_data.csv. Is this posted somewhere? It would also be nice to have a description of the terms used in the data. I'm familiar with IC50 and KD, but what are ABSAC1, KI, PKB, TDI, etc.? Is part of the challenge to use these other metrics to infer KD, or are these synonymous with KD?
Created by Jeremy Jacobsen jjacob_cub

Hi Xiaokang,
No, I haven't used bioactivities as features.
Hi David,
Potency refers to the amount of drug needed to produce a certain effect. Unlike Kd, IC50 is a measure of potency.
However, DTC also contains drugs other than kinase inhibitors and protein targets other than kinases.
Perhaps many bioactivity types in DTC annotated as "potency" correspond to those other protein targets, and the exact definition of the "effect produced" might differ. But I think that, in general, the lower the potency value or IC50, the more active a drug is.
I hope this is at least a bit helpful.

Hi Anna,
Other bioactivities, such as POTENCY, IC50 and KI, can be used as features when predicting Kd. However, in the DTC dataset, most compound-protein combinations with a Kd value (more than 86%) don't have any other bioactivities measured. So I was wondering whether you used these bioactivities as predictive features.
For the "POTENCY" type, it seems that usually no PubMed ID is listed in DTC. A lot of this data appears to come from PubChem and from qHTS (quantitative high-throughput screening), e.g. "PubChem BioAssay. qHTS Assay for Inhibitors of MBNL1-poly(CUG) RNA binding. (Class of assay: confirmatory)". I think this is the paper describing qHTS:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1518803/
Do you know how these values relate to Kd or IC50?

Hi David,
Yes indeed, these activity measures are inherent standards when performing biochemical assays. Hence their definition, once established from one paper, would apply to all data points where that activity type was reported.
Thanks Guru.
So for "POTENCY"/"INHIBITION"/"ACTIVITY" you would suggest looking at the associated papers to see what these mean?

Hi David,
Thanks a lot for your suggestion. Currently DTC doesn't provide a glossary describing the various bioactivity types present in the database; we will surely include one in a future version. DTC serves as a comprehensive community-driven resource for collecting and curating published bioactivity information. There are ~1500 unique bioactivity types, and the exact description and use of each can be found by reviewing the research article where the activity was reported (PubMed ID). To address your specific question about Kd vs Kdapp, see Pubmed_id: 29191878, though in most research articles the two terms are used interchangeably.
Consider the following reaction kinetics equations, with E as the enzyme and I as the inhibitor:
[EI] -> [E] + [I]
Kd = [E][I] /[EI]
Kdapp = Kd/[I]2
In other words,
Kd is the dissociation constant of the enzyme-inhibitor complex, describing its dissociation into E and I.
The apparent Kd, in contrast, is the value associated with each enzyme-inhibitor complex with respect to the specific inhibitor:
Kdapp = Kd/p
where p is the number of inhibitor molecules per [EI] complex.
Hope this was helpful.
Thanks Anna, this is really helpful.

This is a count table of the most frequent standard types in the DTC data:
POTENCY 2956515
IC50 1001588
KI 620898
INHIBITION 442241
ACTIVITY 246113
EC50 141463
AC50 107442
KD 75594
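A count table like the one above could be rebuilt roughly as follows. This is only a minimal sketch: it assumes the file has a "standard_type" column (as mentioned elsewhere in this thread), it upper-cases the labels so that "Kd" and "KD" are counted together (my assumption, not something DTC prescribes), and the sample rows are made up for illustration.

```python
import csv
from collections import Counter
from io import StringIO


def count_standard_types(csv_file, column="standard_type"):
    """Count how often each bioactivity type occurs in an open CSV file."""
    reader = csv.DictReader(csv_file)
    counts = Counter()
    for row in reader:
        value = (row.get(column) or "").strip()
        if value:
            # Assumption: treat labels case-insensitively ("Kd" == "KD").
            counts[value.upper()] += 1
    return counts


# Tiny made-up sample standing in for DTC_data.csv (illustrative values only).
sample = StringIO(
    "standard_type,standard_value\n"
    "KD,12.0\n"
    "Kd,3.5\n"
    "IC50,40.0\n"
    "POTENCY,100.0\n"
    "IC50,7.0\n"
)
counts = count_standard_types(sample)
for name, n in counts.most_common():
    print(name, n)
```

With the real data, the same function can be called on a file opened with open("DTC_data.csv", newline="").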
Do you know what any of the other common quantifications are? KDAPP is also quite frequent (6450 entries), how is that different from KD?

Kd (KD) is a dissociation constant which directly measures the strength of physical interaction between two molecules; here, the binding affinity between compound and kinase.
In particular, Kd indicates the concentration of a compound at which 50% of a target kinase exists in a compound-kinase complex.
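To make the 50% statement concrete, here is a small sketch assuming simple one-to-one equilibrium binding with the compound in excess over the kinase, where the bound fraction is [I] / ([I] + Kd). The function name and the numbers are hypothetical and chosen only for illustration.

```python
def fraction_bound(compound_conc, kd):
    """Fraction of target in complex, assuming simple 1:1 equilibrium binding.

    Both arguments must use the same concentration unit (e.g. nM).
    """
    if compound_conc < 0 or kd <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return compound_conc / (compound_conc + kd)


# When the compound concentration equals Kd, half of the target is bound.
at_kd = fraction_bound(10.0, 10.0)
# At nine times Kd, about 90% of the target is bound.
at_9x_kd = fraction_bound(90.0, 10.0)
print(at_kd, at_9x_kd)
```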
An inhibition constant Ki and a half maximal inhibitory concentration IC50, on the other hand, correspond to the concentration of a compound needed to inhibit the enzymatic activity of a kinase by 50%.
The relationship between Ki and IC50 depends on the type of inhibition and the mechanism of the reaction. Ki equals IC50 assuming either a non-competitive or an uncompetitive model of inhibition.
However, most kinase inhibitors are competitive with the ATP substrate. If the inhibitor is a competitive inhibitor, the relationship between Ki and IC50 is characterised by the Cheng-Prusoff equation.
Kd, Ki and IC50 are all considered as indicators of compound activity. However, unlike Kd, Ki and IC50 measure the potency of a compound (i.e., the amount of a compound needed to produce a certain effect). Therefore, a compound that has a low Ki or IC50 might not have a low Kd.
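For the competitive case mentioned above, the Cheng-Prusoff equation is usually written as Ki = IC50 / (1 + [S]/Km), where [S] is the substrate (here ATP) concentration used in the assay and Km is the Michaelis constant for that substrate. A minimal sketch follows; the function name and example numbers are hypothetical, and the real [S] and Km have to come from the assay description.

```python
def ki_from_ic50_competitive(ic50, substrate_conc, km):
    """Cheng-Prusoff conversion for a competitive inhibitor.

    ic50 and the returned Ki share one unit; substrate_conc and km share one unit.
    """
    if km <= 0 or substrate_conc < 0:
        raise ValueError("Km must be > 0 and substrate concentration >= 0")
    return ic50 / (1.0 + substrate_conc / km)


# Illustrative numbers only: IC50 = 100 nM, measured with [ATP] equal to Km.
ki = ki_from_ic50_competitive(ic50=100.0, substrate_conc=10.0, km=10.0)
print(ki)
```

Note that this converts IC50 to Ki, not to Kd, so it does not by itself solve the problem of mixing bioactivity types.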
I hope this helps!

Hi again,
Thanks for your reply. The reason I asked you about this is that there are only 77566 examples with "kd" or "KD" as the bioactivity type in the DTC, so my training set size shrinks drastically. Also, can you suggest some resource to understand what each label signifies? As in, what is the difference between KI, KD, KD' etc.? I am a non-biologist and would really love to know about the dataset in general: what does each column, and the values in it, mean?

Hi,
Using Kd values only is definitely one way to approach the problem.
However, we strongly encourage the participants to explore if and how other bioactivity types could be used for model training. Unfortunately, a gold standard solution to this problem does not exist.
I don't think there is an equation that allows converting, e.g., Ki to Kd. But you could try to take advantage of other bioactivity types, such as Ki and IC50 by using, for instance, a transfer learning framework.
Best,
Anna

Hi Anna,
Since the metric used to measure bioactivity for this competition is "kd values", should we only consider examples which have their "standard_type" as "KD" or "Kd"? Is there any way to convert the other metrics (like Ki, IC50) to Kd?

Hi Xiaokang,
A value in the column "standard_value" in DTC corresponds to Kd if the value in the column "standard_type" equals "KD" or "Kd".
DTC also contains other bioactivity types, such as Ki, IC50 and %Inhibition.
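The rule above can be applied with a short filter like this. It is only a sketch: the column names "standard_type" and "standard_value" are the ones named above, while the sample rows are made up and stand in for the real DTC file.

```python
import csv
from io import StringIO

# Labels that mark a Kd measurement, as described above.
KD_LABELS = {"KD", "Kd"}


def select_kd_rows(csv_file):
    """Return the rows of an open CSV file whose standard_type marks a Kd value."""
    reader = csv.DictReader(csv_file)
    return [
        row for row in reader
        if (row.get("standard_type") or "").strip() in KD_LABELS
    ]


# Made-up sample rows standing in for DTC_data.csv (illustrative only).
sample = StringIO(
    "standard_type,standard_value\n"
    "KD,12.0\n"
    "IC50,40.0\n"
    "Kd,3.5\n"
    "KI,8.0\n"
)
kd_rows = select_kd_rows(sample)
kd_values = [float(row["standard_value"]) for row in kd_rows]
print(kd_values)
```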
Best,
Anna

Hi Robert,
Thanks for your reply. A follow-up question about the columns in DTC: is the column named "standard_value" the Kd we are going to predict given a compound and a target?
Thanks,
Xiaokang

Hi @jjacob_cub, I reached out to our collaborators who manage DTC for a more thorough answer:
>Dear Jeremy,
>Thanks for participating in the Dream Challenge, and for using the DTC data resource. To address your questions.
>1) The description of the various columns in the DTC dataset is provided in the DTC Glossary (accessed through the Help drop-down menu at https://drugtargetcommons.fimm.fi/).
>2) There are approximately 1500 standard measurement types compiled in DTC. This information is obtained by collectively augmenting different bioactivity resources from different databases.
>3) The precise description of a measurement can be identified by referring to the actual publication from which the activity information was obtained (pubmed_id column in the DTC data).
>4) I have reviewed some of the publications to answer your question of what are (ABSAC1, KI, PKB, TDI, etc)?
>ABSAC1: Absolute active concentration (ABSAC) at 1 uM, the concentration at which the curve crosses threshold of 1.0 uM.
>KI: The Ki is the concentration of competing ligand that would bind half of the receptors in the absence of any other competition.
>PKB: Describes the degree of ionization. Kb is the base dissociation constant, and the -log of Kb gives pKb. A large Kb value indicates a high level of dissociation, i.e. a strong base. A lower pKb value indicates a stronger base.
>TDI: Time dependent inhibition of the compound towards a target.
>(Such description of the activity metrics can be obtained by referring to the actual publication-PUBMED info provided through DTC)
>5) The standard bioactivity metrics may be correlated, some more than others, and any of these can be used to predict Kd values of the compound-target pairs. The teams can naturally also use Kd values of other pairs in the prediction task, as well as drug and target resources other than DTC.
>Hope I have answered your question. Please let us know if you have any difficulties.
>Regards,
>Guru