Hello,
The goal of the challenge is to predict a quantity describing the size of the population of different types of cells inside tumors.
I couldn't find a complete definition of the quantity in question. In particular, I am not sure about the following two points.
It seems that the predicted quantity should be a proportion (from the explanations given, we can freely choose a constant scaling of the proportions for each cell type).
Should the proportion associated with a cell type be the number of RNA transcripts coming from this cell type divided by the total number of RNA transcripts?
(This is the RNA transcript number versus cell count as ground truth question. I think that for _in silico_ testing, the proportion of RNA transcripts is a better choice.)
If yes, are the proportions normalized by the amount of mRNA coming from all cells in the admixture including the
cancer cells or only from the cells whose infiltration is investigated?
Even if the proportions are evaluated without prescribing a scale,
the answers to both of those questions do sometimes make a difference.
Kind regards,
Jean-Marie
Created by Jean-Marie Droz jdroz
Hi @brian.white ,
Thank you for the answers. It's good to know that you measure the proportions in this way. And indeed, I'd personally suggest for the *in vivo* challenge that you revise this and ask for the real cell proportions instead of mRNA proportions.
Best,
Julien
Hi @jracle and all,
I'm writing this a lot today ... I'm sorry for the very slow response.
Yes, _in the validation phase of the in vitro challenge_, we are measuring relative mRNA content, not relative cell proportions.
I'm qualifying my statement because, as I described above, the ground truth in some of the leaderboard datasets may be relative to cell proportions (e.g., if it was generated using CyTOF). Further, in the subsequent in vivo challenge (which we have not yet announced), we may revisit this decision and score relative to cells, not mRNA.
To be explicit, we will generate data for the validation phase in both of the ways you describe:
1. in vitro admixtures: _lyse_ cells from purified samples, extract their mRNA, mix their _mRNA_ (don't mix cells), and perform RNA-seq
2. in silico admixtures: perform RNA-seq on purified samples and create in silico admixtures as the weighted sum of (normalized) expression of those purified samples.
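
As an illustration of option 2, here is a minimal sketch of an in silico admixture built as a weighted sum of normalized purified profiles. The function name `in_silico_admixture`, the choice of normalizing each profile to sum to 1, and the toy numbers are assumptions made for the example, not the challenge's actual pipeline.

```python
import numpy as np

def in_silico_admixture(purified, weights):
    """Weighted sum of normalized purified expression profiles.

    purified: (n_cell_types, n_genes) array, one profile per purified sample.
    weights:  (n_cell_types,) non-negative mixing weights.
    """
    purified = np.asarray(purified, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Assumption: "normalized" means each profile is scaled to sum to 1.
    normalized = purified / purified.sum(axis=1, keepdims=True)
    # Rescale the weights to sum to 1 so they act as mRNA proportions.
    weights = weights / weights.sum()
    return weights @ normalized

# Toy data (made up): 2 purified cell types, 3 genes.
purified = np.array([[10.0, 0.0, 10.0],
                     [0.0, 30.0, 10.0]])
mix = in_silico_admixture(purified, [0.25, 0.75])
print(mix)  # [0.125  0.5625 0.3125]
```

Under this normalization the weights are directly the mRNA proportions of the admixture, which matches the ground truth described for the validation phase.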
Best,
Brian
Hi @brian.white ,
Thank you for the explanations about this question. However, I'm not sure that I fully understand the way the validation data was/is generated and this can be very important for the results.
1. From what you write, it seems that you know the proportion of mRNA coming from each cell type in a mixture sample.
* So does it mean that you measured separately the mRNA in isolated cell types and then created in silico mixture samples by simply summing together these various measurements at different proportions, possibly adding some noise in the mixes?
* Or did you kind of kill the cells to extract the mRNA from the various cell types, mix together these mRNA extracts at known proportions and then run some RNA-sequencing on these mixtures?
2. Because, before reading the answer above, I had always assumed that you would be mixing together known numbers of cells from the different cell types and then measuring the resulting mixture with RNA-seq. Is it still done like that in the validation phase?
Indeed, this can make a big difference, as mRNA content can differ from one cell type to another. Personally, I would say that knowing the cell proportions, not the mRNA proportions, is what makes more sense in an *in vivo* setting. It's true that if you are talking about the absolute number of cells, then it doesn't make a difference whether you output/measure the mRNA amounts or the cell amounts, as you would have the same proportionality constant per cell type in each sample. But we are talking about proportions of cells (summing to 100 % in each sample separately). Thus measuring the mRNA proportions or the cell proportions really makes a difference here, and this can be a very important factor for the deconvolution, even if you are just correlating the relative proportions per cell type instead of making a single big correlation of all cell types together.
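
To make the point concrete, here is a small sketch with made-up numbers. The per-cell mRNA contents in `mrna_per_cell` and the cell counts are hypothetical. It shows that the ratio between mRNA proportion and cell proportion for a given cell type changes from sample to sample, so the two are not related by a single per-cell-type constant.

```python
import numpy as np

def mrna_proportions(cell_counts, mrna_per_cell):
    """Convert per-sample cell counts into per-sample mRNA proportions."""
    mrna = np.asarray(cell_counts, dtype=float) * np.asarray(mrna_per_cell, dtype=float)
    return mrna / mrna.sum(axis=1, keepdims=True)

# Hypothetical data: rows are samples, columns are two cell types (A, B).
cell_counts = np.array([[50, 50],
                        [80, 20],
                        [20, 80]])
# Hypothetical assumption: a type-B cell carries 3x the mRNA of a type-A cell.
mrna_per_cell = np.array([1.0, 3.0])

cell_props = cell_counts / cell_counts.sum(axis=1, keepdims=True)
mrna_props = mrna_proportions(cell_counts, mrna_per_cell)

print(cell_props[0])  # [0.5 0.5]
print(mrna_props[0])  # [0.25 0.75]

# Ratio of mRNA proportion to cell proportion for type A, per sample.
ratio_a = mrna_props[:, 0] / cell_props[:, 0]
print(ratio_a)  # not constant across samples
```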
Can you thus confirm that you are really measuring the relative mRNA content in the validation phase, and not the relative cell proportions?
Thank you very much. Best wishes,
Julien
Hi @jdroz
In the validation round, the ground truth will be the proportion of a cell type's mRNA content relative to the total mRNA content in that sample -- i.e., across all cell types in that sample, including cancer cells, and not just those you are asked to score. However, your predictions will be scored independently for each cell type (and then combined into some aggregate cross-cell type score) using Pearson and Spearman correlation. Hence, your scores should be proportional to the mRNA proportions. _But_, the proportionality constant can differ across cell types. For example, this is how MCP-counter outputs scores: B cells may have a different proportionality constant than CD8 T cells. That's OK, since we are comparing B cells across patients (same proportionality constant) but not B cells against CD8 T cells within the same patient (different constants).
Unfortunately, things may be slightly different for the leaderboard data (which we are not generating). Here, the data may be normalized by total # of cells, total amount of mRNA, total # of _lymphocytes_, or total amount of mRNA from lymphocytes. For example, flow cytometry-based measures of ground truth are going to report numbers of cells, rather than mRNA content. These data are imperfect, but should give you a sense of how your method is performing -- particularly relative to other methods on the same data.
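
A minimal sketch of the per-cell-type scoring idea, assuming a samples-by-cell-types layout. The helper names, the toy values, and the simple tie-free ranking are assumptions for illustration, not the challenge's scoring code. Predictions that are proportional to the ground truth, with a different constant for each cell type, still correlate perfectly within each cell type.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])

def ranks(x):
    """Ranks of the values in x (simple version, assumes no ties)."""
    return np.argsort(np.argsort(x))

def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Made-up ground truth: rows are samples, columns are two cell types.
truth = np.array([[0.10, 0.30],
                  [0.20, 0.10],
                  [0.40, 0.25],
                  [0.15, 0.05]])
# Predictions proportional to the truth, with a different (hypothetical)
# proportionality constant for each cell type.
pred = truth * np.array([2.0, 50.0])

n_types = truth.shape[1]
per_type_pearson = [pearson(pred[:, j], truth[:, j]) for j in range(n_types)]
per_type_spearman = [spearman(pred[:, j], truth[:, j]) for j in range(n_types)]
print(per_type_pearson, per_type_spearman)  # all approximately 1.0
```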
Brian