Hello, I am confused about how submissions are evaluated and what the ground truth is.

(1) As I understand it, the ground truth is what you call the 'actual' position: the result of computationally mapping the 1297 query cells to the 3039 target positions (bins). The mapping uses binarized digital expression (Drop-seq) for the query and binarized in situ expression (from BDTNP) for the target. For a given query cell, the 'actual' position is the one maximizing the Matthews correlation coefficient (MCC) between the two binarized expression vectors over all 84 driver genes. If that is correct, the following R code should calculate the 'actual' positions:

```r
library('mccr')

# genes x cells (84 x 1297): binarized Drop-seq expression of the query cells
dge.driver.bin <- as.matrix(read.csv(file='ALL_files/dge_binarized_distMap.csv'))
# genes x bins (84 x 3039): binarized BDTNP in situ expression, transposed
insitu.bin <- t(as.matrix(read.csv(file='ALL_files/binarized_bdtnp.csv', check.names = FALSE)))

# for each query cell, find the target bin with the maximal MCC
actual.pos <- rep(NA, ncol(dge.driver.bin))
for (query.cell in 1:ncol(dge.driver.bin)) {
  mcc <- sapply(1:ncol(insitu.bin),
                function(i) mccr(dge.driver.bin[, query.cell], insitu.bin[, i]))
  actual.pos[query.cell] <- which.max(mcc)
}
```

(2) Do I understand correctly that the goal is now to predict the target positions using fewer driver genes from the in situ data? To minimize uncertainty regarding the evaluation, could you provide (pseudo) code of the procedure, i.e. a function that takes the three submissions plus the challenge data and outputs the score?

Thanks, Christoph
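P.S. One caveat about the code above: which.max() returns only the first bin with maximal MCC, so cells that map equally well to several bins lose that information. A variant that keeps all tied bins per query cell (just a sketch building on the objects defined above):

```r
# Variant of the loop above: keep every bin that attains the maximal MCC,
# since which.max() silently returns only the first one.
actual.pos.all <- lapply(1:ncol(dge.driver.bin), function(query.cell) {
  mcc <- sapply(1:ncol(insitu.bin),
                function(i) mccr(dge.driver.bin[, query.cell], insitu.bin[, i]))
  which(mcc == max(mcc))  # possibly more than one bin
})
```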

Created by Christoph Hafemeister
Hi, it is difficult to say, because the genes you choose from the in situs will also be taken into account. Also remember that gene expression patterns will be used for scoring as well. However, if we don't take the genes into account, it seems to me that option 2 will have a higher score. Thanks
So if a cell has 4 possible locations (i.e. 4 equal max MCCs in DistMap), then a team submitting those 4 exact locations for that cell plus 6 totally wrong locations (i.e. totally off) will get a higher score than a team submitting 3 exact locations plus 7 nearby locations (close to the 1st, 2nd, 3rd and 4th point)?
Well, you should definitely try to get all 10 locations as good as possible; that should be enough to get a good score. I cannot say much more than that without revealing the scoring scheme. There is no right or wrong, there is a distance of your predictions to the actual locations.
Hi Pablo, thank you for your reply. I guess my question is what "the most likely" means in this context. Should we optimize to get 1 correct location (or however many share the maximal MCC) without worrying about the rest, or optimize so that the average distance of all 10 locations to the true location is smallest? If we don't have that 1 exact true location among our 10 locations, meaning we get all 10 wrong, how do you calculate the distance? By averaging the distances of all 10 locations, or using only the distance of the one closest to the true location? My concern is how I should interpret the performance of my model when I get all 10 locations wrong.
Hi, we did not say that the scoring will "average all 10 of the predicted locations", or at least we did not mean that. So the problem you point out regarding the other 9 locations lowering the result won't happen. I am not sure I understand when you say "Now, I am confused if the goal of this challenge is to predict the 10 most likely locations using the MCC location as the "gold standard", or to predict the 10 locations closest to the "truth"." You have to predict the 10 most likely locations, given that the true location is given by the maximal MCC.
Hi Pablo, I would like to follow up on Tin's question regarding how the distance will be calculated between the 10 predicted locations and the MCC location(s). My question is: given that the scoring method averages all 10 of the predicted locations, as mentioned during the webinar, how do you account for one of the predicted locations being exactly the MCC location? If I successfully include the MCC location among my 10 predicted locations, then averaging with the other 9 locations will just worsen the result. Now, I am confused whether the goal of this challenge is to predict the 10 most likely locations using the MCC location as the "gold standard", or to predict the 10 locations closest to the "truth".
Yes
Hi, just one quick question. If a cell has multiple max MCC positions, are we allowed to use all of those positions?
Thanks for confirming!
Hi Wan, what you are saying is correct!
Ok, thanks for clarifying. Please feel free to vent all your questions; I understand your concerns, but it is hard to answer them without revealing parts of the scoring. The first metric is somehow based on Euclidean distances, but not only that. I agree about the combination of things that might not match, but we have a solution, as I said. Thanks for the interest
Hi Pablo, in the first question I asked about the formula for the Euclidean distance, which is open and described clearly in the 1-hour seminar video: the distance is the average of all Euclidean distances, d_i = (1/n) * sum_l(d_l). You can find that formula in one of the slides. The Euclidean formula was intended to be public in the first place, but I have the impression that one presenter was not aware of multiple mappings (judging from the questions and answers at the end of the seminar). Is it true that the Euclidean formula is open for the unique-mapping case, but secret for the multiple-mapping case? Please let us know. Regarding the second question, it is very clear from the seminar video (at 00:30:30) that the score is an average of the first and second metrics. I just do not understand how one can weigh or average those two scores if the second score is not available (i.e., one does not use any in situ gene). However, your answer is satisfying, since it indicates that you have a way to deal with this situation. Thanks.
Hi Pablo, I am still a little bit confused. So which positions can we use to train our model? From what you said, my understanding is: if there is 1 position with the highest score, then we are fine as long as that particular location is among our 10 submitted locations; if there is more than 1 position with the highest score, we should include all of them among our 10 submitted locations. Can you clarify that please? Thanks!
Hi, are you trying to get me to reveal details of the scoring? In any case, I cannot really answer 1/, and we have thought about 2/. So don't worry, you can also use no genes, or fewer, if you want. P
Thank you for the quick reply Pablo! I have two follow-up questions: 1/ One valid scenario is that we have 8 max MCC positions for a cell while the submission file includes 10 positions for that same cell. Is it true that you are going to calculate all 80 Euclidean distances (10 predicted positions versus 8 true positions) and average them for that one cell? 2/ In case one does not use any in situ information (0 genes), how will that submission be compared with another submission that uses 20 in situ genes, given that the first scoring metric is tied (same Euclidean distances)?
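For concreteness, here is what I mean in 1/ as a small R sketch (purely my interpretation of the averaging, not the official scoring; pred.xyz and true.xyz are hypothetical 10 x 3 and 8 x 3 coordinate matrices):

```r
# Illustration only, NOT the official scoring scheme.
# Average of all pairwise Euclidean distances between the 10 predicted
# and the 8 max-MCC ("true") 3D positions of one cell.
avg.pairwise.dist <- function(pred.xyz, true.xyz) {
  d <- sapply(seq_len(nrow(true.xyz)), function(j) {
    diff <- sweep(pred.xyz, 2, true.xyz[j, ])  # subtract true position j from each prediction
    sqrt(rowSums(diff^2))                      # distances of all predictions to position j
  })
  mean(d)  # average over all 10 x 8 = 80 distances
}
```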
If this is the case, then they will all be scored equally. P
I have the same question as Tin. My understanding of the paper is that the authors are aware that there can be a few positions with the highest score, and that is why they did not try to find the single best location. In that case, which positions are our predictions compared against? The top 10 locations of each cell by MCC score?
Hi Christoph and Pablo, thank you for the code and explanation. The MCC values calculated using mccr are identical to dm@mcc.scores from DistMap after exponentiation. However, the code above assumes that each cell has a unique max MCC position, which is not true. One question remains unanswered: what happens when a cell can be mapped to multiple bins by max MCC? Pablo -- there are currently 287 cells with a non-unique mapping, i.e., a cell has several bins with the same max MCC value. For example, cell number 3 has its max MCC at the 2 bins 1705 and 2124. Which bin is considered the ground truth and used in your scoring? Which bins and coordinates are considered ground truth for the above-mentioned 287 cells? Below is a small code block for your consideration.

```r
mcc <- dm@mcc.scores[, 3]  # MCC of all bins for cell number 3
which(mcc == max(mcc))
# [1] 1705 2124
```
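To reproduce the count of 287, a minimal sketch (assuming dm@mcc.scores is a bins x cells matrix, as in the DistMap package):

```r
# Number of max-MCC bins per cell; cells with more than one are ambiguous
n.max.bins <- apply(dm@mcc.scores, 2, function(mcc) sum(mcc == max(mcc)))
sum(n.max.bins > 1)  # should give the 287 cells mentioned above
```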
Well, there is as of today no technique that gives you both cell positions and RNA-seq... so the data is the data. The paper has proven that their method can uniquely find positions for cells using the 84 in situs, which is a big achievement. Now we assume that those positions are correct, and the question becomes: can you do the same job with fewer in situ genes as reference? We think methods that do as well as the paper but use less in situ information will be very revealing of the biology underlying the process. Thanks P
Hello Pablo, if I did not misunderstand the formula, I think the evaluation is somewhat "biased". Using the gold standard (MCC) to evaluate our prediction accuracy implies that MCC can predict the true location of all cells with little or no error. If MCC cannot do so, we are basically not predicting the true locations of cells; we are trying to reproduce the result of the fruit fly paper using fewer driver genes. Even though the result of the paper looks appealing, the method is not supposed to be error-free. Therefore, if the paper is the "gold standard", the best performance we can get is the performance of the algorithm shown in the paper. It is like the old saying in machine learning, "garbage in, garbage out"; in our case it is "the same thing in, the same thing out". We will not be able to verify whether our methods or the authors' method would work well in a real study. Considering the cost of our challenge, I totally understand why the paper became the gold standard, but the evaluation would be better if we could use a real dataset containing both the cells' gene expression and the cells' true locations to evaluate the performance of our models. Best, Jeff
Hi Pablo, this is my understanding from the answers, and your clarification will be deeply appreciated: while we can calculate the gold standard as described in (1), are we expected to submit something else in our submissions? I think it would be more appropriate if the gold standard were calculated from some hidden genes other than the released 84. Thanks for your help
Pablo, thank you for the quick reply. I hope the scoring will not remain a black box for long. While the evaluation metric is not disclosed, how are participants expected to develop methods that maximize that score?
Hi Christoph, (1) is correct. (2) Indeed, we want you to use fewer genes (actually, the idea is going towards a solution that uses no in situ genes at all). At this time we are not revealing the details of the scoring, but it will be based on the positions as calculated by the MCC, as you describe in (1); we will also add weights depending on which genes you use, and maybe other metrics. Hi Robert, yes, the restriction on the 84 genes applies only to the in situs.
I was wondering the same thing. Piggy-backing with an additional question related to (2) (the answer might be obvious, but just to be sure): does the 60/40/20 gene subset restriction only apply to the in situ expression dataset? I.e., is the _scRNAseq_ data for all 8924 genes usable at any stage, in addition to the 60/40/20 in situ genes?
