Dear @MetadataAutomationChallengeParticipants,

Thank you all for your incredible patience as we continue to work on and improve the scoring functions for this challenge! We would especially like to thank @JoshReed, @tconfrey, and @attila.egyedi for their insightful feedback regarding some cases we missed when we developed the scoring algorithms. Below is a summary of the fixes we have implemented since the start of the challenge:

**May Updates**
* 0 should be returned when the goldstandard has >0 valueDomains but the submission has none
* the number of mismatched rows between the goldstandard and the submission should take into account both the observedValue and value/conceptCode, not just the value/conceptCode
* 0.5 should be returned when the value domains are correctly identified, even when the data element and data element concept are NOMATCH

**April Updates**
* returned scores should not be negative
* deductions for predicting incorrect value domains should not go past 0.5

With each patch, please expect your submissions in the current round to be re-scored so that the scores reflect the new changes. And if you happen to come across more bugs, please let us know! (But hopefully, we have finally caught them all :) )

Thank you all for participating, and stay safe!

Metadata Automation DREAM Challenge Organizers
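For concreteness, here is a minimal R sketch of the clamping-related fixes listed above (the empty-valueDomain rule, the 0.5 deduction cap, and the non-negative floor). The function name and arguments are purely illustrative assumptions; this is not the organizers' actual scoring code.

```r
# Illustrative only -- not the challenge's actual scoring code.
score_with_fixes <- function(raw_score, vd_penalty, n_gold_vd, n_sub_vd) {
  # May fix: return 0 when the gold standard has value domains but the submission has none
  if (n_gold_vd > 0 && n_sub_vd == 0) {
    return(0)
  }
  # April fix: deductions for incorrect value domains are capped at 0.5
  penalty <- min(vd_penalty, 0.5)
  # April fix: the returned score is never negative
  max(raw_score - penalty, 0)
}

# Example: a raw score of 0.4 with a large value-domain penalty is floored at 0, not -0.5
score_with_fixes(raw_score = 0.4, vd_penalty = 0.9, n_gold_vd = 3, n_sub_vd = 2)
```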

Created by Verena Chung (vchung)
Hi @kmichels, In the Leaderboard phase, we have split each of the four datasets into two parts: one that we release publicly and one that we use for intermediate evaluation of the submitted methods. Due to the relatively small amount of manually-annotated data available for this challenge and the above approach, a column that appears in the public part of a given dataset does not appear in the private part used for evaluation. This is one reason why a submitted algorithm may perform differently than it does when you evaluate it on the public data. Also, have you used information from the public parts of the datasets to build your method? Say the column named `age` (a random example) is included in one of the released datasets. Did you use this piece of information to guide the development of your method? In that situation, your method would overfit to the public parts of the datasets.
We definitely aren't hard-coding any values from the publicly available data. I am confused, then. If we don't have access to the files, shouldn't the files on our side be representative of what's being used on your end to score them? Shouldn't the trial datasets (on GitHub: https://github.com/Sage-Bionetworks/metadata-automation-challenge) be representative of the test datasets used when we submit? We have a couple of different models and want to use the scoring to figure out which ones are better, but we can't, because the training data on GitHub is so different from what is used when we submit.

Here are the scores when we ran the scoring code ourselves:

| sub_id  | avg_weight | APOLLO | Outcome | REMBRANDT | ROI    |
|---------|------------|--------|---------|-----------|--------|
| 9704128 | 2.2125     | 1.3495 | 2.1833  | 2.0165    | 3.3007 |
| 9704148 | 2.2800     | 1.1509 | 2.4760  | 1.9703    | 3.5232 |
| 9704150 | 2.0634     | 1.4438 | 2.3658  | 1.2948    | 3.1493 |

Below are the Round 3 submission scores for your team:

| sub_id  | avg_weight | APOLLO | Outcome | REMBRANDT | ROI    |
|---------|------------|--------|---------|-----------|--------|
| 9704128 | 1.4171     | 0.9446 | 1.0451  | 0.9063    | 3.6042 |
| 9704148 | 1.9264     | 1.1802 | 2.4616  | 0.8646    | 3.2500 |
| 9704150 | 1.8591     | 1.6785 | 2.2535  | 0.5313    | 2.6875 |

They clearly don't match; is there anything you would suggest to help us with this quandary?
Hi @kmichels, The input files used for the Leaderboard queues are different from what is publicly available on the Wiki. The differences in scores may be explained by how you and your team built the model. Are you using any hard-coded values from the publicly available data to make the predictions? Let us know!
So we got the scoring code from your website and cloned the repository last night: https://github.com/Sage-Bionetworks/metadata-automation-challenge

The scores from running Docker and following your instructions exactly:

* APOLLO-2: 1.3495174963925
* Outcome-Predictors: 2.18333333333333
* REMBRANDT: 2.01652380952381
* ROI-Masks: 3.30072463768116

Scores from submitting:

> Hello kmichels,
> Your submission (third submission first try) is scored, below are your results:
> APOLLO-2_score: 0.9446
> Outcome-Predictors_score: 1.0451
> REMBRANDT_score: 0.9062
> ROI-Masks_score: 3.6042
> weighted_avg: 1.417
> Sincerely, Challenge Administrator

Why are they so different?
Thanks @v.chung for taking it up so quickly. Coincidentally, after observing the fix, I also re-submitted at the same time. Regards, Abhinav Jain
@abhinavjain014 , I have pushed the scoring fix and your 9704137 submission is currently re-running. Thank you again for the bug report! Verena
@abhinavjain014 , I was able to reproduce your error and found the bug! I'll be pushing out a fix shortly. Thank you for the notice, Verena
Thanks for your quick response, @v.chung. My submission ID is 9704137. Regards, Abhinav
Thank you, @abhinavjain014. Let me look into your submission and see where the error is stemming from. What is the submission ID?
Hi @v.chung, My submissions used to run smoothly, but after the changes were committed to the scoring file, one of the four files is now showing an error. The error is: The column `conceptCode` doesn't exist. I am not able to resolve it; however, when I check the format against the schema, the JSON is reported as valid. Please help me resolve this. Thanks & Regards, Abhinav
**UPDATE 5/13/2020:**
* incompatible type error (reported by @attila.egyedi) has been addressed; all fixes are now incorporated into the [main branch](https://github.com/Sage-Bionetworks/metadata-automation-challenge)
* Round 3 submissions accepted prior to the scoring fixes are currently being re-run
@attila.egyedi, Thank you! I think I know where that error stems from; I'll try to reproduce the error and implement a fix soon. Thanks! Verena
Hi @v.chung, I was testing the scoring and I observed a small change compared to the previous behavior. Previously, having numeric values in the "observedValue" field as numbers (without quotes) was OK. Now the scoring runs into an error in such cases:

```
Error: not compatible: Incompatible type for column `observedValue`: x numeric, y character
```

I am not saying that this is wrong; we can easily wrap all the numbers in quotes. I just thought it was worth mentioning. Attila
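For anyone hitting the same error: it appears to come from comparing a numeric column (unquoted JSON numbers) against a character column. Below is a minimal R sketch of the kind of coercion that sidesteps it on the submission side; the data frames and values are made up purely for illustration and are not the challenge's real data.

```r
library(dplyr)

# Made-up stand-ins for a parsed submission and gold standard
submission   <- tibble(observedValue = c(1, 2, 3))        # numeric: unquoted JSON numbers
goldstandard <- tibble(observedValue = c("1", "2", "4"))  # character

# Coerce to character so joins/comparisons see matching column types
submission <- mutate(submission, observedValue = as.character(observedValue))

# e.g. rows in the submission with no matching observedValue in the gold standard
anti_join(submission, goldstandard, by = "observedValue")
```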
Hi @abhinavjain014, Round 3 has been extended to next Friday, May 15th, 5pm Pacific Time. An email was sent out earlier this week regarding this extension, so our apologies if you did not receive it! Best, Verena
Hi @v.chung, Since changes have been introduced in the past few days, is there any extension of the Round 3 deadline? Are we still supposed to make our submissions by May 8 EOD? Thanks for keeping us updated on the changes.
@attila.egyedi , The newest patch is awaiting some spot checks from a fellow developer, but in the meantime, please feel free to use the `score_fix` branch if you would like to locally validate & score! Thank you again for the bug reports! Verena
Thank you, @v.chung , for the update! Let us know when we can pull the code from the master branch. Regards, Attila

Scoring Bug Fix Log