Dear organizers,

It seems that for some samples of the leaderboard test set, the features that ought to be Ensembl genes are something else. My log indicates that one set of 19525 features has an intersection of only 739 with the 23444 Ensembl genes that I am using. For some other data sets, the intersection has a reasonable size.

Let me reiterate something I asked in another thread: could we have a list of the exact features available for the different microarrays and for your version of the GRCh38 reference (which will be used in the validation phase)? I'd like to have it as Ensembl genes, but other participants might want other formats. Many machine learning techniques do not deal well with missing data.

Finally, two questions about debugging:

The restriction on the log size seems far too strict, unless you are checking the logs manually for cheating, and even then readability is more important than terseness. Could you loosen it to a few megabytes?

Is it allowed to include data set names in the logs? If yes, what about sample names? I would like to be able to correlate per-data-set results with the scaling, the method used to summarize to Ensembl features, and other properties. Is that fine?

Thank you!

Kind regards,
Jean-Marie
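For anyone wanting to reproduce this kind of check, here is a minimal sketch of the overlap computation described above. It is not the code used in the post: the function names, the 0.5 threshold, and the toy identifiers are all assumptions chosen for illustration.

```python
def feature_overlap(sample_features, reference_genes):
    """Return (number of shared features, fraction of reference genes covered)."""
    sample = set(sample_features)
    reference = set(reference_genes)
    shared = sample & reference
    # Guard against an empty reference list to avoid dividing by zero.
    fraction = len(shared) / len(reference) if reference else 0.0
    return len(shared), fraction


def looks_mismatched(sample_features, reference_genes, min_fraction=0.5):
    """Flag a sample whose features cover too few reference genes.

    min_fraction is a hypothetical cutoff, not a value from the challenge.
    """
    _, fraction = feature_overlap(sample_features, reference_genes)
    return fraction < min_fraction


# Toy data: only one of the three sample features is an Ensembl gene ID.
reference = ["ENSG00000141510", "ENSG00000012048",
             "ENSG00000146648", "ENSG00000171862"]
sample = ["ENSG00000141510", "FEATURE_A", "FEATURE_B"]

shared_count, covered = feature_overlap(sample, reference)
print(shared_count, covered, looks_mismatched(sample, reference))
```

With the numbers reported above, 739 shared features out of 23444 reference genes is a coverage of roughly 3%, which any reasonable cutoff would flag.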

Created by Jean-Marie Droz jdroz
Yes, that one does have 55 samples. We are fixing it at the moment. -Andrew
Hi Andrew, I do not know; I didn't log the data set names. But I can see that 55 samples have the same problem, so it might be a data set with 55 samples. I hope that helps.
Hi @jdroz, do you know which dataset(s) are affected? Is it possible it's the same issue as [here](https://www.synapse.org/#!Synapse:syn15589870/discussion/threadId=5867)? I will have to get back to you on the other questions.

Possibly wrong features in coarse grained leaderboard test set