Dear challenge participants,
As we have started receiving the write-ups and analysis code for Sub-challenge 2, we want to stress the following points, which are needed so that we can benchmark the algorithms and so that the community can more easily use them:
1) Please review the submission page https://www.synapse.org/#!Synapse:syn18380862/wiki/590818 for the three components that must accompany the narrative description of your approach (the analysis script, instructions on how to run it, and the current prediction file you obtain on your system).
2) It is essential that the analysis script takes as input a single expression matrix (rows are features, columns are SampleIDs) and a single sample annotation table (rows are samples, columns are SampleID, IndividualID, GA, Group, Train, Platform, GADel).
3) Make sure the script uses at most 100 predictors, selected on the training samples (Train=1), to make predictions on the test samples (Train=0), and that it writes a properly formatted prediction file.
4) Should you want to do platform-specific preprocessing, it should be an internal step that makes use of the Platform column in the annotation file. The script should work regardless of whether one or two platforms are involved in the training set. (A minimal sketch of such a script follows this list.)
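For illustration only, here is a rough Python sketch of a script with this structure. It assumes the merged esetSC2.csv and anoSC2_v20_nokey.csv files as input; the outcome column (Group here), the feature-selection and model choices, and the prediction-file columns are placeholders rather than requirements, so please follow the exact format given on the submission page.

```python
# Hypothetical SC2 script skeleton. File names come from the challenge materials;
# the outcome column, model, and prediction-file columns are assumptions.
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.ensemble import RandomForestClassifier

# Expression matrix: rows = features (genes), columns = SampleIDs
eset = pd.read_csv("esetSC2.csv", index_col=0)
# Annotation: rows = samples (SampleID, IndividualID, GA, Group, Train, Platform, GADel)
ano = pd.read_csv("anoSC2_v20_nokey.csv")

# Optional platform-specific preprocessing driven by the Platform column;
# here a simple per-platform gene centering, applied only if >1 platform is present.
if ano["Platform"].nunique() > 1:
    for platform, samples in ano.groupby("Platform")["SampleID"]:
        cols = [s for s in samples if s in eset.columns]
        eset[cols] = eset[cols].sub(eset[cols].mean(axis=1), axis=0)

train_ids = ano.loc[ano["Train"] == 1, "SampleID"]
test_ids = ano.loc[ano["Train"] == 0, "SampleID"]
X_train = eset[train_ids].T            # samples x genes
X_test = eset[test_ids].T
y_train = ano.set_index("SampleID").loc[train_ids, "Group"]  # assumed outcome

# Select at most 100 predictor genes using the training samples only.
selector = SelectKBest(f_classif, k=min(100, X_train.shape[1])).fit(X_train, y_train)
genes = X_train.columns[selector.get_support()]

model = RandomForestClassifier(n_estimators=500, random_state=0)
model.fit(X_train[genes], y_train)

# Write a prediction file; the required columns are defined on the submission page.
pd.DataFrame({"SampleID": test_ids.values,
              "Prediction": model.predict(X_test[genes])}).to_csv(
    "predictions_SC2.csv", index=False)
```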
We value your effort and look forward to your submissions,
The organizers
Created by Adi Tarca (bcbuprb)

Hi @vladimir.kovacevic,
You are right that the different ways of merging the two datasets were perhaps part of the method development items that teams had to consider in the previous phase of this sub-challenge.
In the current phase, the algorithm submitted by each team will be tested on the same set of several input files, some involving two platforms and some a single platform. So the focus is not on how to merge the data but on how to select up to 100 predictor genes and make predictions given the same input dataset. If we know what output you get when using our merged esetSC2 dataset, that will help us figure out what differences, if any, are due to the software versions and platforms we have, relative to the output you got on the same input. Having scripts take a single file as input is also less work for us: for those who started with the separate files, we had to adapt their code to start with the merged dataset. Thank you for your interest.
@bcbuprb, thanks for the prompt response. One more thing: if we merge HTA20_RMA.csv and eset_HuGene21ST.csv manually, the result could differ from your way of merging. That could produce inconsistent output predictions (the one we provided versus the one you obtain from our code with your merged input), since our input and yours would differ.
It seems to me that it would be less ambiguous to provide both HTA20_RMA.csv and eset_HuGene21ST.csv as inputs to the script. What do you think?

Regarding 1: in the end you still have to combine the two datasets to get something like esetSC2.csv (see the Files folder), so you can have the script start directly with the combined expression matrix and the sample annotation data frame (anoSC2_v20_nokey.csv). An illustrative sketch of one way to do such a combination follows this reply.
Regarding the second question, it would be easier for us, and for others who may want to use your work, to have a Python script (.py).
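Just to illustrate the merging point discussed above (this is not necessarily how the organizers produced esetSC2.csv, which is exactly why starting from that file avoids the ambiguity), one plausible way to combine the two platform matrices is to keep the genes measured on both platforms and concatenate the sample columns:

```python
# Illustration only: one possible manual merge of the two platform matrices.
import pandas as pd

hta = pd.read_csv("HTA20_RMA.csv", index_col=0)           # rows = genes, cols = SampleIDs
hugene = pd.read_csv("eset_HuGene21ST.csv", index_col=0)

common_genes = hta.index.intersection(hugene.index)       # genes present on both platforms
merged = pd.concat([hta.loc[common_genes], hugene.loc[common_genes]], axis=1)
merged.to_csv("esetSC2_merged_manually.csv")
```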
@bcbuprb,
Regarding 1: you mentioned a single expression matrix, but in the sub-challenge files two are suggested to be used: HTA20_RMA.csv and eset_HuGene21ST.csv. How should we handle this?
Second question: is it OK to submit a Jupyter notebook (.ipynb) instead of a .py script?