In this challenge, one has to provide values of all features for each recordID, as the submission does not accept "empty" values. In reality, there are a number of columns for which there is no data file for some of the recordIDs. In that case, one cannot generate certain features for a given recordID. What value should participants provide for a feature of a recordID when computing that feature is not possible due to lack of data? In my view, the best way is to put NA (Not Applicable) in the cells for which data is not available. In order to satisfy the conditions, we are putting zero wherever a value is empty.

I do not know about other participants, but we are not comfortable with the submission format. If the organisers are looking for participants to impute missing values, then they should spell this out clearly. I believe the aim of this competition is to find the best features that can discriminate disease and control individuals with high accuracy. Participants should not spend time guessing values to fill in empty cells. It is possible that this question has already been discussed in previous threads; in that case, please provide me with a link to the thread.
Created by Gajendra Raghava (raghava)
Thanks for the response; I understand the limitations. I am not asking to change the rules for this subchallenge (subchallenge 1). I simply gave my view in the hope that it may help the organisers make the challenge more friendly.
@raghava-
It wouldn't be fair to participants who submitted on time to change the scoring rules after the deadline. We certainly plan to explore the effect of missing data segments in the community phase, launching in November, if you want to make an attempt at submitting to this one.
However, since the deadline for this subchallenge has passed, another option, if you're uncomfortable with this one, is to turn your attention to the other 3 subchallenges (whose deadlines are at the end of the month). There is no requirement that you participate in all 4, and those subchallenges do not suffer from the same missing-data issue.
Solly