You can take a look at this [thread](https://www.synapse.org/#!Synapse:syn28469146/discussion/threadId=9283) to see the previous discussion on this topic. Please look at the [rules](https://www.synapse.org/#!Synapse:syn28469146/wiki/617562) and ensure your submission follows them. To reiterate what I said before, our intent is to find the best model architectures and the best ways to train models on sequence-to-expression tasks, so that the results benefit everyone in the community. Machine learning competitions are almost always won by ensembles (sometimes of 30, 40, or 50 models), and we wanted to avoid that. That is why we are not allowing ensembles as submissions. We are aware that participants could incorporate paths in their neural network architecture that simulate an ensemble to some extent. But if you come up with such architectural choices and can train them end-to-end, that can be considered a novel architecture/strategy.

Now, moving to Rule 5, which states, "You can train your network in multiple steps (e.g., training some layers with other layers fixed) if you want." This allows you to fine-tune your layers if you want. However, this rule could be misinterpreted as allowing an ensemble to pass as a submission. For example, you could have parallel paths in your architecture and train them one at a time (while keeping the others fixed), then concatenate their feature spaces to make a final prediction, or average their predictions to make the final prediction. Even though you can write the code so that this is part of a single computational graph, if you train the paths separately, it essentially becomes an ensemble. Therefore, to ensure that you are not unwittingly simulating an ensemble, you should train such parallel architectures end-to-end (as mentioned in the [thread](https://www.synapse.org/#!Synapse:syn28469146/discussion/threadId=9283)). You are allowed to fine-tune the final layers that combine the multi-path information to predict expression, but you should not fine-tune the parallel paths separately.

Another question can arise: can you apply multiple losses to multiple branches in your architecture? That is tricky to answer, because it is closer to an ensemble than the same architecture without separate losses on each branch. However, since we require such architectures to be trained end-to-end, if you are using multiple branches that make sense for DNA data and you want to apply multiple losses to make sure the branches train better, we will not stop you. Please comment with your thoughts on this matter.
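To make the distinction concrete, here is a minimal PyTorch sketch of a two-path architecture trained end-to-end with a single loss, which is the setting Rule 5 intends; training each path on its own and only then combining their predictions is what would count as an ensemble. The layer sizes, names, and dummy data below are hypothetical, not part of the rules.

```python
import torch
import torch.nn as nn

class TwoPathModel(nn.Module):
    """Two parallel feature extractors over the same one-hot sequence, one shared head."""
    def __init__(self, channels=64):
        super().__init__()
        self.path_a = nn.Sequential(nn.Conv1d(4, channels, kernel_size=9, padding=4), nn.ReLU())
        self.path_b = nn.Sequential(nn.Conv1d(4, channels, kernel_size=25, padding=12), nn.ReLU())
        self.head = nn.Sequential(nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(2 * channels, 1))

    def forward(self, x):                              # x: (batch, 4, seq_len)
        feats = torch.cat([self.path_a(x), self.path_b(x)], dim=1)
        return self.head(feats).squeeze(-1)            # one expression value per sequence

model = TwoPathModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.randn(8, 4, 110), torch.randn(8)          # dummy batch and expression targets

# End-to-end training: a single loss, gradients flow through both paths simultaneously.
loss = nn.functional.mse_loss(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Freezing one path while training the other, and only afterwards fine-tuning the head, would be the "training the paths separately" pattern the post above warns against.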

Created by Abdul Muntakim Rafi (@muntakimrafi)
@muntakimrafi Good paper indeed :) Thanks for the clarification.
@penzard We will use `torch.manual_seed(3407)` [[paper](https://arxiv.org/abs/2109.08203)]. That was a joke; we will try to perform multiple runs, given resource availability.
@muntakimrafi According to the official PyTorch website (https://pytorch.org/docs/stable/notes/randomness.html), "Completely reproducible results are not guaranteed across PyTorch releases, individual commits, or different platforms." We do everything we can to create a fully reproducible model (and it is reproducible given the same platform, the same NVIDIA card, CUDA build, PyTorch version, etc.), but small differences can still exist. What are you going to do in this case?
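For reference, a minimal sketch of the usual PyTorch reproducibility settings that the linked notes describe (the helper name is my own, and, as the quote above says, this still does not guarantee identical results across PyTorch versions, CUDA builds, or hardware):

```python
import os
import random

import numpy as np
import torch

def set_seed(seed: int = 0) -> None:
    """Seed the common RNGs and ask PyTorch for deterministic behaviour."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)                    # also seeds all CUDA devices
    torch.backends.cudnn.benchmark = False     # disable non-deterministic autotuning
    # Some CUDA ops additionally need this environment variable for determinism.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
    torch.use_deterministic_algorithms(True)   # raise an error on known non-deterministic ops
```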
@muntakimrafi ?
@ivan.kulakovskiy Please check the thread. I think I told you before to submit your best model. That is also fine, as we will retrain the models and can discard parts of them. But it is better if you submit a single-loss variant along with your multiple-loss variant.
@danyeong-lee, I can see your point. If I use different architectures (ResNet, DenseNet) in parallel, it seems like exploiting the rules to pass an ensemble as a submission: why not just add skip connections inside your residual blocks instead? And where does it end if I start adding tens of networks in parallel? However, it makes sense if it is something like having conv filters of different lengths in different paths, and I do not think we should discard that idea by calling it an ensemble. Many vision models now start with convolutions and follow with attention layers, so if somebody wants to try conv layers and attention layers in parallel and concatenate the feature spaces, that is also hard to dismiss. Rest assured that we will not blindly accept something that is an ensemble in disguise (i.e., N complex branches stacked for no valid reason, as opposed to conv-attention in parallel, multi-length filters in parallel, etc.).

To be honest, the idea of adding multiple losses for different branches makes me very uncomfortable. It seems okay if you combine the feature spaces from different paths. But if you make actual expression predictions from different paths and, on top of that, apply a transformation to combine the expression predictions from the multiple paths, your architecture seems designed to decrease the variance of the predictions across paths, and that feels less okay. However, it is very difficult for us to tell one team they cannot do this while we have no idea whether some other team is doing something similar. Therefore, we tried to allow as much flexibility as possible within our "no ensembles" rule.

We are going to retrain the models from scratch anyway. If something seems clearly wrong or against the spirit of the rules and no other models are using it, we can discard that part when comparing the model with others. If you are using something that you think simulates an ensemble, you can also submit its non-ensemble variant (for example, if you are using multiple losses on different branches, submit the model without the multiple losses too). I may sound like I am contradicting our rule of submitting a single model here, but the goal of the single-model rule is to stop participants from submitting multiple models to try their luck. If you have proper reasoning to back your multiple submissions, it is fine (e.g., in model A we used multiple losses, so we also submitted model B, where we did not). Lastly, you need not worry that someone else will win by exploiting loopholes in the rules; we would obviously consider having two leaderboards if it is needed.
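As an illustration of the "conv filters of different lengths in different paths" idea mentioned above, here is a hypothetical Inception-style block (sizes and names are made up; this is not an endorsed design):

```python
import torch
import torch.nn as nn

class MultiScaleConvBlock(nn.Module):
    """Parallel 1D convolutions with different filter lengths over the same input."""
    def __init__(self, in_channels=4, out_channels=32, kernel_sizes=(7, 15, 31)):
        super().__init__()
        self.paths = nn.ModuleList(
            nn.Conv1d(in_channels, out_channels, k, padding=k // 2) for k in kernel_sizes
        )

    def forward(self, x):                      # x: (batch, in_channels, seq_len)
        # Concatenate the feature maps along the channel axis; the block stays part of
        # one computational graph and is trained end-to-end with the rest of the model.
        return torch.cat([torch.relu(p(x)) for p in self.paths], dim=1)
```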
@muntakimrafi I totally agree with @dohlee. As the person who asked this question via email, I was 99% sure that such models were not allowed, but without clarification I was not sure enough and had to ask. I want to add that if we build a neural network the way I described in the email, by combining branches (say, branches for different nucleotide windows or different motif lengths) not at the beginning of the network but at the end, this is also similar to an ensemble, and I can think of hundreds of ways to exploit it to get an ensemble-like model, especially if combined losses are allowed (though they are not necessary). If we think about the reason ensembles are prohibited, such models must be prohibited too: they provide no intuition about the underlying biology or about how a nucleotide-sequence-based model should be constructed, they are as hard to interpret as ensembles, and so on.
@muntakimrafi Hi. First, thank you for hosting this challenge with great data; my team is greatly enjoying the final days! Our concern is that, in an extreme case, one can design a branched model where N branches stem from (i.e., share) a simple backbone layer (e.g., a single linear projection layer). If each of the N branches is trained with its own loss, the model is (nearly) equivalent to an ensemble, yet it still complies with the rules you have presented. We think there should be a clearer criterion so that such cases are prevented. I look forward to your reply.
@danyeong-lee @muntakimrafi If this is not an ensemble, then I don't know the definition of an ensemble (at least in this competition).
@muntakimrafi If we design a model as follows: a fully connected layer that splits into three branches (Transformer / CNN / RNN), each trained through its own loss, it looks like an ensemble. From your comments, I think it would be okay to use this model, but is my understanding correct?
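For concreteness, here is a hypothetical sketch of the shared-backbone, per-branch-loss configuration described in the questions above (the branch internals are simple stand-ins; this is meant to illustrate the pattern being debated, not to recommend it):

```python
import torch
import torch.nn as nn

class ThreeBranchModel(nn.Module):
    """One shared layer feeding three branches, each producing its own expression estimate."""
    def __init__(self, in_dim=4 * 110, dim=64):
        super().__init__()
        self.shared = nn.Linear(in_dim, dim)
        # Stand-ins for e.g. a Transformer / CNN / RNN branch.
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 1)) for _ in range(3)
        )

    def forward(self, x):                      # x: (batch, in_dim), flattened one-hot sequence
        h = torch.relu(self.shared(x))
        return [b(h).squeeze(-1) for b in self.branches]

model = ThreeBranchModel()
x, y = torch.randn(8, 4 * 110), torch.randn(8)
# One loss per branch; averaging the branch predictions at inference time is what
# makes this behave much like an ensemble, which is the concern raised here.
loss = sum(nn.functional.mse_loss(pred, y) for pred in model(x))
loss.backward()
```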
@muntakimrafi Thank you for the explicit clarification of the final scoring.
@ivan.kulakovskiy You should submit the model you think performs the best while following the rules. If your branched model performs better than the single-path model, then you should submit the branched model.

If you are using something that you think simulates an ensemble, you can also submit its non-ensemble variant (for example, if you are using multiple losses on different branches, submit the model without the multiple losses too). I may sound like I am contradicting our rule of submitting a single model here, but the goal of the single-model rule is to stop participants from submitting multiple models to try their luck. If you have proper reasoning to back your multiple submissions, it is fine (e.g., in model A we used multiple losses, so we also submitted model B, where we did not).

No, we do not want to declare separate winners in each category. Ideally, we will declare the winner as the team with the least sum of ranks in ScoreSpearman and ScorePearson. If there is a tie in the sum of ranks, we will use the scores. However, if any team scores best on a metric but does not end up as one of the winning teams, they will receive an honorable mention.
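For illustration, a small sketch of how a least-sum-of-ranks selection with a score tie-breaker could be computed (the data, column names, and the exact tie-breaking step are assumptions on my part, not the official evaluation code):

```python
import pandas as pd

# Made-up leaderboard values for three hypothetical teams.
leaderboard = pd.DataFrame({
    "team": ["A", "B", "C"],
    "ScorePearson": [0.92, 0.94, 0.94],
    "ScoreSpearman": [0.91, 0.93, 0.92],
})

# Higher scores are better, so rank in descending order (rank 1 = best).
ranks = leaderboard[["ScorePearson", "ScoreSpearman"]].rank(ascending=False, method="min")
leaderboard["rank_sum"] = ranks.sum(axis=1)
leaderboard["score_sum"] = leaderboard["ScorePearson"] + leaderboard["ScoreSpearman"]

# Least sum of ranks wins; ties broken here by the higher combined score.
winner = leaderboard.sort_values(["rank_sum", "score_sum"], ascending=[True, False]).iloc[0]
print(winner["team"])   # -> "B" in this toy example
```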
@muntakimrafi I would suggest that a clearly defined and explicit procedure could be highly important for the community. For example, if a team has alternative models (e.g., a 'branching' and a 'non-branching' one), it is quite unclear which should be submitted for the final evaluation. This is also related to the final metrics: do you plan to announce a separate 'winner' in each category ('ScoreSpearman' and 'ScorePearson'), or do you plan to somehow average the scores/ranks of these two performance metrics? I apologize for raising this question if it was already discussed or stated in the rules.
@mtinti Thanks for your advice. We will definitely keep that in mind during the final evaluation.
In my humble opinion, multiple branches with different losses are an ensemble in disguise. For example, I didn't explore the idea of training a regressor and a classification model at the same time because it felt wrong. I know there is a thin line between saying this is an ensemble and this is not... however, if such models are allowed, I would like to see two different leaderboards implemented at the end: one for models that make such clever use of branches with multiple losses, etc., and one for models that do not.
