I'm trying to do multi-task training with convolutional neural networks, i.e. build one model that predicts for all TFs simultaneously. However, I'm getting much worse results than when training one model per transcription factor. Basically, I have sequence and DNase as inputs, and separate output units per transcription factor, where the loss is only computed for the transcription factors that have training data. The idea is that the model learns a common representation of sequence and DNase, and the output for a particular TF is a convex combination of these shared features. Has anyone else tried multi-task training?
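For concreteness, here is a minimal sketch of the kind of shared-trunk, multi-head architecture described above, assuming PyTorch; the layer sizes, the 32-TF head count, and the input layout (4 one-hot sequence channels plus 1 DNase channel) are illustrative assumptions, not the actual model:

```python
# Hypothetical sketch of a shared-trunk, multi-task CNN (PyTorch).
# Layer sizes and channel counts are illustrative.
import torch
import torch.nn as nn

class MultiTaskTFNet(nn.Module):
    def __init__(self, n_tfs=32, in_channels=5, n_filters=64, filter_len=20):
        super().__init__()
        # Shared trunk over 4 one-hot sequence channels + 1 DNase channel.
        self.conv = nn.Conv1d(in_channels, n_filters, filter_len)
        self.pool = nn.AdaptiveMaxPool1d(1)
        # One output unit per TF on top of the shared representation.
        self.heads = nn.Linear(n_filters, n_tfs)

    def forward(self, x):                  # x: (batch, 5, seq_len)
        h = torch.relu(self.conv(x))       # motif-scanner-like filters
        h = self.pool(h).squeeze(-1)       # (batch, n_filters)
        return self.heads(h)               # per-TF scores (logits)
```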
What I found is that even shallow nets overfit on this task; deeper networks don't seem to improve performance significantly. I'm using dropout for regularisation.

That seems reasonable. I'm viewing U/A/B as an ordinal response rather than a softmax, but I haven't assessed whether that's actually helping. I would imagine a multi-class net like this would need to be much larger (e.g. more channels/units per hidden node) than a single-TF net: is that the case for you?

To be precise: I'm using a separate softmax for each TF, so each TF has three outputs (32 * 3 outputs in total): -1 (unknown), 0 (unbound), 1 (bound). When computing the loss, I weight the loss for the unknown labels with 0 (sketched below).

Stackd: what likelihood are you using for the multi response? As yuanfang says, it wouldn't really make sense to use a multiclass softmax; you really want each TF to have a separate logistic/binomial likelihood.
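A minimal sketch of that zero-weighted three-class loss, assuming PyTorch; the logit layout and the mapping of unknown/unbound/bound to class indices 0/1/2 are my assumptions, not code from the thread:

```python
# Hypothetical sketch: per-TF softmax loss with the "unknown" class
# weighted to zero (PyTorch; shapes are illustrative).
import torch
import torch.nn.functional as F

def per_tf_softmax_loss(tf_logits, tf_labels):
    # tf_logits: (batch, n_tfs, 3); tf_labels: (batch, n_tfs) holding
    # class indices 0 = unknown, 1 = unbound, 2 = bound.
    class_weights = torch.tensor([0.0, 1.0, 1.0])  # zero out "unknown"
    return F.cross_entropy(tf_logits.transpose(1, 2), tf_labels,
                           weight=class_weights)
```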
> So in theory, performance should be quite similar to taking known TF motifs, scoring them along the sequence and using XGB on those scores. But please correct me if this is wrong.

i think so, if one knows how to tune it. the problem is that nobody knows how to tune it. that's a piece of black magic.
(i think the organizers can help on this, they have more experience with cnns; maybe they could provide baseline code for a cnn-based model)

Hi Yuanfang,
Thanks for your response. So, the reason I used a convolutional neural network is that, in essence, each convolutional filter behaves like a PWM, i.e. each filter maps a local window of the sequence to a real number, so the filter scans along the sequence for a particular sequence feature. At the higher levels, the hidden units represent convex combinations of the outputs of those filters. So in theory, performance should be quite similar to taking known TF motifs, scoring them along the sequence and using XGB on those scores. But please correct me if this is wrong.
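To illustrate the filter-as-PWM analogy (a toy example with made-up PWM values, not code from the thread): a 1D convolution over one-hot-encoded DNA, with the PWM as its kernel, computes the PWM score at every offset, i.e. a motif scan.

```python
# Toy demo: a Conv1d whose kernel is a PWM scores the motif at each
# offset of a one-hot DNA sequence (PWM values are made up).
import torch
import torch.nn as nn

pwm = torch.tensor([[0.9, 0.1, 0.8],   # A, weights for motif positions 1-3
                    [0.0, 0.7, 0.1],   # C
                    [0.1, 0.1, 0.1],   # G
                    [0.0, 0.1, 0.0]])  # T
scan = nn.Conv1d(4, 1, kernel_size=3, bias=False)
scan.weight.data = pwm.unsqueeze(0)    # kernel shape: (out=1, in=4, width=3)

seq = torch.tensor([[1., 0., 0., 0., 1.],   # one-hot "ACGTA", one row per base
                    [0., 1., 0., 0., 0.],
                    [0., 0., 1., 0., 0.],
                    [0., 0., 0., 1., 0.]])
scores = scan(seq.unsqueeze(0))        # PWM score at each of the 3 offsets
```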
As for the output, I treat each output unit individually and use a per-TF binary cross-entropy loss, so for each transcription factor the output lies in [0, 1] (after applying a sigmoid).
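A minimal sketch of that masked, per-TF binary cross-entropy, assuming PyTorch; the mask convention is my reading of "losses only on TFs with training data" from the opening post:

```python
# Hypothetical sketch: independent per-TF binary cross-entropy, with a
# mask zeroing out TFs that have no training label (PyTorch).
import torch
import torch.nn.functional as F

def masked_per_tf_bce(logits, labels, mask):
    # logits, labels, mask: (batch, n_tfs); mask is 1 where a label exists.
    loss = F.binary_cross_entropy_with_logits(logits, labels,
                                              reduction="none")
    return (loss * mask).sum() / mask.sum().clamp(min=1.0)
```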
The biggest problem I encountered with the neural networks is that they are very prone to overfitting on this problem.
Niels

hi stackd,
my two cents: i think neural networks are the wrong direction for this problem. they are overkill here
i did quite badly in phase 1, but i think my overall direction is correct. of course i am not as good as the top teams, as i don't know how to represent motifs with these motif-finding tools. i intend to share my code, less than 200 lines in total (the method is already open: https://www.synapse.org/#!Synapse:syn7426931/wiki/407892). i think that is fine, since i am not going to use it anyway in the final phase, even if i decide to enter.
i can answer your question about why your multi-output neural network doesn't work: if you use e.g. vgg/googlenet or one of their derivatives, with softmax as the penalty, then it gives the most likely tf, and ideally the second-ranked one gets a score of zero if the network is doing its job. but binding is not A OR B: several tfs can be bound at once, so the softmax forces them to compete. of course, your network must be much shallower (shallow learning?), since there aren't even enough features
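A toy numeric illustration of that competition effect (made-up logits, not from the thread): a softmax across TFs forces the scores to sum to one, while independent sigmoids can flag several TFs as bound at once.

```python
# Toy demo: softmax across TFs competes; independent sigmoids do not.
import torch

logits = torch.tensor([3.0, 2.5, -4.0])  # three TFs, two actually bound
print(torch.softmax(logits, dim=0))      # ~[0.62, 0.38, 0.00] -- they compete
print(torch.sigmoid(logits))             # ~[0.95, 0.92, 0.02] -- independent
```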
then a single-output network is just doing what xgboost does. you could say i also used a neural network, except my network has one layer and two features.
i personally think xgboost and Vowpal Wabbit are the only two sensible choices for this problem (but obviously i am not an expert, as can clearly be seen from my crappy performance; maybe the top teams want to share some insight).
yuanfang