Hi everyone,
I joined this challenge quite late, but since I was very interested I thought I would give it a try. I selected CTCF because it seems to be an easy case, and I trained my model using only the sequence. I trained and tested it on the same cell type. Even though that setup is optimistic, it should still give me a good sense of what range of numbers to expect. Under these circumstances my model looks pretty accurate. Please find a sample prediction from my model here: [Pred](http://www.filedropper.com/pred). The surprise, however, is that once I submitted I received almost 0 accuracy. I highly doubt that the difference should be that large. What am I missing? Any thoughts?
Created by H H
BioChallenger
There are ~60 million intervals. Of these, about 180 thousand bind CTCF; the rest do not.
For the leaderboard you need to make predictions for ~9 million intervals.
A good model should correctly predict 98-99% of all true negatives (unbound, U) and 49-50% of true positives (bound, B).
Is your model that good?
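
To make the class imbalance behind these numbers concrete, here is a rough sketch of the arithmetic in Python. It takes the approximate counts quoted above (~60 million intervals, ~180 thousand bound) at face value. The function name `confusion_summary` and the choice of 99% and 50% as the operating point are illustrative assumptions, not part of the challenge's scoring code.

```python
def confusion_summary(total, positives, recall, specificity):
    """Expected confusion-matrix counts and summary metrics.

    Hypothetical helper for illustration only; it is not the
    challenge's official scoring procedure.
    """
    negatives = total - positives
    tp = recall * positives          # bound intervals correctly called bound
    fn = positives - tp              # bound intervals missed
    tn = specificity * negatives     # unbound intervals correctly called unbound
    fp = negatives - tn              # unbound intervals wrongly called bound
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn,
            "accuracy": accuracy, "precision": precision}


# Approximate counts quoted in the reply above.
TOTAL = 60_000_000
POSITIVES = 180_000

# A trivial "model" that calls every interval unbound.
baseline = confusion_summary(TOTAL, POSITIVES, recall=0.0, specificity=1.0)

# Assumed operating point: 99% of unbound and 50% of bound intervals correct.
model = confusion_summary(TOTAL, POSITIVES, recall=0.5, specificity=0.99)

print(f"all-unbound accuracy: {baseline['accuracy']:.4f}")
print(f"model accuracy:       {model['accuracy']:.4f}")
print(f"model precision:      {model['precision']:.4f}")
```

Under these assumed counts, calling every interval unbound is already right about 99.7% of the time, so plain accuracy says very little. Even at the operating point above, only roughly one in eight predicted binding sites would be a true one, which is why a model that looks accurate on a local test can still score poorly on a metric that focuses on the bound class.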