Dear organizers,
Since the February deadline for the within-cell type benchmark round is approaching, I wonder when we will have access to the released held-out ChIP-seq datasets so we can complete the task for this round?
Thanks
Created by willsong819

The submission queue for the within-cell type benchmark round is now open.
Queue name: ENCODE-DREAM Transcription Factor Binding Site Prediction Challenge (WITHIN CELL TYPE)
Queue ID: 8078026
**NOTE:** You can submit as many times as you like. You will receive only validation-status feedback, not performance feedback. Your final submission before the deadline will be used for scoring.
Please note that this submission must be made by all teams. It does not count toward winning the challenge, but it is an important piece for drawing conclusions from the challenge. So please do submit unless you have a very good reason why this is not feasible (e.g. the design of your algorithm itself).
Thanks,
Anshul.

@akundaje Never mind! I think this problem was solved when you re-uploaded the files. The earlier mistakes are now fixed!
@willsong819 We're looking into the potential label swap. How did you infer the swap? The line endings should all be Unix-style now. Thanks for catching this!
-Akshay

Hi organizers,
I noticed some mistakes in the label files: the HNF4A label file is actually the FOXA1 label file, and the REST label file is actually the NANOG label file. Please correct the label files for HNF4A and REST. Thanks!

I just realized that I can't replicate some of my model training under this new paradigm. Some of my previous models used methods that require at least three reference cell lines for training. If we're training and predicting within the same cell line, which I take to mean training on that one cell line without any of the reference cell line data, I cannot use these methods. I'll do my best to replicate as much of the training as possible.

Thanks for catching this. We'll fix and re-upload today.
Anshul

I realized that the line endings of the label TSV files (I did not check the peak files) are now DOS-style (CRLF). Previously, Unix-style (LF) line endings were used. It might be good to change them back so the files can easily be processed with existing pipelines.
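Until re-uploaded files are available, CRLF endings can be normalized locally. Below is a minimal sketch, not an official tool: it assumes files ending in `.gz` are gzip-compressed and everything else is plain text, and the function name `to_unix_line_endings` is hypothetical.

```python
import gzip
import os
import tempfile


def _opener(path: str):
    # Assumption: a ".gz" suffix means gzip-compressed; otherwise plain text.
    return gzip.open if path.endswith(".gz") else open


def to_unix_line_endings(path: str) -> bool:
    """Rewrite `path` in place with LF line endings.

    Returns True if any CRLF (or bare CR) endings were converted.
    """
    opener = _opener(path)
    with opener(path, "rb") as fh:
        data = fh.read()
    # Convert DOS (CRLF) endings first, then any stray bare CR endings.
    fixed = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    if fixed == data:
        return False
    # Write to a temporary file in the same directory, then swap it in.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    os.close(fd)
    with opener(tmp_path, "wb") as out:
        out.write(fixed)
    os.replace(tmp_path, path)
    return True
```

Running it a second time on the same file returns False, so it is safe to apply to a whole download directory.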
We have just released the datasets for the within-cell type benchmarking round on Synapse. Briefly, in this round we have released labels for all chromosomes except chr1, chr21 and chr8 for the TF/cell type samples that were hidden during the across-cell type rounds.
You must now train models on the training chromosomes for each TF/cell type and predict on chr1, chr21 and chr8 in the SAME cell type.
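The chromosome-based split can be sketched as follows. This is an illustration only: it assumes the label TSV layout from earlier rounds (a header `chr start stop <cell types...>` with labels such as `B` bound, `U` unbound, `A` ambiguous), and the helpers `read_labels` and `split_by_chromosome` are hypothetical. Because the released labels should contain no rows on chr1, chr21 or chr8, a non-empty test list would point to a problem with the files.

```python
import csv
import io
from typing import Iterable, List, Tuple

# Held-out test chromosomes for the within-cell type round.
HELD_OUT = {"chr1", "chr21", "chr8"}

Row = Tuple[str, int, int, str]  # (chrom, start, stop, label)


def read_labels(handle, cell_type: str) -> List[Row]:
    """Read one cell type's column from a label TSV (assumed layout)."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    col = header.index(cell_type)
    return [(r[0], int(r[1]), int(r[2]), r[col]) for r in reader]


def split_by_chromosome(rows: Iterable[Row]) -> Tuple[List[Row], List[Row]]:
    """Partition rows into training rows and held-out (test) rows."""
    train: List[Row] = []
    test: List[Row] = []
    for row in rows:
        (test if row[0] in HELD_OUT else train).append(row)
    return train, test


if __name__ == "__main__":
    # Hypothetical two-cell-type snippet in the assumed label format.
    demo = (
        "chr\tstart\tstop\tCELL_A\tCELL_B\n"
        "chr2\t600\t800\tU\tB\n"
        "chr10\t650\t850\tB\tA\n"
    )
    train, test = split_by_chromosome(read_labels(io.StringIO(demo), "CELL_B"))
    print(len(train), len(test))  # prints "2 0"
```

Predictions for chr1, chr21 and chr8 should then be written for the regions listed in the official submission template rather than derived from the label files.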
Below is a description of this round followed by locations of data and submission file format.
The queue for this round will be activated some time next week. You will receive an email with details.
The deadline for all submissions for the benchmarking round is Feb 17th, 2017.
**DESCRIPTION OF WITHIN-CELL TYPE BENCHMARKING ROUND (See https://www.synapse.org/#!Synapse:syn6131484/wiki/402032)**
All teams that wish to be scored and ranked through the Challenge must submit predictions for a Post-Challenge Benchmarking Round. This round focuses on the simpler problem of predicting TF binding in a cell type on all three held-out chromosomes (chr1, chr21 and chr8) when models are trained on binding data from the training chromosomes in the same cell type, i.e., the test cell types are available for training, but the training and test chromosomes differ. This formulation of the prediction task is commonly used in existing publications on TF binding prediction. The main purpose of this Round is to benchmark the within-cell type prediction performance of various methods against their across-cell type prediction performance.
Given the structure of the challenge, within-cell type predictions cannot be made on the held-out cell types without releasing some part of the TF binding locations in the true-blind held-out cell types. Hence, after the across-cell type Final Submission Round submissions are made, we will release binding locations of each TF in the held-out cell types across all chromosomes except the held-out chromosomes chr1, chr21 and chr8. Teams will be asked to train models using this data and predict binding locations on chr1, chr21 and chr8 in the same cell types. This Round is for benchmarking purposes only. The results of this Round will not be used to rank participants for the primary challenge. See Section 2.3 for more information on held-out cell types for various TFs. See Section 3.4 for submission instructions and formats for the Benchmarking Round.
**TRAINING LABELS FOR BENCHMARKING ROUND**
The label datasets have been added to the Accessing Data Section 3.3 of the wiki https://www.synapse.org/#!Synapse:syn6131484/wiki/402043
In the Files section of the Synapse wiki we have created a new directory in 'Challenge Data' called 'within_cell' https://www.synapse.org/#!Synapse:syn8077510
You will see 3 subdirectories in there representing the label TSV files, conservative peaks and relaxed peaks in the same formats as before.
We also provide tar archives of all files (URLs below):
Label TSV files: https://www.synapse.org/#!Synapse:syn8077822
Conservative peaks: https://www.synapse.org/#!Synapse:syn8077715
Relaxed peaks: https://www.synapse.org/#!Synapse:syn8077732
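Once an archive has been downloaded, it can be unpacked with Python's standard `tarfile` module. The sketch below assumes the archive may or may not be gzip-compressed (mode `"r:*"` detects this) and uses a hypothetical helper name, `extract_archive`; it rejects member paths that would land outside the destination directory and uses the `"data"` extraction filter when the running Python provides it.

```python
import os
import tarfile
from typing import List


def extract_archive(archive_path: str, dest_dir: str) -> List[str]:
    """Extract a (possibly compressed) tar archive; return extracted file names."""
    dest_root = os.path.realpath(dest_dir)
    with tarfile.open(archive_path, "r:*") as tar:
        members = tar.getmembers()
        for m in members:
            target = os.path.realpath(os.path.join(dest_root, m.name))
            # Refuse members whose paths would escape dest_dir.
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"unsafe path in archive: {m.name}")
        # Use the safer "data" filter on Python versions that support it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(dest_root, members=members, **kwargs)
    return sorted(m.name for m in members if m.isfile())
```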
**SUBMISSION FORMAT**
Once again, please use the correct submission template for the benchmarking round, provided here https://www.synapse.org/#!Synapse:syn6131484/wiki/402044, for submitting all predictions.
If you have any questions, please post on the discussion board.

You are correct. Those are indeed the benchmark round datasets. I will send an email with the official announcement in about an hour.
Anshul

Dear organizers,
I found a directory "within_cell" (syn8077510) in "Challenge Data" under the Files section.
Are these the data for the benchmarking phase?
As there hasn't been an official announcement yet, can we already download those files and prepare our submission for the benchmarking phase? Or are there still updates to the data to come that we should wait for?
Best,
Jan

We are finishing the Round 2 scoring this week. Next week we will release the datasets required for the within-cell type predictions and open a queue for submitting them. The deadline for submission of within-cell type predictions will be extended to Feb 15th.
-Anshul.