Hello all!
As we discussed a few weeks ago, the organizers have decided to make the Avocado model and predictions available to all the participants. You can find the trained models themselves (one for each chromosome), as well as the imputed training and validation tracks, in the file baselines/avocado. The simplest use of these predictions would be a boosting approach, in which your model attempts to learn the difference between the Avocado prediction and the truth using some extra information or some form of clever modeling. These models are trained using the training data, and then subsequently used to make predictions for both the training tracks (in the training folder) and the validation tracks (in the validation folder). Note that there is a separate model trained for each chromosome, meaning that the genomic latent factors are not comparable across models. This differs from the training approach we took in our paper, but shouldn't affect the imputations.
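For anyone unsure what "boosting on Avocado" could look like in practice, here is a minimal sketch. It is only an illustration under stated assumptions, not the organizers' method: the arrays are synthetic stand-ins for one track, and `extra_features`, `fit_residual_model`, and `predict_boosted` are hypothetical names. A plain linear least-squares fit stands in for whatever model you would actually use to predict the residual.

```python
import numpy as np

def fit_residual_model(avocado_pred, truth, extra_features):
    """Fit a linear model for the residual (truth - Avocado prediction).

    Hypothetical inputs: avocado_pred and truth have shape (n_positions,),
    extra_features has shape (n_positions, n_features).
    """
    residual = truth - avocado_pred
    # Append a bias column so a constant offset can also be corrected.
    X = np.column_stack([extra_features, np.ones(len(residual))])
    weights, *_ = np.linalg.lstsq(X, residual, rcond=None)
    return weights

def predict_boosted(avocado_pred, extra_features, weights):
    """Add the predicted residual back onto the Avocado prediction."""
    X = np.column_stack([extra_features, np.ones(len(avocado_pred))])
    return avocado_pred + X @ weights

# Synthetic stand-in data (assumption: real tracks would be loaded from the
# released bigwig/NPY files instead).
rng = np.random.default_rng(0)
n = 1000
extra_features = rng.normal(size=(n, 3))
truth = rng.gamma(2.0, 1.0, size=n)
# Pretend the baseline is off by a feature-dependent term plus an offset.
avocado_pred = truth - 0.5 * extra_features[:, 0] + 0.2 + rng.normal(scale=0.1, size=n)

weights = fit_residual_model(avocado_pred, truth, extra_features)
boosted = predict_boosted(avocado_pred, extra_features, weights)

mse_baseline = np.mean((avocado_pred - truth) ** 2)
mse_boosted = np.mean((boosted - truth) ** 2)
print(f"baseline MSE: {mse_baseline:.4f}, boosted MSE: {mse_boosted:.4f}")
```

The demo evaluates on the same data it was fit on, purely for brevity. In the challenge setting you would fit the residual model on the training tracks and check it on the validation tracks.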
We will soon be releasing a second set of models that are trained on both the training and the validation tracks. These models should be used ONLY for making predictions on the held out test set at the end of the round, and NOT for making predictions on the validation set, which is part of the training set for these models.
Hopefully this will help even the playing field between people who have massive compute at their disposal (and could simply recreate the Avocado model) and those who don't.
Let me know if you have any questions!
Created by Jacob Schreiber jmschr
I found it is the sinh transform; everything is fine...
Hi,
I just want to check what the uploaded bigwig files for baseline--avocado--training/validation are. As described, they are the imputed training and validation tracks. But when I load the Avocado model and predict on the validation sets, the result differs from the NPY files I generated from these bigwig files with bw_to_npy.py (from https://github.com/ENCODE-DCC/imputation_challenge).
E.g. for C03M02 chr17 (no blacklist), the mean_squared_error is 85.0130859603117, which confuses me...
Thanks!
We will not be releasing the experimental data for the held-out test sets (which are the blind sets for the final score) until after the deadline. Next week we will be releasing Avocado's *imputed versions* of those experiments. These imputations will be from a model that has only seen the training and the validation set, not the test set. Releasing these imputed tracks will be necessary for people who are building models that boost on Avocado's imputations (or use the trained model) to make predictions on the test set. The currently released imputations are from a model that has only seen the training set.
Are those "held out test sets" the blind sets for the final score? If they are released before the deadline, people can easily use these data to train existing models and gain improvements.
We will be releasing an Avocado model trained on both the training and the validation set, as well as that model's predictions on the held-out test set, next week. We did not release these earlier because we did not want people to mistakenly build models that had seen the validation set and then evaluate them on the same validation set.
It is probably not a good idea to change the conditions shortly before the end of the contest.
Hello! Will you be releasing the Avocado imputations for the blind set? Or perhaps Avocado trained on training and validation (with a readme, if possible, on how to use it) to perform the imputation on the blind set?
Sorry about that. We will re-upload the validation data sets today.
Hi Jacob, thanks for the response.
I can download the training prediction files, but cannot download the [validation predictions](https://www.synapse.org/#!Synapse:syn19363669). I guess the folder settings need to be modified?
You can find the Avocado predictions for the training tracks here: https://www.synapse.org/#!Synapse:syn19964989. The predictions for the validation tracks, from the same model, can be found here: https://www.synapse.org/#!Synapse:syn19363669 To be clear, these predictions are all from a single model that was trained on the training set.
It seems like we do not have access to download the Avocado prediction files.
I got the error "You don't have permission to access /kundaje/leepc12/avocado/C02M22.p0.bigwig"
Thanks!
Thanks
I used 2000 epochs to train the model. The exact training script `avocado_fit.py` (except for file locations, which have been trimmed) has been provided in the baselines/avocado file. Note that the data tracks are arcsinh transformed prior to training, and that the predictions from the model are subsequently sinh transformed to go back to the original space.
Hi,
How many epochs did you use to train the models?
Thanks,
Chenyang
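Since the arcsinh/sinh step came up twice in this thread (the training transform, and the MSE mismatch that turned out to be the sinh transform), here is a small sketch of the round trip. It is an illustration on synthetic data, not the contents of `avocado_fit.py`: the `signal` array and the noise added to mimic a model's output are assumptions made for the demo.

```python
import numpy as np

# Synthetic stand-in for one signal track (assumption: real data would come
# from the NPY files produced by bw_to_npy.py).
rng = np.random.default_rng(0)
signal = rng.gamma(shape=0.5, scale=4.0, size=10_000)

# Training space: tracks are arcsinh transformed before fitting.
transformed = np.arcsinh(signal)

# Pretend model output in the transformed space (truth plus small noise).
prediction_transformed = transformed + rng.normal(scale=0.1, size=signal.shape)

# Back to the original space with sinh before comparing to the raw signal.
prediction = np.sinh(prediction_transformed)

# sinh is the exact inverse of arcsinh, up to floating point error.
round_trip_ok = np.allclose(np.sinh(transformed), signal)

# Comparing across spaces (the mistake) versus within the original space.
mse_mixed = np.mean((prediction_transformed - signal) ** 2)
mse_original = np.mean((prediction - signal) ** 2)
print(f"round trip ok: {round_trip_ok}")
print(f"MSE, arcsinh-space prediction vs raw signal: {mse_mixed:.4f}")
print(f"MSE, sinh-transformed prediction vs raw signal: {mse_original:.4f}")
```

If your MSE against the released tracks looks far too large, checking whether one side is still in arcsinh space is a quick first thing to rule out.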