For NRMSE, there are four ways to normalize RMSE:

1. RMSE/mean(y)
2. RMSE/sd(y)
3. RMSE/(y_max - y_min)
4. RMSE/(q3 - q1)

I am uncertain which of these this challenge uses to compute the NRMSE. In addition, is it accurate that SC and NRMSE are computed by first flattening the count matrix into a vector?
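For concreteness, here is a small Python sketch of the four candidate normalizations (toy numbers and variable names are mine, not from the challenge):

```python
import math
import statistics

def rmse(y, y_hat):
    """Root-mean-square error between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))

# Toy data, for illustration only
y     = [1.0, 2.0, 3.0, 4.0, 5.0]
y_hat = [1.1, 1.9, 3.2, 3.8, 5.1]

e = rmse(y, y_hat)
q = statistics.quantiles(y, n=4)          # [q1, median, q3]

nrmse_mean  = e / statistics.mean(y)      # 1. normalized by the mean
nrmse_sd    = e / statistics.stdev(y)     # 2. normalized by the standard deviation
nrmse_range = e / (max(y) - min(y))       # 3. normalized by the range
nrmse_iqr   = e / (q[2] - q[0])           # 4. normalized by the interquartile range
```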

Created by Weiru Han palantirH
Hi @rchai, Thanks for your update. I will continue my work now.
Hi @palantirH, FYI, we have improved the error logging: the error for exceeding the memory limit is now captured, along with the error for exceeding the runtime limit. We are also returning the tree structure of the output folder in your log folder (thanks for the suggestion). I hope this helps you better understand the errors and eases your debugging. Thank you.
Hi @rchai, Thanks for your assistance! I think now I can proceed and optimize my work again.
Hi @palantirH, > We do not have permission to download the output file. It would be really useful to be able to view which files are saved in the output folder after training. In fact, participants are expected NOT to have permission to download the output files, for security reasons. In case you need it, here is the file structure of the output folder for submission '9730898':
```
output/
├── ds1c_p20k_n1_output/
├── ds1c_p50k_n2_output/
├── ds1c_p100k_n1_output/
├── ds1c_p50k_n2_imputed.csv
├── ds1c_p100k_n1_imputed.csv
├── newds1c_p20k_n1.csv
├── newds1c_p50k_n2.csv
└── newds1c_p100k_n1.csv
```
You are always free to log the output files from inside the Docker container, e.g. `print(list.files("/output", pattern = "*_imputed.csv"))`. Sorry for the confusion; we will remove the "Output Files" column from the Submission Dashboard to reduce it. Thanks!
Hi @rchai, Thanks for the response! Yes, I also suspect it is a memory issue. Thank you for helping me resolve it! Additionally, we do not have permission to download the output files. It would be really useful to be able to see which files are saved in the output folder after training. Thanks!
Hi @palantirH, > Could you please help me in determining if the incorrect name of prediction files is the cause? I can see your output files are named correctly; it seems that your model reached the memory limit (160 GiB). ~~You can find the tarball of your output files [here](https://www.synapse.org/#!Synapse:syn50908957), which can be reached via the [Submission Dashboard](https://www.synapse.org/#!Synapse:syn26720920/wiki/620126) > "Output Files" column.~~ I notice you are using the `scImpute` package. Based on our testing, models using the default `scImpute` settings will most likely exceed the runtime limit given the limited memory. There might be a better way to construct the model, or parallelization might make it work, but below are the estimated runtimes from our dry-runs using `scImpute`, which I hope help.

| Phase | Model | Runtime |
| --- | --- | --- |
| Leaderboard | scImpute | 12h |
| Final | scImpute | More than one week |
Hello @rchai, In my experiment (submission id: 9730898), there is an error indicating that 25 prediction files could not be found. Could you please help me determine whether incorrectly named prediction files are the cause? (Files/folders named "new[input_name]" and "[input_name]_output" are intermediate output.) I can confirm that my naming is "[input_name]\_imputed.csv". Thanks!
Hi @rchai , I see that. Thanks!
@palantirH The submission 9730810 has been stopped.
Hi @rchai, The submission id is 9730810, Thanks!
Hi @palantirH, To make sure we stop the correct submission, can you please provide the submission Id that you would like to stop? Thank you.
Hello @rchai , Thanks again for your assistance! Another image I submitted has been running for more than seven hours. Could you please stop it so that I can submit a new one? Thanks!
Hi @palantirH, > Aside from that, I was unable to view the leaderboard; could you make it accessible to all participants so they can monitor the progress of this competition? Thanks for reporting. The issue has been fixed; sorry for any inconvenience. > The prediction file status is valid, but the workflow status is ERROR; Submission Status is INVALID. The source code I used for the test model was taken from the example you supplied on GitHub, thus the run model, Dockerfile, and imputation model files are identical. It might have been due to a service issue. I see you have successfully re-submitted the same model, but feel free to let us know if you still have any questions.
Hi @rchai , Thanks. The prediction file status is valid, but the workflow status is ERROR; Submission Status is INVALID. The source code I used for the test model was taken from the example you supplied on GitHub, thus the run model, Dockerfile, and imputation model files are identical. It's confusing to me. Aside from that, I was unable to view the leaderboard; could you make it accessible to all participants so they can monitor the progress of this competition? Thanks!
Hi @palantirH, Thanks for the submissions! The service has been rebooted and you should be able to get the results of your submissions soon. Please feel free to let me know if you have any question on the submission.
Hello @rchai and @ncalistri, Thanks! I think I'm now clear about this challenge. By the way, I submitted one test model to the leaderboard. The status indicates that the model has been received but is not currently executing. Could you look into it? The submission id is 9730290.
Hi @palantirH, Here is the quote from one of organizers, @ncalistri: > If I'm understanding correctly then yes, imputed vector consists of true positive, false positive, false negative (and true negative if imputed vector has a zero that was also zero in ground truth) Hope it can answer your question. Thank you!
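As a toy illustration of that decomposition (assuming a non-zero ground-truth entry means the value needed imputation; the function name and numbers are made up, this is not the challenge's scoring code):

```python
def classify_entry(truth, imputed):
    """Label one imputed entry against its ground-truth value.

    Assumes a non-zero ground-truth value means the entry needed
    imputation (dropout) and a zero means it did not.
    """
    if truth != 0 and imputed != 0:
        return "TP"  # value needing imputation was imputed
    if truth == 0 and imputed != 0:
        return "FP"  # value not needing imputation was mistakenly imputed
    if truth != 0 and imputed == 0:
        return "FN"  # value needing imputation was left at zero
    return "TN"      # zero in imputed that was also zero in ground truth

# Toy vectors: one gene across five cells
truth_vec   = [0, 5, 0, 3, 0]
imputed_vec = [2, 4, 0, 0, 0]
labels = [classify_entry(t, p) for t, p in zip(truth_vec, imputed_vec)]
```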
Hello @rchai , Thanks for your reply and clarification! I also have one query regarding imputed values; please correct me if I'm wrong. Say I denote true positive as the circumstance where a missing value is correctly imputed, and false positive as the scenario where a value that doesn't need to be imputed is mistakenly imputed. Is it true that imputed vectors consist of **true positive, false positive and false negative**? Thanks!
Hi @palantirH, That's a good question. The RMSE is normalized by range, which is the third listed method. We first calculate the NRMSE/SC (Spearman correlation) for each gene between the ground truth and imputed values (vectors). Then we take the mean NRMSE/SC across all genes as the overall scores. I hope it answers your question. Thank you!
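For illustration, a minimal Python sketch of that scoring scheme (toy matrices and function names are mine; this is not the official scoring harness):

```python
import math

def nrmse_range(truth, imputed):
    """Per-gene RMSE normalized by the range of the ground-truth values."""
    n = len(truth)
    err = math.sqrt(sum((t - p) ** 2 for t, p in zip(truth, imputed)) / n)
    return err / (max(truth) - min(truth))

def _ranks(v):
    """Average ranks (ties share the mean of their rank positions)."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    ranks = [0.0] * len(v)
    i = 0
    while i < len(v):
        j = i
        while j + 1 < len(v) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(a, b):
    """Spearman correlation = Pearson correlation of the ranks."""
    ra, rb = _ranks(a), _ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    sa = math.sqrt(sum((x - ma) ** 2 for x in ra))
    sb = math.sqrt(sum((y - mb) ** 2 for y in rb))
    return cov / (sa * sb)

# Toy matrices: rows are genes, columns are cells (made-up numbers).
truth   = [[0.0, 2.0, 4.0, 6.0], [1.0, 3.0, 5.0, 7.0]]
imputed = [[0.5, 2.0, 3.5, 6.0], [1.0, 2.5, 5.0, 7.5]]

# Per-gene scores, then the mean across genes as the overall score
mean_nrmse = sum(nrmse_range(t, p) for t, p in zip(truth, imputed)) / len(truth)
mean_sc    = sum(spearman(t, p)    for t, p in zip(truth, imputed)) / len(truth)
```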

What are the correct ways to calculate NRMSE and SC?