Dear @scRNAseqandscATACseqDataAnalysisDREAMChallengeOrganizers, I registered for the challenge, but I don't understand how the ground truth data is formed or what it looks like. Can you explain what the ground truth data is? Thank you in advance.

Created by eunseo
Hi @rchai , I regret to inform you that my submission 9730551 has failed with the following reason: 17 file(s) not found : 'ds1c_p10k_n2_imputed.csv', 'ds1c_p10k_n3_imputed.csv', 'ds1c_p20k_n1_imputed... For this part, I am following the example in https://github.com/Sage-Bionetworks-Challenges/multi-seq-challenge-example-models/tree/main/task1/py-deepimpute and my Dockerfile is:
```
FROM ubuntu:20.04
RUN apt-get update -y
RUN apt-get install software-properties-common -y
RUN apt-get install python3 -y
RUN apt-get install python3-pip -y
RUN pip3 install torch
RUN pip3 install scanpy
RUN pip3 install numpy==1.22
COPY src/* ./
ENTRYPOINT ["python3", "/run_model.py", "-i", "/input", "-o", "/output"]
```
I can't pinpoint the reason; could you please help me look into it? Many thanks
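Since the error lists one missing `<name>_imputed.csv` per dataset, a common cause is the container not writing one output file per input CSV into `/output`, or writing them under different names. Below is a minimal sketch of the I/O loop a `run_model.py` could use under that assumption; `impute()` is a placeholder for the actual model, and pandas is assumed to be available (scanpy installs it as a dependency).
```
import argparse
import glob
import os

import pandas as pd


def impute(counts: pd.DataFrame) -> pd.DataFrame:
    """Placeholder for the real model: returns the input unchanged (genes x cells)."""
    return counts


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("-i", "--input", required=True)
    parser.add_argument("-o", "--output", required=True)
    args = parser.parse_args()

    os.makedirs(args.output, exist_ok=True)
    for path in glob.glob(os.path.join(args.input, "*.csv")):
        counts = pd.read_csv(path, index_col=0)  # genes in rows, cells in columns
        name = os.path.splitext(os.path.basename(path))[0]
        # One "<name>_imputed.csv" is expected per input "<name>.csv".
        impute(counts).to_csv(os.path.join(args.output, f"{name}_imputed.csv"))


if __name__ == "__main__":
    main()
```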
Hi @rchai , Thanks for getting back to me, I really appreciate it.
Hi @ialsag01, Below is the traceback returned from your submission (9730547); you can also find the logs [here](https://www.synapse.org/#!Synapse:syn50694652). It looks like the submission failed while fitting the model. Have you tried running the model locally using [these steps](https://github.com/Sage-Bionetworks-Challenges/multi-seq-challenge-example-models/tree/main/task1#test-the-model-locally) to make sure the proper output is produced?
```
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/imputation_model.py", line 77, in imputation_model
    model.fit()
  File "/imputation_model.py", line 45, in fit
    self.gan.fit(self.data.X_log[self.data.top_genes_mask, :], self.num_iter, self.step)
IndexError: boolean index did not match indexed array along dimension 0; dimension is 22971 but corresponding boolean dimension is 21194
```
Thank you!
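For reference, this IndexError means the boolean mask is shorter than the gene axis it indexes (21194 entries vs 22971 genes), which typically happens when the highly variable gene mask was computed on a differently filtered matrix than the one it is applied to. A small illustrative guard, with hypothetical names mirroring the traceback:
```
import numpy as np


def select_top_genes(X_log: np.ndarray, top_genes_mask: np.ndarray) -> np.ndarray:
    """Apply a boolean gene mask to a genes x cells matrix, failing early on a size mismatch."""
    # A mismatch like 21194 (mask) vs 22971 (genes) means the mask was computed
    # on a differently filtered gene set than the matrix it is applied to.
    if top_genes_mask.shape[0] != X_log.shape[0]:
        raise ValueError(
            f"mask has {top_genes_mask.shape[0]} entries but the matrix has "
            f"{X_log.shape[0]} genes; recompute the mask on this matrix"
        )
    return X_log[top_genes_mask, :]
```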
Hi @rchai I have amended my implementation to take an input of shape genes x cells and generate the imputed version with the same shape. I have tested the implementation and it works fine. However, submission 9730547 failed immediately after I submitted it, which is confusing. Could you please check? Many thanks
Hi @ialsag01, Thank you for your interest in the scRNA-seq and scATAC-seq Data Analysis DREAM Challenge! > does the input shape have to be genes x cells? Or can it be the other way round (cells x genes)? The input files are count matrices with genes in rows and cells in columns. If needed, you are welcome to transpose the data matrix in your model for training purposes, but please note you would need to transpose back to genes (rows) x cells (columns) for your output files. The same format as the input matrix is expected for the output files; otherwise the submission will fail validation. Thank you!
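A minimal sketch of that transpose-in / transpose-back pattern, assuming pandas DataFrames and a model that works on cells x genes internally; `run_model` and the filenames are placeholders, not part of the challenge code:
```
import numpy as np
import pandas as pd


def run_model(cells_by_genes: np.ndarray) -> np.ndarray:
    """Placeholder for an imputation model that expects cells x genes internally."""
    return cells_by_genes


counts = pd.read_csv("/input/ds1c_p10k_n2.csv", index_col=0)  # genes (rows) x cells (columns)
imputed_values = run_model(counts.T.to_numpy())               # transpose for training/prediction

# Transpose back to genes x cells and keep the original labels before writing the output.
imputed = pd.DataFrame(imputed_values.T, index=counts.index, columns=counts.columns)
imputed.to_csv("/output/ds1c_p10k_n2_imputed.csv")
```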
Hi @rchai , Just to confirm, does the input shape have to be genes x cells? Or can it be the other way round (cells x genes)? My submission (9730538) takes cells x genes. Thanks
Hi @zoradeng, Thank you for your interest in the scRNA-seq and scATAC-seq Data Analysis DREAM Challenge! > For task1, if I understand correctly, we'll always output a m_gene x n_cell matrix from an input m_gene x n_cell matrix, right? For Task 1, an _m gene x n cell matrix_ output is expected. If there are any missing genes (_< m_) or cells (_< n_), a corresponding penalty will be applied to the score.
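Since missing genes or cells are penalized, a quick self-check before submitting could compare the imputed matrix against the input. A sketch assuming both files are CSVs with genes as the index and cells as columns; `check_output` is a hypothetical helper, not part of the challenge infrastructure:
```
import pandas as pd


def check_output(input_csv: str, output_csv: str) -> None:
    """Verify that every gene and cell of the input appears in the imputed output."""
    original = pd.read_csv(input_csv, index_col=0)
    imputed = pd.read_csv(output_csv, index_col=0)

    missing_genes = original.index.difference(imputed.index)
    missing_cells = original.columns.difference(imputed.columns)
    assert imputed.shape == original.shape, f"shape changed: {original.shape} -> {imputed.shape}"
    assert missing_genes.empty, f"{len(missing_genes)} genes missing from the output"
    assert missing_cells.empty, f"{len(missing_cells)} cells missing from the output"
```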
Hi @rchai, For task1, if I understand correctly, we'll always output an m_gene x n_cell matrix from an input m_gene x n_cell matrix, right? Thank you!
@rchai Thanks for your response. I am looking forward to getting the Task 2 sample data provided by the challenge organizers, so we can make sure our model's output format will be scored correctly. Sincerely yours, Tsai-Min
Hi @chentsaimin, Thank you for your interest! For Task 1, you could use the [Seurat 3k PBMC count matrix](https://www.synapse.org/#!Synapse:syn48025824) or download the data from the [Multimodal Single-Cell Data Integration - NeurIPS challenge](https://openproblems.bio/neurips_2021/) as a sample file. There are some basic [example models](https://github.com/Sage-Bionetworks-Challenges/multi-seq-challenge-example-models), as mentioned on the "Submission Tutorial (Docker) > Example" wiki page. By following the steps in the [Task 1 README](https://github.com/Sage-Bionetworks-Challenges/multi-seq-challenge-example-models/tree/main/task1), you should be able to produce a "seurat_pbmc3k_counts_imputed.csv" file in the 'output' folder. As for Task 2 sample files, we have reached out to a challenge organizer to see whether toy data can be provided and will follow up accordingly. Thank you!
@rchai Could you please provide at least one sample file with a paired input and output for each of the two tasks, so we can design and debug our models correctly? Sincerely yours, Tsai-Min
Hi @esung, Thank you for your interest in the scRNA-seq and scATAC-seq Data Analysis DREAM Challenge! The ground truth is the dataset with the true values or peak calls, which will be used to score the accuracy of your models. For how the ground truth data is formed, please refer to the "**Data preparation for challenge testing and validation**" sections of the [Task 1: scRNA-seq](https://www.synapse.org/#!Synapse:syn26720920/wiki/620137) and [Task 2: scATAC-seq](https://www.synapse.org/#!Synapse:syn26720920/wiki/620138) sub-wiki pages. I hope it helps. Thank you!

What is ground truth data?