Dear challenge manager @rchai,

I have tried many times with my approach for Task 2, but none of my submissions could be scored successfully, so I suspected that the official computing resources cannot even run the baseline model. To confirm this, I submitted the simplest possible model (submission 9731938), using only the officially provided MACS2 code, and the run failed with the following message:

```
718 expected file(s) not found : 'ds1.downsampled.pg_17.seed_5.pct_10.bed', 'ds1.downsampled.pg_17.seed_6.pct_...
```

I believe the issue results from too little memory (20 GiB):

```
Task 2 (scATAC-seq data) : 10 vCPU and 20 GiB memory
```

This is a severe issue: even the baseline model cannot be computed. Could you please help me fix it? If you would like to have my best results, could you please upgrade the Task 2 resources to at least the same level as Task 1?

```
Task 1 (scRNA-seq data) : 20 vCPU and 160 GiB memory
```

Sincerely yours,
Tsai-Min

Created by TSAI-MIN CHEN (chentsaimin)
@rchai, By the way, could you please help me stop submission 9732038 for Task 2? Thanks.

Sincerely yours,
Tsai-Min
@rchai Strangely, the following message keeps appearing for Task 2 (submission 9732035), even though I reduced the number of CPUs from 10 to 5 so that each process gets twice as much RAM:

```
Submission memory limit of 40G reached.
```

I think your framework code restricts the size of the saved files to 40G, even though I saved them in the "/tmp" folder. Could you please help me check what happened?

Sincerely yours,
Tsai-Min

PS: Since the beginning of the challenge, I have only had 20G of disk space instead of the 120G you mentioned.
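For reference, my change is roughly of the form sketched below; the `preprocess_bam` function, the input path, and the output naming are simplified placeholders rather than my actual pipeline:

```python
import multiprocessing as mp
from pathlib import Path

def preprocess_bam(bam_path: Path) -> Path:
    # Placeholder for the real per-file preprocessing step.
    out_path = Path("/tmp") / (bam_path.stem + ".processed.bam")
    return out_path

if __name__ == "__main__":
    bam_files = sorted(Path("/input").glob("*.bam"))  # assumed input location
    # Fewer workers means fewer files are held in memory at once,
    # so each process gets a larger share of the RAM limit.
    n_workers = 5  # reduced from 10
    with mp.Pool(processes=n_workers) as pool:
        processed = pool.map(preprocess_bam, bam_files)
```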
@rchai Thanks for your response. I finally realized that the temporary disk space is provided in the separate "/tmp" folder rather than in the local working directory. Now I am confident I can finish within 48h. Given all the trouble caused by the resource limitations, I believe you and I are the biggest contributors to this challenge; it has been a tough debugging journey for us. Good luck!!

Sincerely yours,
Tsai-Min
Hi @chentsaimin, The 400G is the disk space limit, which allows you to save more files in the /tmp folder. The error is expected, since there is a 40G memory limit for Task 2 (previously 20G). Although we would like to facilitate improvements to participants' models, the parallel preprocessing of all BAM files would require much more RAM than we can provide for this task. However, based on the final results, your approach could be brought up for potential benchmarking efforts during the post-hoc analysis if you are not able to make it by the deadline. If you think there is an alternative way that requires less memory and can be finished within 48h, I encourage you to give it a try before the deadline. Submissions made before the deadline will be allowed to keep running until they have been evaluated. Thank you!
@rchai Thanks for your help. However, the disk space does not seem to allow 400G, given the following message I encountered in submission 9732027 for Task 2:

```
Submission memory limit of 40G reached.
```

In the code, I mkdir a "/temp" folder to save my 20 processed "*.bam" files, and I delete them iteratively. Could you please help me check what happened?

Sincerely yours,
Tsai-Min

PS: Since the beginning of the challenge, I have only had 20G of disk space instead of the 120G you mentioned.
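The relevant part of my code looks roughly like the sketch below; the input path, the copy step standing in for preprocessing, and the commented-out MACS2 call are simplified placeholders:

```python
import os
import shutil
from pathlib import Path

TMP_DIR = Path("/temp")   # temporary folder created by the code
TMP_DIR.mkdir(exist_ok=True)

bam_files = sorted(Path("/input").glob("*.bam"))  # assumed input location

for bam in bam_files:
    tmp_bam = TMP_DIR / bam.name
    # 1) Preprocess the BAM into the temporary folder (copy is a placeholder).
    shutil.copy(bam, tmp_bam)
    # 2) Run peak detection on the preprocessed file, e.g. with MACS2:
    # subprocess.run(["macs2", "callpeak", "-t", str(tmp_bam), ...], check=True)
    # 3) Delete the temporary file before moving on, to keep disk usage low.
    os.remove(tmp_bam)
```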
> Could you please extend the temporary disk space to at least the same size as the input folder ~781G?

@chentsaimin The disk space limit is 120G. Unfortunately, we are not able to extend the disk space to 700G+ because of resource limits, but we have increased the following resources:

- CPU: 20
- memory: 40G
- disk space: 400G

I hope this helps with the preprocessing steps.
@chentsaimin The submission '9732001' has been killed.
@rchai, By the way, could you please help me stop submission 9732001 for Task 1? Thanks.

Sincerely yours,
Tsai-Min
@rchai, For Task 2, I would like to preprocess the input BAM files and save them temporarily on the local disk for the subsequent peak-detection algorithm. However, I cannot run these processes in parallel because of the limited disk space of 20G. Could you please extend the temporary disk space to at least the size of the input folder (~781G)? Otherwise, it could take up to 4 days to compute if I process the files one by one.

Sincerely yours,
Tsai-Min
@rchai, Thanks for the helpful reminder. You are right: the error was caused by memory accumulating while iterating over a large amount of input data. After I released the accumulated memory inside the loop, my submission (9731947) is now valid.

Sincerely yours,
Tsai-Min
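For anyone who hits the same problem, the fix was essentially of the form below; the input path and the per-file computation are placeholders, and the point is simply to drop references to large objects and collect them inside the loop:

```python
import gc
from pathlib import Path

results = []
for path in sorted(Path("/input").glob("*.bed")):  # assumed input location
    data = path.read_bytes()    # placeholder for loading one large input file
    results.append(len(data))   # keep only the small per-file result
    del data                    # drop the reference to the large object
    gc.collect()                # reclaim it before loading the next file
```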
Hi @chentsaimin, Thank you for bringing this to our attention. For submission '9731938', I think the error was caused by your `run_model.py` step below:

```
python run_model.py --input_dir $OUPUT_DIR --output_dir $OUPUT_DIR
```

`$INPUT_DIR` should be used for the `--input_dir` parameter. If you would like to test the [macs2 example model](https://github.com/Sage-Bionetworks-Challenges/multi-seq-challenge-example-models/tree/main/task2/macs2), you can simply submit this [docker image (sc2)](https://www.synapse.org/#!Synapse:syn47753784) uploaded to the live site. Also, please be aware of the potential memory increase when iterating over more input data (the Final Round has 990 input files, compared to 480 in the Leaderboard Round). To reduce memory usage, it might help to lower the number of threads or the number of files processed in each loop. Hope it helps. Thank you!
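As a rough illustration of the second suggestion, processing the inputs in small batches could look like the sketch below; the input path, batch size, and `process_batch` body are placeholders, not part of the official scoring code:

```python
from pathlib import Path

def process_batch(batch):
    # Placeholder: load and process only this small group of files,
    # so their memory can be released before the next batch starts.
    for path in batch:
        _ = path.stat().st_size

input_files = sorted(Path("/input").glob("*.bam"))  # assumed input location
batch_size = 10  # smaller batches keep fewer files in memory at once

for start in range(0, len(input_files), batch_size):
    process_batch(input_files[start:start + batch_size])
```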

Official compute resources for Task 2 cannot run even a baseline method: MACS2