Dear challenge manager @rchai,

I tried submitting my approach (9731945) for Task 1, but it could not be scored due to resource limitations:

```
Submission memory limit of 160G reached.
```

I believe this was caused by the memory demands of DS2 and DS3, which are very different from the validation data (DS1.C); DS2 and DS3 have roughly 6.5 and 9 times as many cells, respectively:

| Dataset | HTO Multiplexed? | # of multiplexed conditions | # of cells | Mean reads per cell | Mean genes per cell (Gene Expression) | Sequencing saturation |
|---------|------------------|-----------------------------|------------|---------------------|----------------------------------------|-----------------------|
| DS1.A   | No               | -                           | 1191       | 382809              | 7365                                   | 0.72                  |
| DS1.B   | No               | -                           | 950        | 482820              | 6706                                   | 0.796                 |
| DS1.C   | No               | -                           | 904        | 476509              | 6639                                   | 0.795                 |
| DS1.D   | No               | -                           | 676        | 665533              | 6246                                   | 0.860                 |
| DS2     | Yes              | 8                           | 5892       | 124540              | 6762                                   | 0.314                 |
| DS3     | Yes              | 9                           | 8484       | 99131               | 5558                                   | 0.506                 |

Given the extremely short time left to tune the model for the Final Round (a single model run takes 24 hours just to find out whether it succeeds), I am afraid I may not be able to adapt the model before the deadline (only 2 days left). Is there any substitute for the submission if I cannot score it before the deadline? (My write-ups are ready.)

Sincerely yours,
Tsai-Min

@rchai Thanks for your help; I respect the office's decision not to postpone the deadline. I am now trying the Python batching code you pointed to:

```python
import random

def set_batches(orig_list, batch_size, shuffle=False):
    """Split a large list into smaller batches."""
    if shuffle:
        random.shuffle(orig_list)
    # Yield consecutive slices of at most `batch_size` items.
    for i in range(0, len(orig_list), batch_size):
        yield orig_list[i:i + batch_size]
```

The only thing I am afraid of is that some branches of my parallel processes in submission 9731977 may fail due to a burst in memory usage; however, I will not know until 48 hours later...

Sincerely yours,
Tsai-Min
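For concreteness, here is a minimal sketch of how such a generator can cap peak memory by imputing one batch of files at a time, reusing `set_batches` from above. The `input/` and `output/` layout and the per-gene mean-fill are placeholder assumptions standing in for the actual submission's model:

```python
import glob

import numpy as np

def impute_batch(paths):
    """Placeholder imputation: fill missing values with per-gene means.
    Stands in for the real model; only one batch is in memory at a time."""
    for path in paths:
        matrix = np.genfromtxt(path, delimiter=",")   # cells x genes
        gene_means = np.nanmean(matrix, axis=0)       # column (gene) means
        rows, cols = np.where(np.isnan(matrix))
        matrix[rows, cols] = gene_means[cols]         # mean-fill the gaps
        np.savetxt(path.replace("input/", "output/"), matrix, delimiter=",")

# Process the dataset in fixed-size chunks so memory use stays bounded.
for batch in set_batches(sorted(glob.glob("input/*.csv")), batch_size=100):
    impute_batch(batch)
```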
Hi @chentsaimin,

We don't plan to extend the deadline for now, but we have increased the following compute resources:

- runtime limit: 48h
- memory: 240G
- CPUs: 30

I hope this helps your models run.
Hi @chentsaimin,

I am sorry to hear that your models didn't go through with the limited memory. I wonder if you have tried dividing all the files into smaller batches (something like [here](https://github.com/Sage-Bionetworks-Challenges/multi-seq-challenge-example-models/blob/e26d31012eb9f8c75e3d20a2f0110c430b68d5ca/task1/r-magic/run_model.R#L28-L35) in R or [here](https://github.com/Sage-Bionetworks-Challenges/multi-seq-challenge-example-models/blob/cb7a2fac4ba4f8a30266fdf64156b3c656dbf91a/task1/py-deepimpute/run_model.py#L19-L24) in Python) and imputing each batch concurrently. It might help reduce the memory usage; see the sketch after this post.

To clarify, there is about 2 times more data in the Final Round compared to the Leaderboard Round, and I understand your models will need more time to run on it. The deadline for accepting Final Round submissions is **Wednesday, February 8th, 23:59 Pacific Time**, and a valid submission must be made before then. I will contact the organizers to see whether it's possible to extend the deadline and get back to you. Thank you!
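For illustration, a minimal Python sketch of the batch-then-impute pattern the linked examples follow, with the batches run concurrently under a capped worker pool. `run_model.py` is a hypothetical per-batch entry point, and the batch size and worker count are guesses that would need tuning so that workers x per-batch peak memory stays under the submission limit:

```python
import glob
import subprocess
from concurrent.futures import ProcessPoolExecutor

def set_batches(orig_list, batch_size):
    """Yield consecutive slices of at most `batch_size` items."""
    for i in range(0, len(orig_list), batch_size):
        yield orig_list[i:i + batch_size]

def run_model_on_batch(paths):
    """Placeholder: invoke the imputation model on one batch of files."""
    subprocess.run(["python", "run_model.py", *paths], check=True)

if __name__ == "__main__":
    files = sorted(glob.glob("input/*.csv"))
    # Few enough workers that concurrent batches fit in memory together.
    with ProcessPoolExecutor(max_workers=4) as pool:
        list(pool.map(run_model_on_batch, set_batches(files, batch_size=200)))
```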

(Solved) Official compute resources for Task 1 could not handle the extremely large datasets DS2 and DS3