Dear @RA2DREAMChallengeParticipants,

The main queue is currently offline for maintenance to resolve a bug identified by @stadlerm (thank you again!). The fast lane queue for checking whether your container is valid is not affected and is therefore still open. We'll update the participant list once we're back online with the main queue. Thanks for your patience!

Best,
Robert

Created by Robert Allaway (@allawayr)
Okay, thanks.
Thanks for the update. UAB Research Computing is looking into it again - other users (not Challenge participants) are reporting this same issue (specifically, this error: `slurm_load_jobs error: Socket timed out on send/recv operation`). I'll update this thread when we hear more. Sorry for any inconvenience!
@allawayr I think the error still persists? Or is it from my end?

```
slurm_load_jobs error: Socket timed out on send/recv operation
Traceback (most recent call last):
  File "rundocker.py", line 166, in <module>
    main(args)
  File "rundocker.py", line 106, in main
    running = check_existing_job(submissionid)
  File "rundocker.py", line 20, in check_existing_job
    jobid = subprocess.check_output(commands)
  File "/home/thomas.yu@sagebionetworks.org/.conda/envs/cwl/lib/python3.7/subprocess.py", line 395, in check_output
    **kwargs).stdout
  File "/home/thomas.yu@sagebionetworks.org/.conda/envs/cwl/lib/python3.7/subprocess.py", line 487, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['squeue', '--name', '9701611']' returned non-zero exit status 1.
INFO [job run_docker] Max memory used: 31MiB
ERROR [job run_docker] Job error: ("Error collecting output for parameter 'predictions':\n../../thomas.yu%40sagebionetworks.org/tmpf8onqzfd/fastlane_sbatch.cwl:224:7: Did not find output file with glob pattern: '['predictions.csv']'", {})
WARNING [job run_docker] completed permanentFail
ERROR [step run_docker] Output is missing expected field file:///data/user/thomas.yu%40sagebionetworks.org/tmpf8onqzfd/wes_workflow.cwl#run_docker/predictions
WARNING [step run_docker] completed permanentFail
INFO [workflow ] completed permanentFail
WARNING Final process status is permanentFail
```
@dcentmakeover It sounds like the UAB cluster was having a slurm problem. We pinged the UAB research computing folks with the error messages and got this response: "We restarted the Slurm master process about 10 minutes ago to correct this issue." You should be good to resubmit - please let us know if it fails with this error message again and we can follow up with UAB.
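(For context on why a single controller hiccup fails the whole run: the traceback above shows `check_existing_job()` in `rundocker.py` making one `squeue` call via `subprocess.check_output`, so a slurm timeout raises `CalledProcessError` immediately. Below is a minimal sketch of a more tolerant lookup; the wrapper and retry parameters are illustrative assumptions, not the Challenge's actual `rundocker.py` code.)

```python
import subprocess
import time


def check_existing_job(submissionid, retries=3, delay=30):
    """Look up a slurm job by name, retrying on transient controller errors.

    Illustrative sketch only; retry count and delay are assumptions.
    """
    commands = ["squeue", "--name", str(submissionid)]
    for attempt in range(retries):
        try:
            # squeue exits non-zero when the slurm controller times out
            # ("Socket timed out on send/recv operation"), which surfaces
            # here as CalledProcessError.
            return subprocess.check_output(commands)
        except subprocess.CalledProcessError:
            if attempt == retries - 1:
                raise
            time.sleep(delay)
```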
thanks!
Sure, I'll submit again after 20 minutes. I'll let you know if it still persists.
@dcentmakeover Both queues should be up and running. It looks like you've submitted twice recently, and both times received a slurm error that broke the workflow. UAB's computing core uses slurm to manage all of its jobs (jobs from this Challenge get added to that queue, so we wait in line with everyone else's jobs). My immediate *guess* is that, since it's the middle of the day at UAB, the GPU nodes we run on are near or at capacity, causing the job to time out. Can you try again in a bit and see if the problem is resolved? We'll look into it on our end as well, of course, in case it's something else.
@allawayr Hey, is there an issue with the fast lane? I get this when I submit:

```
slurm_load_jobs error: Socket timed out on send/recv operation
Traceback (most recent call last):
  File "rundocker.py", line 166, in <module>
    main(args)
  File "rundocker.py", line 106, in main
    running = check_existing_job(submissionid)
  File "rundocker.py", line 20, in check_existing_job
    jobid = subprocess.check_output(commands)
  File "/home/thomas.yu@sagebionetworks.org/.conda/envs/cwl/lib/python3.7/subprocess.py", line 395, in check_output
    **kwargs).stdout
  File "/home/thomas.yu@sagebionetworks.org/.conda/envs/cwl/lib/python3.7/subprocess.py", line 487, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['squeue', '--name', '9701601']' returned non-zero exit status 1.
INFO [job run_docker] Max memory used: 31MiB
ERROR [job run_docker] Job error: ("Error collecting output for parameter 'predictions':\n../../thomas.yu%40sagebionetworks.org/tmp7ue3j8ls/fastlane_sbatch.cwl:224:7: Did not find output file with glob pattern: '['predictions.csv']'", {})
WARNING [job run_docker] completed permanentFail
ERROR [step run_docker] Output is missing expected field file:///data/user/thomas.yu%40sagebionetworks.org/tmp7ue3j8ls/wes_workflow.cwl#run_docker/predictions
WARNING [step run_docker] completed permanentFail
INFO [workflow ] completed permanentFail
WARNING Final process status is permanentFail
```

Also, I don't get an error txt file, only a log file.
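(For context on the missing error file: once the docker step dies early, as in the traceback above, `predictions.csv` is never written, so the workflow's output-collection step finds nothing matching its glob and the run is marked permanentFail, leaving only the log. A rough sketch of that final check; the file name comes from the error message, everything else is assumed for illustration.)

```python
import glob
import sys

# Hypothetical stand-in for the workflow's output-collection step: after the
# container exits, it simply globs for the expected prediction file.
matches = glob.glob("predictions.csv")
if not matches:
    # If the container never ran to completion (e.g. because the squeue check
    # crashed first), the file was never produced, so the step reports
    # "Did not find output file with glob pattern" and only the log remains.
    sys.exit("Did not find output file with glob pattern: 'predictions.csv'")
print(f"Found predictions file: {matches[0]}")
```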
Dear @RA2DREAMChallengeParticipants,

I wanted to send out a quick note to let you know that the scoring emails are still reporting the SC2/SC3 scores in reverse. For a couple of reasons - including that we are avoiding modifying anything about the actual scoring code mid-Challenge - it will take a bit of time to fix this issue. We will get to it as soon as possible, but I wouldn't anticipate this being resolved today. For now, please use the [Leaderboard](https://www.synapse.org/#!Synapse:syn20545111/wiki/597246) as the correct source of SC2 (narrowing) and SC3 (erosion) scores for your submitted models. I will post an update to this thread once we've fixed the scoring emails.

Cheers,
Robert
Thanks, Robert, for your hard work.
Dear @RA2DREAMChallengeParticipants,

I have a couple of updates.

First, we've fixed a bug identified by @stadlerm that, in certain situations, returned a score via email without counting the submission against the quota or posting the score to the leaderboard. We've reviewed all of the submissions and found that this happened for only this one participant and that no quotas were exceeded; we also re-ran this submission so that the score is now posted to the leaderboard. We've also re-opened the main queue for submissions.

Second, @sds observed that the SC2 (narrowing) and SC3 (erosion) scores seemed to be swapped on the leaderboard. After looking into this further, this was indeed the case. The leaderboard has been adjusted so that the scores appear in the correct columns: SC2 is the joint space narrowing score and SC3 is the joint erosion score. Apologies for this mistake - it was my error!

Thanks to both for bringing these issues to my attention, and let me know if you have any questions,

Robert

Scoring queue down for maintenance