My submission (submission ID 9694831) on the fast lane failed. In the log, the status shows as VALIDATED. I was able to run the model successfully on my local machine within 1 hour and create the output files. Any leads on the reason would be appreciated.

**Received mail:**

> Your workflow job, (submission ID 9694831), has failed to complete. The message is:
> -packages/toil/leader.py", line 246, in run
> STDERR: 2019-11-05T14:08:05.567718178Z raise FailedJobsException(self.config.jobStore, self.toilState.totalFailedJobs, self.jobStore)
> STDERR: 2019-11-05T14:08:05.567722497Z toil.leader.FailedJobsException

**Log:**

```
STDERR: 2019-11-05T13:50:21.218004008Z INFO:cwltool:Resolved '/var/lib/docker/volumes/workflow_orchestrator_shared/_data/9365e01c-9190-4ef9-a6db-7df72cc507e3/EHR-challenge-develop/synthetic_docker_agent_workflow.cwl' to 'file:///var/lib/docker/volumes/workflow_orchestrator_shared/_data/9365e01c-9190-4ef9-a6db-7df72cc507e3/EHR-challenge-develop/synthetic_docker_agent_workflow.cwl'
STDERR: 2019-11-05T13:50:23.592904556Z EHR-challenge-develop/synthetic_docker_agent_workflow.cwl:143:9: object id `EHR-challenge-develop/synthetic_docker_agent_workflow.cwl#run_docker_infer/status` previously defined
STDERR: 2019-11-05T13:50:23.592931457Z WARNING:salad:EHR-challenge-develop/synthetic_docker_agent_workflow.cwl:143:9: object id `EHR-challenge-develop/synthetic_docker_agent_workflow.cwl#run_docker_infer/status` previously defined
STDERR: 2019-11-05T13:50:23.594660034Z EHR-challenge-develop/synthetic_docker_agent_workflow.cwl:117:9: object id `EHR-challenge-develop/synthetic_docker_agent_workflow.cwl#run_docker_train/status` previously defined
STDERR: 2019-11-05T13:50:23.594676334Z WARNING:salad:EHR-challenge-develop/synthetic_docker_agent_workflow.cwl:117:9: object id `EHR-challenge-develop/synthetic_docker_agent_workflow.cwl#run_docker_train/status` previously defined
STDERR: 2019-11-05T13:50:27.829890983Z EHR-challenge-develop/run_synthetic_infer_docker.cwl:22:5: object id 
`EHR-challenge-develop/run_synthetic_infer_docker.cwl#status` previously defined STDERR: 2019-11-05T13:50:27.829924854Z WARNING:salad:EHR-challenge-develop/run_synthetic_infer_docker.cwl:22:5: object id `EHR-challenge-develop/run_synthetic_infer_docker.cwl#status` previously defined STDERR: 2019-11-05T13:50:28.070871120Z EHR-challenge-develop/run_synthetic_training_docker.cwl:22:5: object id `EHR-challenge-develop/run_synthetic_training_docker.cwl#status` previously defined STDERR: 2019-11-05T13:50:28.070884780Z WARNING:salad:EHR-challenge-develop/run_synthetic_training_docker.cwl:22:5: object id `EHR-challenge-develop/run_synthetic_training_docker.cwl#status` previously defined STDERR: 2019-11-05T13:50:29.080163538Z WARNING:toil.batchSystems.singleMachine:Limiting maxCores to CPU count of system (32). STDERR: 2019-11-05T13:50:29.080182539Z WARNING:toil.batchSystems.singleMachine:Limiting maxMemory to physically available memory (268108005376). STDERR: 2019-11-05T13:50:29.080305730Z WARNING:toil.batchSystems.singleMachine:Limiting maxDisk to physically available disk (167463268352). STDERR: 2019-11-05T13:50:29.329194180Z INFO:toil:Running Toil version 3.20.0-cf34ca3416697f2abc816b2538f20ee29ba16932. STDERR: 2019-11-05T13:50:29.630487207Z DEBUG:toil.jobStores.fileJobStore:Path to job store directory is '/var/lib/docker/volumes/workflow_orchestrator_shared/_data/9365e01c-9190-4ef9-a6db-7df72cc507e3/tmpmvDwci'. 
STDERR: 2019-11-05T13:50:29.631659424Z INFO:toil.worker:Redirecting logging to /var/lib/docker/volumes/workflow_orchestrator_shared/_data/9365e01c-9190-4ef9-a6db-7df72cc507e3/toil-60bf367f-855b-4d92-b54c-0724b2c452d6-3bdead962f93fd7fa17dcb3c0b3ee830/tmpKljJhX/worker_log.txt STDERR: 2019-11-05T13:50:30.461736592Z INFO:toil.leader:Issued job 'https://raw.githubusercontent.com/Sage-Bionetworks/ChallengeWorkflowTemplates/v1.6/notification_email.cwl' python z/U/jobXSLJM9 with job batch system ID: 1 and cores: 1, disk: 11.0 G, and memory: 100.0 M STDERR: 2019-11-05T13:50:30.462355745Z INFO:toil.leader:Issued job 'https://raw.githubusercontent.com/Sage-Bionetworks/ChallengeWorkflowTemplates/v1.6/download_from_synapse.cwl' python s/H/jobZDn9SJ with job batch system ID: 2 and cores: 1, disk: 11.0 G, and memory: 100.0 M STDERR: 2019-11-05T13:50:30.470113421Z INFO:toil.leader:Issued job 'https://raw.githubusercontent.com/Sage-Bionetworks/ChallengeWorkflowTemplates/v1.6/get_docker_config.cwl' python C/L/jobo_WvD3 with job batch system ID: 3 and cores: 1, disk: 11.0 G, and memory: 100.0 M STDERR: 2019-11-05T13:50:30.470294412Z INFO:toil.leader:Issued job 'https://raw.githubusercontent.com/Sage-Bionetworks/ChallengeWorkflowTemplates/v1.6/get_submission_docker.cwl' python 3/N/jobUeyvmc with job batch system ID: 4 and cores: 1, disk: 11.0 G, and memory: 100.0 M STDERR: 2019-11-05T13:50:30.789473951Z DEBUG:toil.jobStores.fileJobStore:Path to job store directory is '/var/lib/docker/volumes/workflow_orchestrator_shared/_data/9365e01c-9190-4ef9-a6db-7df72cc507e3/tmpmvDwci'. 
STDERR: 2019-11-05T13:50:30.790271416Z INFO:toil.worker:Redirecting logging to /var/lib/docker/volumes/workflow_orchestrator_shared/_data/9365e01c-9190-4ef9-a6db-7df72cc507e3/toil-60bf367f-855b-4d92-b54c-0724b2c452d6-3bdead962f93fd7fa17dcb3c0b3ee830/tmpYW6Vde/worker_log.txt STDERR: 2019-11-05T13:50:30.819255064Z DEBUG:toil.jobStores.fileJobStore:Path to job store directory is '/var/lib/docker/volumes/workflow_orchestrator_shared/_data/9365e01c-9190-4ef9-a6db-7df72cc507e3/tmpmvDwci'. STDERR: 2019-11-05T13:50:30.820325721Z INFO:toil.worker:Redirecting logging to /var/lib/docker/volumes/workflow_orchestrator_shared/_data/9365e01c-9190-4ef9-a6db-7df72cc507e3/toil-60bf367f-855b-4d92-b54c-0724b2c452d6-3bdead962f93fd7fa17dcb3c0b3ee830/tmppEeGEl/worker_log.txt STDERR: 2019-11-05T13:50:30.821636089Z DEBUG:toil.jobStores.fileJobStore:Path to job store directory is '/var/lib/docker/volumes/workflow_orchestrator_shared/_data/9365e01c-9190-4ef9-a6db-7df72cc507e3/tmpmvDwci'. STDERR: 2019-11-05T13:50:30.823072597Z INFO:toil.worker:Redirecting logging to /var/lib/docker/volumes/workflow_orchestrator_shared/_data/9365e01c-9190-4ef9-a6db-7df72cc507e3/toil-60bf367f-855b-4d92-b54c-0724b2c452d6-3bdead962f93fd7fa17dcb3c0b3ee830/tmpmQqrCw/worker_log.txt STDERR: 2019-11-05T13:50:30.841132842Z DEBUG:toil.jobStores.fileJobStore:Path to job store directory is '/var/lib/docker/volumes/workflow_orchestrator_shared/_data/9365e01c-9190-4ef9-a6db-7df72cc507e3/tmpmvDwci'. 
STDERR: 2019-11-05T13:50:30.842179508Z INFO:toil.worker:Redirecting logging to /var/lib/docker/volumes/workflow_orchestrator_shared/_data/9365e01c-9190-4ef9-a6db-7df72cc507e3/toil-60bf367f-855b-4d92-b54c-0724b2c452d6-3bdead962f93fd7fa17dcb3c0b3ee830/tmpMWes79/worker_log.txt STDERR: 2019-11-05T13:50:32.637260229Z INFO:toil.leader:Job ended successfully: 'https://raw.githubusercontent.com/Sage-Bionetworks/ChallengeWorkflowTemplates/v1.6/download_from_synapse.cwl' python s/H/jobZDn9SJ STDERR: 2019-11-05T13:50:33.353435803Z INFO:toil.leader:Job ended successfully: 'https://raw.githubusercontent.com/Sage-Bionetworks/ChallengeWorkflowTemplates/v1.6/get_docker_config.cwl' python C/L/jobo_WvD3 STDERR: 2019-11-05T13:50:33.668556716Z INFO:toil.leader:Job ended successfully: 'https://raw.githubusercontent.com/Sage-Bionetworks/ChallengeWorkflowTemplates/v1.6/get_submission_docker.cwl' python 3/N/jobUeyvmc STDERR: 2019-11-05T13:50:33.669430871Z INFO:toil.leader:Issued job 'https://raw.githubusercontent.com/Sage-Bionetworks/ChallengeWorkflowTemplates/v1.6/validate_docker.cwl' python k/i/jobhd11Mh with job batch system ID: 5 and cores: 1, disk: 11.0 G, and memory: 100.0 M STDERR: 2019-11-05T13:50:33.742095444Z INFO:toil.leader:Job ended successfully: 'https://raw.githubusercontent.com/Sage-Bionetworks/ChallengeWorkflowTemplates/v1.6/notification_email.cwl' python z/U/jobXSLJM9 STDERR: 2019-11-05T13:50:33.993927717Z DEBUG:toil.jobStores.fileJobStore:Path to job store directory is '/var/lib/docker/volumes/workflow_orchestrator_shared/_data/9365e01c-9190-4ef9-a6db-7df72cc507e3/tmpmvDwci'. 
STDERR: 2019-11-05T13:50:33.995110735Z INFO:toil.worker:Redirecting logging to /var/lib/docker/volumes/workflow_orchestrator_shared/_data/9365e01c-9190-4ef9-a6db-7df72cc507e3/toil-60bf367f-855b-4d92-b54c-0724b2c452d6-3bdead962f93fd7fa17dcb3c0b3ee830/tmpEPJmVa/worker_log.txt STDERR: 2019-11-05T13:50:36.558577151Z INFO:toil.leader:Job ended successfully: 'https://raw.githubusercontent.com/Sage-Bionetworks/ChallengeWorkflowTemplates/v1.6/validate_docker.cwl' python k/i/jobhd11Mh STDERR: 2019-11-05T13:50:36.559390315Z INFO:toil.leader:Issued job 'https://raw.githubusercontent.com/Sage-Bionetworks/ChallengeWorkflowTemplates/v1.6/annotate_submission.cwl' python M/b/jobQLkYQ6 with job batch system ID: 6 and cores: 1, disk: 11.0 G, and memory: 100.0 M STDERR: 2019-11-05T13:50:36.567018399Z INFO:toil.leader:Issued job 'file:///var/lib/docker/volumes/workflow_orchestrator_shared/_data/9365e01c-9190-4ef9-a6db-7df72cc507e3/EHR-challenge-develop/run_synthetic_training_docker.cwl' python 5/A/jobbL5PB7 with job batch system ID: 7 and cores: 1, disk: 11.0 G, and memory: 100.0 M STDERR: 2019-11-05T13:50:36.847683438Z DEBUG:toil.jobStores.fileJobStore:Path to job store directory is '/var/lib/docker/volumes/workflow_orchestrator_shared/_data/9365e01c-9190-4ef9-a6db-7df72cc507e3/tmpmvDwci'. STDERR: 2019-11-05T13:50:36.848560772Z INFO:toil.worker:Redirecting logging to /var/lib/docker/volumes/workflow_orchestrator_shared/_data/9365e01c-9190-4ef9-a6db-7df72cc507e3/toil-60bf367f-855b-4d92-b54c-0724b2c452d6-3bdead962f93fd7fa17dcb3c0b3ee830/tmpPRXKB7/worker_log.txt STDERR: 2019-11-05T13:50:36.861300587Z DEBUG:toil.jobStores.fileJobStore:Path to job store directory is '/var/lib/docker/volumes/workflow_orchestrator_shared/_data/9365e01c-9190-4ef9-a6db-7df72cc507e3/tmpmvDwci'. 
STDERR: 2019-11-05T13:50:36.862321912Z INFO:toil.worker:Redirecting logging to /var/lib/docker/volumes/workflow_orchestrator_shared/_data/9365e01c-9190-4ef9-a6db-7df72cc507e3/toil-60bf367f-855b-4d92-b54c-0724b2c452d6-3bdead962f93fd7fa17dcb3c0b3ee830/tmptkuC48/worker_log.txt STDERR: 2019-11-05T13:50:39.784986050Z INFO:toil.leader:Job ended successfully: 'https://raw.githubusercontent.com/Sage-Bionetworks/ChallengeWorkflowTemplates/v1.6/annotate_submission.cwl' python M/b/jobQLkYQ6 STDERR: 2019-11-05T14:07:55.852530622Z INFO:toil.leader:Job ended successfully: 'file:///var/lib/docker/volumes/workflow_orchestrator_shared/_data/9365e01c-9190-4ef9-a6db-7df72cc507e3/EHR-challenge-develop/run_synthetic_training_docker.cwl' python 5/A/jobbL5PB7 STDERR: 2019-11-05T14:07:55.853115583Z WARNING:toil.leader:The job seems to have left a log file, indicating failure: 'file:///var/lib/docker/volumes/workflow_orchestrator_shared/_data/9365e01c-9190-4ef9-a6db-7df72cc507e3/EHR-challenge-develop/run_synthetic_training_docker.cwl' python 5/A/jobbL5PB7 STDERR: 2019-11-05T14:07:55.853162533Z WARNING:toil.leader:5/A/jobbL5PB7 INFO:toil.worker:---TOIL WORKER OUTPUT LOG--- STDERR: 2019-11-05T14:07:55.853215993Z WARNING:toil.leader:5/A/jobbL5PB7 INFO:toil:Running Toil version 3.20.0-cf34ca3416697f2abc816b2538f20ee29ba16932. 
STDERR: 2019-11-05T14:07:55.853357293Z WARNING:toil.leader:5/A/jobbL5PB7 [job run_synthetic_training_docker.cwl] /var/lib/docker/volumes/workflow_orchestrator_shared/_data/9365e01c-9190-4ef9-a6db-7df72cc507e3/tmpmvDwci/f/6/out_tmpdirYt6cGm$ python \ STDERR: 2019-11-05T14:07:55.853376933Z WARNING:toil.leader:5/A/jobbL5PB7 runDocker.py \ STDERR: 2019-11-05T14:07:55.853451493Z WARNING:toil.leader:5/A/jobbL5PB7 -s \ STDERR: 2019-11-05T14:07:55.853462724Z WARNING:toil.leader:5/A/jobbL5PB7 9694831 \ STDERR: 2019-11-05T14:07:55.853551743Z WARNING:toil.leader:5/A/jobbL5PB7 -p \ STDERR: 2019-11-05T14:07:55.853608133Z WARNING:toil.leader:5/A/jobbL5PB7 docker.synapse.org/syn21121957/tcs_ehr_model \ STDERR: 2019-11-05T14:07:55.853691334Z WARNING:toil.leader:5/A/jobbL5PB7 -d \ STDERR: 2019-11-05T14:07:55.853725844Z WARNING:toil.leader:5/A/jobbL5PB7 sha256:58bb4d6eea07174d0e95e84912c80124cb4fab1d2801ecfe4342039264932b27 \ STDERR: 2019-11-05T14:07:55.853733784Z WARNING:toil.leader:5/A/jobbL5PB7 --status \ STDERR: 2019-11-05T14:07:55.853815514Z WARNING:toil.leader:5/A/jobbL5PB7 VALIDATED \ STDERR: 2019-11-05T14:07:55.853826704Z WARNING:toil.leader:5/A/jobbL5PB7 --parentid \ STDERR: 2019-11-05T14:07:55.853961714Z WARNING:toil.leader:5/A/jobbL5PB7 syn21122324 \ STDERR: 2019-11-05T14:07:55.853973745Z WARNING:toil.leader:5/A/jobbL5PB7 -c \ STDERR: 2019-11-05T14:07:55.854055774Z WARNING:toil.leader:5/A/jobbL5PB7 /var/lib/docker/volumes/workflow_orchestrator_shared/_data/9365e01c-9190-4ef9-a6db-7df72cc507e3/tmp8RYKnL/stga4a447d9-7e38-4358-9dec-9fa4f38b198d/.synapseConfig \ STDERR: 2019-11-05T14:07:55.854067325Z WARNING:toil.leader:5/A/jobbL5PB7 -i \ STDERR: 2019-11-05T14:07:55.854164945Z WARNING:toil.leader:5/A/jobbL5PB7 /home/thomasyu/train STDERR: 2019-11-05T14:07:55.854226685Z WARNING:toil.leader:5/A/jobbL5PB7 INFO:cwltool:[job run_synthetic_training_docker.cwl] 
/var/lib/docker/volumes/workflow_orchestrator_shared/_data/9365e01c-9190-4ef9-a6db-7df72cc507e3/tmpmvDwci/f/6/out_tmpdirYt6cGm$ python \ STDERR: 2019-11-05T14:07:55.854300905Z WARNING:toil.leader:5/A/jobbL5PB7 runDocker.py \ STDERR: 2019-11-05T14:07:55.854306755Z WARNING:toil.leader:5/A/jobbL5PB7 -s \ STDERR: 2019-11-05T14:07:55.854399735Z WARNING:toil.leader:5/A/jobbL5PB7 9694831 \ STDERR: 2019-11-05T14:07:55.854412706Z WARNING:toil.leader:5/A/jobbL5PB7 -p \ STDERR: 2019-11-05T14:07:55.854506846Z WARNING:toil.leader:5/A/jobbL5PB7 docker.synapse.org/syn21121957/tcs_ehr_model \ STDERR: 2019-11-05T14:07:55.854513196Z WARNING:toil.leader:5/A/jobbL5PB7 -d \ STDERR: 2019-11-05T14:07:55.854619275Z WARNING:toil.leader:5/A/jobbL5PB7 sha256:58bb4d6eea07174d0e95e84912c80124cb4fab1d2801ecfe4342039264932b27 \ STDERR: 2019-11-05T14:07:55.854629006Z WARNING:toil.leader:5/A/jobbL5PB7 --status \ STDERR: 2019-11-05T14:07:55.854725966Z WARNING:toil.leader:5/A/jobbL5PB7 VALIDATED \ STDERR: 2019-11-05T14:07:55.854736806Z WARNING:toil.leader:5/A/jobbL5PB7 --parentid \ STDERR: 2019-11-05T14:07:55.854838046Z WARNING:toil.leader:5/A/jobbL5PB7 syn21122324 \ STDERR: 2019-11-05T14:07:55.854848186Z WARNING:toil.leader:5/A/jobbL5PB7 -c \ STDERR: 2019-11-05T14:07:55.854958616Z WARNING:toil.leader:5/A/jobbL5PB7 /var/lib/docker/volumes/workflow_orchestrator_shared/_data/9365e01c-9190-4ef9-a6db-7df72cc507e3/tmp8RYKnL/stga4a447d9-7e38-4358-9dec-9fa4f38b198d/.synapseConfig \ STDERR: 2019-11-05T14:07:55.854982426Z WARNING:toil.leader:5/A/jobbL5PB7 -i \ STDERR: 2019-11-05T14:07:55.855053587Z WARNING:toil.leader:5/A/jobbL5PB7 /home/thomasyu/train STDERR: 2019-11-05T14:07:55.855116007Z WARNING:toil.leader:5/A/jobbL5PB7 Welcome, ehrdreamservice! 
STDERR: 2019-11-05T14:07:55.855127247Z WARNING:toil.leader:5/A/jobbL5PB7 STDERR: 2019-11-05T14:07:55.855194467Z WARNING:toil.leader:5/A/jobbL5PB7 root STDERR: 2019-11-05T14:07:55.855291087Z WARNING:toil.leader:5/A/jobbL5PB7 mounting volumes STDERR: 2019-11-05T14:07:55.855298857Z WARNING:toil.leader:5/A/jobbL5PB7 checking for containers STDERR: 2019-11-05T14:07:55.855375347Z WARNING:toil.leader:5/A/jobbL5PB7 running container STDERR: 2019-11-05T14:07:55.855436757Z WARNING:toil.leader:5/A/jobbL5PB7 creating logfile STDERR: 2019-11-05T14:07:55.855515348Z WARNING:toil.leader:5/A/jobbL5PB7 STDERR: 2019-11-05T14:07:55.855525158Z WARNING:toil.leader:5/A/jobbL5PB7 ################################################## STDERR: 2019-11-05T14:07:55.855606637Z WARNING:toil.leader:5/A/jobbL5PB7 Uploading file to Synapse storage STDERR: 2019-11-05T14:07:55.855612568Z WARNING:toil.leader:5/A/jobbL5PB7 ################################################## STDERR: 2019-11-05T14:07:55.855730748Z WARNING:toil.leader:5/A/jobbL5PB7 STDERR: 2019-11-05T14:07:55.855736848Z WARNING:toil.leader:5/A/jobbL5PB7 STDERR: 2019-11-05T14:07:55.855833478Z WARNING:toil.leader:5/A/jobbL5PB7 ################################################## STDERR: 2019-11-05T14:07:55.855841208Z WARNING:toil.leader:5/A/jobbL5PB7 Uploading file to Synapse storage STDERR: 2019-11-05T14:07:55.855963469Z WARNING:toil.leader:5/A/jobbL5PB7 ################################################## STDERR: 2019-11-05T14:07:55.855985298Z WARNING:toil.leader:5/A/jobbL5PB7 STDERR: 2019-11-05T14:07:55.856094819Z WARNING:toil.leader:5/A/jobbL5PB7 finished training STDERR: 2019-11-05T14:07:55.856105569Z WARNING:toil.leader:5/A/jobbL5PB7 Traceback (most recent call last): STDERR: 2019-11-05T14:07:55.856192579Z WARNING:toil.leader:5/A/jobbL5PB7 File "runDocker.py", line 179, in STDERR: 2019-11-05T14:07:55.856199599Z WARNING:toil.leader:5/A/jobbL5PB7 main(args) STDERR: 2019-11-05T14:07:55.856309839Z WARNING:toil.leader:5/A/jobbL5PB7 File 
"runDocker.py", line 137, in main STDERR: 2019-11-05T14:07:55.856318700Z WARNING:toil.leader:5/A/jobbL5PB7 raise Exception("No model generated, please check training docker") STDERR: 2019-11-05T14:07:55.856400469Z WARNING:toil.leader:5/A/jobbL5PB7 Exception: No model generated, please check training docker STDERR: 2019-11-05T14:07:55.856439979Z WARNING:toil.leader:5/A/jobbL5PB7 [job run_synthetic_training_docker.cwl] Max memory used: 49MiB STDERR: 2019-11-05T14:07:55.856497700Z WARNING:toil.leader:5/A/jobbL5PB7 INFO:cwltool:[job run_synthetic_training_docker.cwl] Max memory used: 49MiB STDERR: 2019-11-05T14:07:55.856543940Z WARNING:toil.leader:5/A/jobbL5PB7 [job run_synthetic_training_docker.cwl] Job error: STDERR: 2019-11-05T14:07:55.856608330Z WARNING:toil.leader:5/A/jobbL5PB7 Error collecting output for parameter 'model': STDERR: 2019-11-05T14:07:55.856619140Z WARNING:toil.leader:5/A/jobbL5PB7 :1:1: Did not find output file with glob pattern: '['model_files.tar.gz']' STDERR: 2019-11-05T14:07:55.856716110Z WARNING:toil.leader:5/A/jobbL5PB7 ERROR:cwltool:[job run_synthetic_training_docker.cwl] Job error: STDERR: 2019-11-05T14:07:55.856728260Z WARNING:toil.leader:5/A/jobbL5PB7 Error collecting output for parameter 'model': STDERR: 2019-11-05T14:07:55.856866431Z WARNING:toil.leader:5/A/jobbL5PB7 :1:1: Did not find output file with glob pattern: '['model_files.tar.gz']' STDERR: 2019-11-05T14:07:55.856882871Z WARNING:toil.leader:5/A/jobbL5PB7 [job run_synthetic_training_docker.cwl] completed permanentFail STDERR: 2019-11-05T14:07:55.856957760Z WARNING:toil.leader:5/A/jobbL5PB7 WARNING:cwltool:[job run_synthetic_training_docker.cwl] completed permanentFail STDERR: 2019-11-05T14:07:55.856968781Z WARNING:toil.leader:5/A/jobbL5PB7 Traceback (most recent call last): STDERR: 2019-11-05T14:07:55.857080281Z WARNING:toil.leader:5/A/jobbL5PB7 File "/usr/local/lib/python2.7/site-packages/toil/worker.py", line 331, in workerScript STDERR: 2019-11-05T14:07:55.857092511Z 
WARNING:toil.leader:5/A/jobbL5PB7 job._runner(jobGraph=jobGraph, jobStore=jobStore, fileStore=fileStore) STDERR: 2019-11-05T14:07:55.857191131Z WARNING:toil.leader:5/A/jobbL5PB7 File "/usr/local/lib/python2.7/site-packages/toil/job.py", line 1378, in _runner STDERR: 2019-11-05T14:07:55.857201791Z WARNING:toil.leader:5/A/jobbL5PB7 returnValues = self._run(jobGraph, fileStore) STDERR: 2019-11-05T14:07:55.857316381Z WARNING:toil.leader:5/A/jobbL5PB7 File "/usr/local/lib/python2.7/site-packages/toil/job.py", line 1323, in _run STDERR: 2019-11-05T14:07:55.857327541Z WARNING:toil.leader:5/A/jobbL5PB7 return self.run(fileStore) STDERR: 2019-11-05T14:07:55.857425072Z WARNING:toil.leader:5/A/jobbL5PB7 File "/usr/local/lib/python2.7/site-packages/toil/cwl/cwltoil.py", line 606, in run STDERR: 2019-11-05T14:07:55.857435311Z WARNING:toil.leader:5/A/jobbL5PB7 raise cwltool.errors.WorkflowException(status) STDERR: 2019-11-05T14:07:55.857542412Z WARNING:toil.leader:5/A/jobbL5PB7 WorkflowException: permanentFail STDERR: 2019-11-05T14:07:55.857553762Z WARNING:toil.leader:5/A/jobbL5PB7 ERROR:toil.worker:Exiting the worker because of a failed job on host 41def308d0ac STDERR: 2019-11-05T14:07:55.857680342Z WARNING:toil.leader:5/A/jobbL5PB7 WARNING:toil.jobGraph:Due to failure we are reducing the remaining retry count of job 'file:///var/lib/docker/volumes/workflow_orchestrator_shared/_data/9365e01c-9190-4ef9-a6db-7df72cc507e3/EHR-challenge-develop/run_synthetic_training_docker.cwl' python 5/A/jobbL5PB7 with ID 5/A/jobbL5PB7 to 0 STDERR: 2019-11-05T14:07:55.859408596Z WARNING:toil.leader:Job 'file:///var/lib/docker/volumes/workflow_orchestrator_shared/_data/9365e01c-9190-4ef9-a6db-7df72cc507e3/EHR-challenge-develop/run_synthetic_training_docker.cwl' python 5/A/jobbL5PB7 with ID 5/A/jobbL5PB7 is completely failed STDERR: 2019-11-05T14:08:05.479165444Z INFO:toil.leader:Finished toil run with 6 failed jobs. 
STDERR: 2019-11-05T14:08:05.479650654Z INFO:toil.leader:Failed jobs at end of the run: 'file:///var/lib/docker/volumes/workflow_orchestrator_shared/_data/9365e01c-9190-4ef9-a6db-7df72cc507e3/EHR-challenge-develop/run_synthetic_training_docker.cwl' python 5/A/jobbL5PB7 'CWLWorkflow' J/6/jobSVbqEI 'https://raw.githubusercontent.com/Sage-Bionetworks/ChallengeWorkflowTemplates/v1.6/get_submission_docker.cwl' python 3/N/jobUeyvmc 'https://raw.githubusercontent.com/Sage-Bionetworks/ChallengeWorkflowTemplates/v1.6/get_docker_config.cwl' python C/L/jobo_WvD3 'https://raw.githubusercontent.com/Sage-Bionetworks/ChallengeWorkflowTemplates/v1.6/validate_docker.cwl' python k/i/jobhd11Mh 'https://raw.githubusercontent.com/Sage-Bionetworks/ChallengeWorkflowTemplates/v1.6/download_from_synapse.cwl' python s/H/jobZDn9SJ STDERR: 2019-11-05T14:08:05.566859435Z Traceback (most recent call last): STDERR: 2019-11-05T14:08:05.566897055Z File "/usr/local/bin/toil-cwl-runner", line 8, in STDERR: 2019-11-05T14:08:05.566902396Z sys.exit(main()) STDERR: 2019-11-05T14:08:05.566906285Z File "/usr/local/lib/python2.7/site-packages/toil/cwl/cwltoil.py", line 1276, in main STDERR: 2019-11-05T14:08:05.567101406Z outobj = toil.start(wf1) STDERR: 2019-11-05T14:08:05.567129626Z File "/usr/local/lib/python2.7/site-packages/toil/common.py", line 781, in start STDERR: 2019-11-05T14:08:05.567368516Z return self._runMainLoop(rootJobGraph) STDERR: 2019-11-05T14:08:05.567383156Z File "/usr/local/lib/python2.7/site-packages/toil/common.py", line 1054, in _runMainLoop STDERR: 2019-11-05T14:08:05.567704727Z jobCache=self._jobCache).run() STDERR: 2019-11-05T14:08:05.567713767Z File "/usr/local/lib/python2.7/site-packages/toil/leader.py", line 246, in run STDERR: 2019-11-05T14:08:05.567718178Z raise FailedJobsException(self.config.jobStore, self.toilState.totalFailedJobs, self.jobStore) STDERR: 2019-11-05T14:08:05.567722497Z toil.leader.FailedJobsException ```

Created by Ibrahim Roshan K roshkjr
Hi @roshkjr, It looks like your model produced a training log file containing the error "/app/train.sh: line 4: 6 Killed python /app/train.py". Typically, this means that your container exceeded the allotted 30 GB of RAM during training. Hope this helps! Tim
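For anyone hitting the same "Killed" message, here is a minimal sketch of how a training script could log its own peak memory so it can be compared with the 30 GB limit before submitting. It uses only the standard-library `resource` module (Unix only). The names `peak_rss_bytes`, `memory_report` and `LIMIT_BYTES` are made up for illustration and are not part of the challenge workflow, and treating "30 GB" as 30 GiB is an assumption.

```python
import resource
import sys

# Assumption: the 30 GB limit mentioned in this thread, treated here as 30 GiB.
LIMIT_BYTES = 30 * 1024 ** 3


def peak_rss_bytes():
    """Return the peak resident set size of the current process in bytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    return peak if sys.platform == "darwin" else peak * 1024


def memory_report(limit_bytes=LIMIT_BYTES):
    """Build a one-line summary of peak memory relative to the limit."""
    peak = peak_rss_bytes()
    gib = 1024 ** 3
    percent = 100.0 * peak / limit_bytes
    return (
        f"peak RSS: {peak / gib:.2f} GiB "
        f"of {limit_bytes / gib:.0f} GiB allowed ({percent:.1f}%)"
    )


if __name__ == "__main__":
    # Call this at the end of training (or after each large step).
    print(memory_report())
```

Note that `RUSAGE_SELF` only covers the Python process itself. If training spawns worker processes, their memory is not included, so the real footprint of the container may be higher than what this prints.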
Hi @trberg, Yes. When I ran it on my local machine, it wrote the trained model to the "model" folder, and the inference stage also ran successfully.
Hi @roshkjr,

According to this error log:

> STDERR: 2019-11-05T14:07:55.856318700Z WARNING:toil.leader:5/A/jobbL5PB7 raise Exception("No model generated, please check training docker")
> STDERR: 2019-11-05T14:07:55.856400469Z WARNING:toil.leader:5/A/jobbL5PB7 Exception: No model generated, please check training docker

Is your docker container outputting a model to the "model" folder?

Thanks,
Tim
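A quick way to answer that question locally is to list what the training step actually left in the model folder. The sketch below is only an illustration: `list_model_files` and `check_model_output` are hypothetical helpers, the default path `/model` is an assumption about where the folder is mounted, and this is not how `runDocker.py` itself performs its check.

```python
import os


def list_model_files(model_dir):
    """Return the relative paths of all regular files under model_dir.

    A missing directory yields an empty list, because os.walk silently
    produces nothing for a path it cannot read.
    """
    found = []
    for root, _dirs, files in os.walk(model_dir):
        for name in files:
            full_path = os.path.join(root, name)
            found.append(os.path.relpath(full_path, model_dir))
    return sorted(found)


def check_model_output(model_dir="/model"):
    """Raise if training left no files behind, otherwise return the file list."""
    files = list_model_files(model_dir)
    if not files:
        raise RuntimeError(
            f"no files found under {model_dir!r}; training produced no model"
        )
    return files


if __name__ == "__main__":
    # Assumption: the model folder is mounted at /model inside the container.
    for path in check_model_output():
        print(path)
```

Running this as the last step of the training script makes an empty model folder fail loudly inside the container, instead of surfacing later as "No model generated".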
