This thread is the designated place to ask questions about running the **encode_mapping_workflow** workflow.
Created by James Eddy (jaeddy)
Hello,
I'm still having the problem running this on JES. @jseth - were you able to find any solution for this?
Thanks!
Jim
EDIT: It looks like the problem is that some of the Docker images used in the WDL specify an entrypoint, which isn't well supported by JES (see, e.g., https://github.com/broadinstitute/cromwell/issues/2461#issuecomment-316418026). I created new Docker images without the entrypoint and changed the WDL to reference those, and everything works.
@jseth We deployed a patch that overcomes the USER issue on the cloud platform, so the workflow can be executed on [CancerGenomicsCloud](https://cgc.sbgenomics.com/). I submitted [instructions](https://www.synapse.org/#!Synapse:syn11272418) for running it.
Hi Vipin,
I reproduced the same issue on a local Ubuntu machine (on a Mac this does not happen). I think this issue is a result of the fact that the Docker image adds a custom USER [in the Dockerfile](https://github.com/ENCODE-DCC/pipeline-container/blob/master/images/mapping/Dockerfile#L56). The Docker container is started as this custom user, which has no permission to write to that directory. I think cwltool overrides this USER and starts the container as user "501", so the error does not appear in cwltool runs. Seth mentioned that the custom USER might be removed from the Docker image in the future.
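As a rough illustration of the difference (a sketch, not part of the workflow itself), you can compare the user the container runs as by default with a cwltool-style override of the image's USER:
```shell
# Run `id` with the image's built-in USER (the custom user added in the
# Dockerfile); --entrypoint "" just makes sure `id` is executed directly.
docker run --rm --entrypoint "" quay.io/encode-dcc/mapping:v1.0 id

# cwltool-style override: start the container as the calling user's UID/GID,
# which is why the bind-mounted output directory stays writable in cwltool runs.
docker run --rm --user "$(id -u):$(id -g)" --entrypoint "" \
  quay.io/encode-dcc/mapping:v1.0 id
```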
Best,
Bogdan
Hi Bogdan,
Thank you for the suggestion to use `rabix` with `encode_mapping_workflow`. I tried the command you suggested; it gets past the error message I reported earlier, but the workflow now ends with another error:
```shell
rabix --tmpdir-prefix ~/tmp/ --tmp-outdir-prefix ~/tmp/ ./encode_mapping_workflow.cwl encode_mapping_workflow.cwl.json
```
```shell
[STDOUT]
[2017-09-29 17:57:26.367] [INFO] Job root.mapper has started
[2017-09-29 17:58:03.657] [INFO] Pulling docker image quay.io/encode-dcc/mapping:v1.0
[2017-09-29 17:58:04.371] [INFO] Running command line: python /image_software/pipeline-container/src/encode_map.py /home/nexcbu/dream_challenge/2017/workflow_execution/workflows/encode_mapping_workflow/input_data/reference/ENCFF643CGH_ENCODE_GRCh38_bwa.tar.gz native /home/nexcbu/dream_challenge/2017/workflow_execution/workflows/encode_mapping_workflow/input_data/ENCFF000VOL.fastq.gz
[2017-09-29 17:58:06.104] [ERROR] Job 886be86f-84ea-4ed3-bf35-42ab08876d62 failed with exit code 1.
[2017-09-29 17:58:06.105] [INFO] Job 886be86f-84ea-4ed3-bf35-42ab08876d62 failed with exit code 1.
[2017-09-29 17:58:06.182] [WARN] Job root.mapper, rootId: 511170eb-ae2e-45ea-aee2-a9b9a36e79a0 failed: Job 886be86f-84ea-4ed3-bf35-42ab08876d62 failed with exit code 1.
[2017-09-29 17:58:06.190] [WARN] Root job 511170eb-ae2e-45ea-aee2-a9b9a36e79a0 failed.
[2017-09-29 17:58:07.181] [INFO] Failed to execute a Job
```
I don't see any obvious issue there, but I looked at the `job.err.log` file that was created as part of the `rabix` run:
```shell
cat encode_mapping_workflow-2017-09-29-175723.120/root/mapper/job.err.log
```
```shell
[OUTPUT]
Traceback (most recent call last):
  File "/image_software/pipeline-container/src/encode_map.py", line 264, in <module>
    main(sys.argv[2], sys.argv[1], "-q 5 -l 32 -k 2", False, sys.argv[3], None)
  File "/image_software/pipeline-container/src/encode_map.py", line 192, in main
    handler = logging.FileHandler('mapping.log')
  File "/usr/lib/python2.7/logging/__init__.py", line 903, in __init__
    StreamHandler.__init__(self, self._open())
  File "/usr/lib/python2.7/logging/__init__.py", line 928, in _open
    stream = open(self.baseFilename, self.mode)
IOError: [Errno 13] Permission denied: '/home/nexcbu/dream_challenge/2017/workflow_execution/workflows/encode_mapping_workflow/encode_mapping_workflow-2017-09-29-175723.120/root/mapper/mapping.log'
```
This seems a little strange to me, since the user has full permissions on this path. I am not sure where the error is coming from.
Regards,
Vipin
Hi Vipin,
For the rabix run, please try replacing `encode_mapping_workflow.cwl` with `./encode_mapping_workflow.cwl`, i.e. running:
```shell
~/tools/rabix-cli-1.0.0/rabix --tmpdir-prefix ~/tmp/ --tmp-outdir-prefix ~/tmp/ ./encode_mapping_workflow.cwl encode_mapping_workflow.cwl.json
```
This is a [known bug in rabix](https://github.com/rabix/bunny/issues/330), which will hopefully be fixed soon.
I also updated the report to include the correct command line for running the ENCODE workflow with rabix.
Hope this will help.
Best,
Bogdan
Hi Vipin - I have not tried that yet, but I will follow up with Bogdan and see if he can give us some more details on how he got it to run. I'll be in touch.
Seth
Hi Jim - Thanks for trying our workflow and I'm sorry you're having trouble. I can recreate the issue you're seeing running on JES. I inserted some debugging code and found that the argv that the script actually sees at runtime does not match what's in the exec.sh. I agree that from the exec.sh it _looks_ like it should be getting called correctly, but when the script actually runs it only gets one argument, which I think is the value of TMPDIR.
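For the record, the debugging amounted to something like the following (a sketch with placeholder arguments, not the exact code I inserted): replace the real call in exec.sh with one that only prints the argv it receives inside the container.
```shell
# Placeholder arguments; the point is only to see what arrives in sys.argv
# when the container runs the command line from exec.sh.
python -c 'import sys; print(sys.argv[1:])' \
  reference_bwa.tar.gz native reads.fastq.gz
```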
I don't know what the solution is yet but I am working on it.
Seth
Hi Seth,
Have you tried running your **encode_mapping_workflow** workflow with `rabix` runner?
I tried with `rabix` and get the following error message:
```shell
[nexcbu@wfexec encode_mapping_workflow]$ ~/tools/rabix-cli-1.0.0/rabix --tmpdir-prefix ~/tmp/ --tmp-outdir-prefix ~/tmp/ encode_mapping_workflow.cwl encode_mapping_workflow.cwl.json
[2017-09-28 09:50:59.125] [ERROR] Error: file://encode_mapping_workflow.cwl is not a valid app! Unknown error when parsing the app!
```
I am using the `rabix-cli-1.0.0` runner on RHEL 7.
I saw one submission here that used `rabix`: https://www.synapse.org/#!Synapse:syn10290494. Unfortunately, the documentation does not explain how they performed the run.
If you have any idea, please share with me.
Thank you,
Vipin
The WDL version of the encode_mapping_workflow doesn't seem to work when run via Cromwell on JES. When run locally, everything works:
```
java -jar ~/tmp/cromwell/target/scala-2.11/cromwell-28-9a403ba-SNAP.jar run encode_mapping_workflow.wdl encode_mapping_workflow.wdl_test.json - outputmetadata -
```
When running with JES (Google), it fails (bucket and non-important prefixes removed from path names):
```
Task encode_mapping_workflow.mapping:NA:1 failed. JES error code 10. Message: 11: Docker run failed: command failed: Traceback (most recent call last): File "/image_software/pipeline-container/src/encode_map.py", line 266, in main(sys.argv[2], sys.argv[1], "-q 5 -l 32 -k 2", False, sys.argv[3], sys.argv[4]) IndexError: list index out of range . See logs at gs:////encode_mapping_workflow/8d582ec2-e3d4-4e4d-b923-8a2e019fa31e/call-mapping/
```
I think the Cromwell-generated exec.sh looks fine?:
```sh
#!/bin/bash
tmpDir=$(mktemp -d /cromwell_root/tmp.XXXXXX)
chmod 777 $tmpDir
export _JAVA_OPTIONS=-Djava.io.tmpdir=$tmpDir
export TMPDIR=$tmpDir
(
cd /cromwell_root
python /image_software/pipeline-container/src/encode_map.py /cromwell_root/167830-46vy7eojpu9h3sogufnigvh2/168006/GRCh38_chr21_bwa.tar.gz native /cromwell_root/167830-46vy7eojpu9h3sogufnigvh2/168006/ENCFF000VOL_chr21.fq.gz
)
echo $? > /cromwell_root/mapping-rc.txt.tmp
(
cd /cromwell_root
mkdir /cromwell_root/glob-5173ae221240acd2e2a97be856fec02c
( ln -L *.sai /cromwell_root/glob-5173ae221240acd2e2a97be856fec02c 2> /dev/null ) || ( ln *.sai /cromwell_root/glob-5173ae221240acd2e2a97be856fec02c )
ls -1 /cromwell_root/glob-5173ae221240acd2e2a97be856fec02c > /cromwell_root/glob-5173ae221240acd2e2a97be856fec02c.list
mkdir /cromwell_root/glob-14818d7eab137a9180e99ef55bce91d6
( ln -L *.gz /cromwell_root/glob-14818d7eab137a9180e99ef55bce91d6 2> /dev/null ) || ( ln *.gz /cromwell_root/glob-14818d7eab137a9180e99ef55bce91d6 )
ls -1 /cromwell_root/glob-14818d7eab137a9180e99ef55bce91d6 > /cromwell_root/glob-14818d7eab137a9180e99ef55bce91d6.list
mkdir /cromwell_root/glob-9c7dddbb736ff5ba3fd769a2cd80ebff
( ln -L mapping.log /cromwell_root/glob-9c7dddbb736ff5ba3fd769a2cd80ebff 2> /dev/null ) || ( ln mapping.log /cromwell_root/glob-9c7dddbb736ff5ba3fd769a2cd80ebff )
ls -1 /cromwell_root/glob-9c7dddbb736ff5ba3fd769a2cd80ebff > /cromwell_root/glob-9c7dddbb736ff5ba3fd769a2cd80ebff.list
mkdir /cromwell_root/glob-31ec53741a6e261bb4ef24cf8feae316
( ln -L mapping.json /cromwell_root/glob-31ec53741a6e261bb4ef24cf8feae316 2> /dev/null ) || ( ln mapping.json /cromwell_root/glob-31ec53741a6e261bb4ef24cf8feae316 )
ls -1 /cromwell_root/glob-31ec53741a6e261bb4ef24cf8feae316 > /cromwell_root/glob-31ec53741a6e261bb4ef24cf8feae316.list
)
sync
mv /cromwell_root/mapping-rc.txt.tmp /cromwell_root/mapping-rc.txt
```
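For completeness, here is a rough way to sanity-check the same command outside JES (a sketch: it assumes the WDL task uses the same quay.io/encode-dcc/mapping image as the CWL version, and the input paths are placeholders to adjust):
```shell
# Clear any image entrypoint so the python command and its three arguments are
# invoked exactly as written, and mount the working directory as /cromwell_root.
docker run --rm --entrypoint "" \
  -v "$PWD":/cromwell_root -w /cromwell_root \
  quay.io/encode-dcc/mapping:v1.0 \
  python /image_software/pipeline-container/src/encode_map.py \
  /cromwell_root/GRCh38_chr21_bwa.tar.gz native /cromwell_root/ENCFF000VOL_chr21.fq.gz
```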
Has anybody had success running this workflow via Cromwell with JES?
Great. Thanks for working through this, Vipin. Your documentation will be especially valuable to help us think about best practices for disk space provisioning.
Talk to you soon,
Seth
Hi Seth,
Yes, your workflow finished with Toil and I ran the result checker. Everything worked well!
Thank you,
Vipin
Thank you, Seth! I was not able to respond to your comments in the last day.
As you suggested, it is better to keep the files as they are in the workflow challenge location; instead I will modify my local copies and try the run. When I add the documentation, I will point to this discussion and describe the steps I took to run the workflow using rabix.
Thanks again for checking this so quickly. I have started running the workflow as you described in the previous post, and I am no longer seeing the disk space error with the new `outdirMin` and `tmpdirMin` values.
I will update how it goes.
Regards,
Vipin
Vipin - I know what's wrong. In our individual stage CWL files we specify a rather large minimum disk (1 TB split between output and working directories). After reducing that to 100 GB, the workflow ran fine for me on the test inputs on a small machine. I'm testing the full-sized inputs now. Assuming that works (I believe it will), I can think of two ways to address your problem. One option is that I modify the CWL files deployed to the DREAM contest directory to request less disk space. The other is that you modify the CWL files you are running with, reducing the disk space request to fit your system. I prefer the second option: I would like to avoid changing the workflow in the middle of the contest, to stay as consistent as possible with the entries already submitted using the workflow as it is today.
Specifically, you would need to edit your copies of mapping.cwl, post_processing.cwl, filter_qc.cwl, and xcor.cwl. Since your machine appears to have ~70GB free, setting outdirMin and tmpdirMin to 20000 or 30000 (20G or 30G) should work.
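For example (an untested sketch, assuming those files set `outdirMin` and `tmpdirMin` as plain numeric values), an edit along these lines on your local copies would do it:
```shell
# Rewrite the per-step disk hints in the local copies of the stage CWL files;
# 20000 corresponds to the ~20G suggested above. Originals are kept as *.bak.
for f in mapping.cwl post_processing.cwl filter_qc.cwl xcor.cwl; do
  sed -i.bak -E 's/(outdirMin|tmpdirMin):[[:space:]]*[0-9]+/\1: 20000/' "$f"
done
```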
If that's what you do, please include that in your documentation so that others can benefit from what you've learned.
You can also wait until my full-sized test run completes. I won't have an opportunity to gather those outputs until Monday.
Thanks, again, for hanging in there with us and helping us to smooth some rough edges in our workflow.
Seth
OK, good experiment. Thank you for doing that, Vipin. I can see that the amount of disk space Toil is requesting is the same as before, so it does not depend on the size of the inputs. It must be the way we've specified the individual CWL stages. I will dig deeper and see what I can figure out. I'll be back with you soon.
Seth
Hi Seth,
Thank you for the quick response. I have tested with the smaller input files corresponding to `chr21`:
```shell
cwltoil --workDir ~/tmp/ encode_mapping_workflow.cwl encode_mapping_workflow.cwl_test.json
```
This also returns the same error message as before:
```
Traceback (most recent call last):
  File "/home/nexcbu/tools/venv/bin/cwltoil", line 11, in <module>
    sys.exit(main())
  File "/home/nexcbu/tools/venv/lib/python2.7/site-packages/toil/cwl/cwltoil.py", line 940, in main
    outobj = toil.start(wf1)
  File "/home/nexcbu/tools/venv/lib/python2.7/site-packages/toil/common.py", line 727, in start
    return self._runMainLoop(rootJobGraph)
  File "/home/nexcbu/tools/venv/lib/python2.7/site-packages/toil/common.py", line 997, in _runMainLoop
    jobCache=self._jobCache).run()
  File "/home/nexcbu/tools/venv/lib/python2.7/site-packages/toil/leader.py", line 183, in run
    self.innerLoop()
  File "/home/nexcbu/tools/venv/lib/python2.7/site-packages/toil/leader.py", line 388, in innerLoop
    self.issueJobs(successors)
  File "/home/nexcbu/tools/venv/lib/python2.7/site-packages/toil/leader.py", line 554, in issueJobs
    self.issueJob(job)
  File "/home/nexcbu/tools/venv/lib/python2.7/site-packages/toil/leader.py", line 536, in issueJob
    jobBatchSystemID = self.batchSystem.issueBatchJob(jobNode)
  File "/home/nexcbu/tools/venv/lib/python2.7/site-packages/toil/batchSystems/singleMachine.py", line 208, in issueBatchJob
    self.checkResourceRequest(jobNode.memory, cores, jobNode.disk)
  File "/home/nexcbu/tools/venv/lib/python2.7/site-packages/toil/batchSystems/abstractBatchSystem.py", line 256, in checkResourceRequest
    raise InsufficientSystemResources('disk', disk, self.maxDisk)
toil.batchSystems.abstractBatchSystem.InsufficientSystemResources: Requesting more disk than either physically available, or enforced by --maxDisk. Requested: 1073741824000, Available: 69968257024
```
This time I didn't set any value for `--maxDisk`, so by default it used the free space available on my machine.
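For reference, the "Available" figure in the traceback is roughly the free space `df` reports for my work directory (a quick check, assuming Toil derives its default cap from that):
```shell
# Free space in bytes for the Toil work directory; this appears to be the
# number compared against the request when --maxDisk is not given.
df -B1 ~/tmp/
```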
The other submission from UCSC using Toil specified `env_disk: "2Tb"`, so they did not see this error message. I am not seeing any Toil option to restrict this greedy disk space request, and I have no idea where else to check. Please let me know if you would like me to run more tests on my side.
Thanks,
Vipin
Hi Vipin - Thanks for trying our workflow, and I'm sorry you're having trouble. I'm not sure where that 1 TB request is coming from. Would you mind trying to run the workflow with the smaller test input files to see if that works? You can use the input JSON file encode_mapping_workflow.wdl_test.json. That will run the workflow on a chr21 extract against a chr21-only reference (both of which are included as part of the workflow package you downloaded). Whether or not that works will give us some clues to work from.
I note that another submission using Toil was run on a large EC2 instance, and might have gotten around this unreasonable disk space request that way. https://www.synapse.org/#!Synapse:syn10676071
Let me know if you get anything from running with encode_mapping_workflow.wdl_test.json and we'll take it from there.
Seth
ENCODE DCC
Hello,
I am trying to run encode_mapping_workflow with Toil (cwltoil).
I downloaded the workflow input files with:
```shell
cwltoil --workDir ~/tmp/ ~/synapse_utils/dockstore-tool-synapse-get.cwl ~/config_files/encode_mapping_workflow_get.cwl.json
```
Then I ran the workflow:
```shell
cwltoil --workDir ~/tmp/ --maxDisk 30000000000 encode_mapping_workflow.cwl encode_mapping_workflow.cwl.json
```
Running the workflow ended with the following error message:
```
2017-09-22 13:02:40,744 - toil.statsAndLogging - INFO - ... finished collating stats and logs. Took 0.00424098968506 seconds
Traceback (most recent call last):
  File "/home/nexcbu/tools/venv/bin/cwltoil", line 11, in <module>
    sys.exit(main())
  File "/home/nexcbu/tools/venv/lib/python2.7/site-packages/toil/cwl/cwltoil.py", line 940, in main
    outobj = toil.start(wf1)
  File "/home/nexcbu/tools/venv/lib/python2.7/site-packages/toil/common.py", line 727, in start
    return self._runMainLoop(rootJobGraph)
  File "/home/nexcbu/tools/venv/lib/python2.7/site-packages/toil/common.py", line 997, in _runMainLoop
    jobCache=self._jobCache).run()
  File "/home/nexcbu/tools/venv/lib/python2.7/site-packages/toil/leader.py", line 183, in run
    self.innerLoop()
  File "/home/nexcbu/tools/venv/lib/python2.7/site-packages/toil/leader.py", line 388, in innerLoop
    self.issueJobs(successors)
  File "/home/nexcbu/tools/venv/lib/python2.7/site-packages/toil/leader.py", line 554, in issueJobs
    self.issueJob(job)
  File "/home/nexcbu/tools/venv/lib/python2.7/site-packages/toil/leader.py", line 536, in issueJob
    jobBatchSystemID = self.batchSystem.issueBatchJob(jobNode)
  File "/home/nexcbu/tools/venv/lib/python2.7/site-packages/toil/batchSystems/singleMachine.py", line 208, in issueBatchJob
    self.checkResourceRequest(jobNode.memory, cores, jobNode.disk)
  File "/home/nexcbu/tools/venv/lib/python2.7/site-packages/toil/batchSystems/abstractBatchSystem.py", line 256, in checkResourceRequest
    raise InsufficientSystemResources('disk', disk, self.maxDisk)
toil.batchSystems.abstractBatchSystem.InsufficientSystemResources: Requesting more disk than either physically available, or enforced by --maxDisk. Requested: 1073741824000, Available: 30000000000
```
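For what it's worth, the requested figure works out to exactly 1000 GiB:
```shell
# 1073741824000 bytes divided by 1024^3 bytes per GiB:
echo $((1073741824000 / 1024**3))   # prints 1000, i.e. roughly 1 TB
```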
The documentation says that the maximum disk space needed to run this workflow is 30 GB, so I am not sure where this 1 TB disk space request comes from. I am running on RHEL 7 with 60 GB of disk space.
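In case it helps to trace where the request comes from, my guess would be to look for per-step resource hints in the stage CWL files shipped with the workflow, e.g.:
```shell
# Run from the directory containing the workflow's .cwl files; lists any
# per-step outdirMin / tmpdirMin disk hints, if present.
grep -n -E 'outdirMin|tmpdirMin' *.cwl
```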