Hi,
Having submitted our first Docker image for evaluation, we received the log files for each dataset (all of which failed), and they all began with the same errors:
```
* Starting Elasticsearch Server
/etc/init.d/elasticsearch: line 104: ulimit: open files: cannot modify limit: Operation not permitted
sysctl: setting key "vm.max_map_count": Read-only file system
OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.
2020-03-10 10:49:53,563 main ERROR Could not determine local host name java.net.UnknownHostException: b25a9b183f7d: b25a9b183f7d: Temporary failure in name resolution
```
The problem seems to be that the Elasticsearch init script tries to raise the open-files limit (MAX_OPEN_FILES) to 65535 with `ulimit -n 65535` and to set MAX_MAP_COUNT to 262144 (the value Elasticsearch requires) with `sysctl -q -w vm.max_map_count=262144`. Inside a Docker container the script does not have permission to make those changes, hence the "Operation not permitted" and "Read-only file system" responses. However, I do not run into these errors when running the image locally, because the init script only attempts to change those values if they are currently below the ones just mentioned.
Essentially, I would like to know what command Synapse uses to run the Docker image and which arguments, if any, limit MAX_MAP_COUNT and MAX_OPEN_FILES. Debugging is currently rather difficult because I cannot reproduce these errors locally.
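To make the comparison described above concrete, here is a minimal sketch that reports whether the limits visible in a given environment are already at or above the values the init script asks for. The function names (`needs_raise`, `report`, `check_limits`) are illustrative assumptions and not part of Elasticsearch; the thresholds are the two numbers quoted above.

```shell
#!/bin/sh
# Sketch only: compare the limits visible in this environment with the values
# the Elasticsearch init script wants (numbers taken from the post above).
# Function names here are illustrative, not part of Elasticsearch.

REQUIRED_OPEN_FILES=65535   # init script runs: ulimit -n 65535
REQUIRED_MAP_COUNT=262144   # init script runs: sysctl -q -w vm.max_map_count=262144

# needs_raise CURRENT REQUIRED: succeeds when CURRENT is below REQUIRED.
needs_raise() {
    [ "$1" != "unlimited" ] && [ "$1" -lt "$2" ]
}

# report NAME CURRENT REQUIRED: print one line describing the comparison.
report() {
    if needs_raise "$2" "$3"; then
        echo "$1: $2 < $3 (a change would be attempted)"
    else
        echo "$1: $2 >= $3 (no change needed)"
    fi
}

# check_limits: inspect the current shell limit and the kernel setting.
check_limits() {
    report "open files" "$(ulimit -n)" "$REQUIRED_OPEN_FILES"
    if [ -r /proc/sys/vm/max_map_count ]; then
        report "vm.max_map_count" "$(cat /proc/sys/vm/max_map_count)" "$REQUIRED_MAP_COUNT"
    else
        echo "vm.max_map_count: not readable on this system"
    fi
}

check_limits
```

Running this inside the container (for example from a shell started with `--entrypoint=/bin/bash`) shows which of the two settings would trigger a change attempt in that environment.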
Created by JoshReed
Hi @JoshReed,
The only difference is we do not use sudo when running docker.
There is a way to grant your user docker daemon privileges so that you don't need sudo. Were you still having issues submitting?
Best,
Tom
Hi Tom,
I have produced a replica EC2 instance and added my submitted image to it. I ran the image with:
```
sudo docker run --rm -v /home/ec2-user/input:/input/ -v /home/ec2-user/data:/data/ -v /home/ec2-user/output:/output/ --network none
```
(`-m 6G` wasn't added since the EC2 instance has only 4 GB of memory.) I was able to replicate the errors from the submission server, but the process carried on and the program ran successfully.
If possible could you provide the `docker run` command that is used on the submission server?
Thanks,
Josh
Hi Tom,
Thanks for that info. I have just been able to replicate the Java error by running the image with the `--network none` flag, so I can now work on debugging locally.
Thanks for your help,
Josh
Hello @JoshReed,
No problem, and sorry I can't be of more help. Submissions are run on a t2.2xlarge EC2 instance (8 CPUs, 32 GB of memory) running Amazon Linux 2. Each submission gets 6 GB of memory and has no network access.
We won't be running any containers as privileged, so I bet any call inside the container that tries to change system settings will most likely fail.
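As a rough way to mimic those constraints locally, the sketch below assembles a `docker run` command with a 6 GB memory cap, no network, and no `--privileged` flag. It only uses flags already mentioned in this thread (`--rm`, `--network none`, `-m 6G`, `-v`); the function name `build_run_cmd` and the host directory are placeholders, and this is not necessarily the exact command the submission server runs.

```shell
#!/bin/sh
# Sketch only: print a `docker run` command that approximates the submission
# constraints described above. The host directory and image are placeholders.

# build_run_cmd HOST_DIR IMAGE: echo the full command line.
build_run_cmd() {
    host_dir=$1
    image=$2
    # --network none : submissions have no network access
    # -m 6G          : each submission gets 6 GB of memory
    # no --privileged: containers are not run as privileged
    echo "docker run --rm --network none -m 6G" \
         "-v $host_dir/input:/input/" \
         "-v $host_dir/data:/data/" \
         "-v $host_dir/output:/output/" \
         "$image"
}

# Example: print the command for the image name used earlier in this thread.
build_run_cmd "$HOME" "docker.synapse.org/syn21752035/first_submission:version1"
```

Printing the command rather than executing it makes it easy to review or paste; on a machine with less than 6 GB of memory the `-m 6G` part would need to be lowered or dropped.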
Best,
Tom
Hi Tom,
Thanks for getting those numbers. The numbers I gave you were incorrect, as I had not restarted Docker after experimenting with those values. After restarting I get:
```
cat /proc/sys/fs/file-max
524288
```
and
```
cat /proc/sys/vm/max_map_count
262144
```
Now, if I change those values to the ones on your system by running the container with the `--privileged` flag (which bypasses the read-only file system restriction), I can then close that container and the altered values persist the next time I run an image. The command I used for this was `docker run --rm --privileged -it --entrypoint=/bin/bash docker.synapse.org/syn21752035/first_submission:version1`, which bypasses the image entrypoint and drops into a shell.
If I now run the image without privileged mode using `docker run --rm -it --entrypoint=/bin/bash docker.synapse.org/syn21752035/first_submission:version1` and check the values of max_map_count and max_open_files, they are the same as on your system:
```
cat /proc/sys/fs/file-max
3282740
cat /proc/sys/vm/max_map_count
65530
```
So when running with `docker run --rm -v ~/Documents/input/:/input/ -v ~/Documents/data/:/data/ -v ~/Documents/output:/output/ docker.synapse.org/syn21752035/first_submission:version1`, the program executes correctly even with this output:
```
* Starting Elasticsearch Server
sysctl: setting key "vm.max_map_count": Read-only file system
OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.
```
With this knowledge I then pushed a version 1.1 to Synapse and submitted it. In this version, the code that makes Elasticsearch try to change those values is commented out, since it evidently can run with the values your system has. Again all datasets failed, with the key error being the java.net.UnknownHostException:
```
* Starting Elasticsearch Server
OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.
2020-03-11 09:23:10,314 main ERROR Could not determine local host name java.net.UnknownHostException: 38cf8893a1c0: 38cf8893a1c0: Temporary failure in name resolution
```
After a bit of research, it seems this exception is due to Java being unable to resolve $HOSTNAME, or to a problem with the /etc/hosts file. I experimented locally and had no problem with `echo $HOSTNAME`; I even deleted the hosts file (`umount /etc/hosts` then `rm /etc/hosts`) and still could not reproduce the error when starting the Elasticsearch service. It could also be a DNS problem, but I have no way of knowing without access to the system/environment. If push comes to shove, I'll have to rewrite the Elasticsearch part of the project in Lucene, the library it's based on, which will be 2-3 days' work.
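One way to narrow this down is to check, from inside the container, whether its own hostname is listed in /etc/hosts and whether the system resolver can look it up. The following is a minimal diagnostic sketch under that assumption; `in_hosts_file` is an illustrative helper name, and the script only inspects the environment without changing anything.

```shell
#!/bin/sh
# Sketch only: check whether the container's hostname can be resolved locally.

# in_hosts_file NAME [HOSTS_FILE]: succeeds if NAME appears as a host name
# (any column after the address) on a non-comment line of the hosts file.
in_hosts_file() {
    name=$1
    hosts_file=${2:-/etc/hosts}
    [ -r "$hosts_file" ] || return 1
    awk -v n="$name" '
        { sub(/#.*/, "") }                                   # drop comments
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 } # skip address column
        END { exit !found }
    ' "$hosts_file"
}

# Determine the hostname, falling back if the `hostname` tool is missing.
name=$(hostname 2>/dev/null || cat /etc/hostname 2>/dev/null || echo unknown)

if in_hosts_file "$name"; then
    echo "$name is listed in /etc/hosts"
else
    echo "$name is NOT listed in /etc/hosts"
fi

# getent consults the system resolver (hosts file, DNS, ...) where available.
if command -v getent >/dev/null 2>&1 && getent hosts "$name" >/dev/null 2>&1; then
    echo "$name resolves via getent"
else
    echo "$name does not resolve via getent (or getent is unavailable)"
fi
```

Comparing the output of a normal run with a run started using different `docker run` options should show whether the hostname entry is what differs between the two environments.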
Would you be able to provide info on the environment you are using to run the images so that we can set up our own VM/server on our side to make debugging these problems easier?
Thanks,
Josh
Hi @JoshReed,
The `docker run` command looks correct. Unfortunately, our max_map_count and max_open_files values are lower, so I'm confused by your numbers, but here they are:
MAX_OPEN_FILES:
```
cat /proc/sys/fs/file-max
3282740
```
MAX_MAP_COUNT:
```
cat /proc/sys/vm/max_map_count
65530
```
Can you not set these values in your code?
Best,
Tom
Hi @thomas.yu,
I am currently using `docker run --rm -v ~/Documents/input/:/input/ -v ~/Documents/data/:/data/ -v ~/Documents/output:/output/ docker.synapse.org/syn21752035/first_submission:version1` to run the image locally. MAX_OPEN_FILES = 65535 and MAX_MAP_COUNT = 262144 are the current values on my local machine, which is running macOS Catalina 10.15.3.
Hi @JoshReed,
Could you please provide us with the `docker run` command you used to run your image locally?
We do not use any arguments to limit MAX_MAP_COUNT or MAX_OPEN_FILES. It's entirely possible that these values are tied to the instance or computer the container runs on, and that they are lower on our instance. What are these values on your server?