Hi, I am trying to run a compatibility test using this command:
mlcube run --task infer data_path= output_path= --gpus 1
But I am getting the error:
/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py:146: UserWarning:
NVIDIA RTX A5000 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA RTX A5000 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
0%| | 0/5 [00:06, ?it/s]
Traceback (most recent call last):
File "/mlcube_project/mlcube.py", line 90, in <module>
app()
File "/mlcube_project/mlcube.py", line 80, in infer
run_test(data_path, output_path, parameters, checkpoint_dir)
File "/mlcube_project/run_testing_phase.py", line 38, in run_test
run_inference(
File "/mlcube_project/testing_phase_get_predictions.py", line 117, in run_inference
inputs = sample["image"].cuda() if cuda else sample["image"]
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
_error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 88) is killed by signal: Bus error. It is possible that dataloader's workers are out of shared memory. Please try to raise your shared memory limit.
I am using the Docker image nvidia/cuda:11.7.1-cudnn8-runtime-ubuntu22.04 with torch==1.12.1, torchvision==0.13.1 and torchaudio==0.12.1 (CUDA version 11.7) on an NVIDIA RTX A5000.
Any help would be appreciated. Thanks
Created by Anees Hashmi (@aneeshashmi)
Thank you for providing the traceback. Looking through the errors, I see:
```text
RuntimeError: DataLoader worker (pid 88) is killed by signal: Bus error. It is possible that dataloader's workers are out of shared memory. Please try to raise your shared memory limit.
```
One possible solution is to give the container more shared memory by adding `gpu_args` to the `docker` section of your mlcube.yaml file:
```yaml
docker:
  image: docker.synapse.org/.../...
  # ...
  gpu_args: --shm-size=2g
```
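To confirm that the new limit actually reaches the container, you could check the size of `/dev/shm` from inside it. The snippet below is only a minimal sketch: the helper name `shm_size_gib` is made up for illustration, and it assumes a Linux container where shared memory is mounted at `/dev/shm`. Docker's default is 64 MB unless `--shm-size` is passed.

```python
import os
import shutil


def shm_size_gib(path="/dev/shm"):
    """Return (total, free) space of the given mount in GiB, or None if it does not exist.

    Hypothetical helper for illustration; /dev/shm is the usual shared-memory
    mount on Linux, which is an assumption about your container.
    """
    if not os.path.isdir(path):
        return None
    usage = shutil.disk_usage(path)
    gib = 1024 ** 3
    return usage.total / gib, usage.free / gib


if __name__ == "__main__":
    result = shm_size_gib()
    if result is None:
        print("/dev/shm not found")
    else:
        total, free = result
        print(f"/dev/shm: total {total:.2f} GiB, free {free:.2f} GiB")
```

If the reported total is still around 0.06 GiB, the `--shm-size` flag is probably not being passed through. As a fallback, setting `num_workers=0` on the `DataLoader` loads data in the main process and so avoids the worker shared-memory path, at the cost of slower loading.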
Hope this helps!
EDIT: add more details