Hi, I am trying to run a compatibility test with this command:

```
mlcube run --task infer data_path= output_path= --gpus 1
```

But I get the following error:
```
/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py:146: UserWarning:
NVIDIA RTX A5000 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA RTX A5000 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
0%| | 0/5 [00:06, ?it/s]
Traceback (most recent call last):
File "/mlcube_project/mlcube.py", line 90, in
File "/mlcube_project/mlcube.py", line 80, in infer
run_test(data_path, output_path, parameters, checkpoint_dir)
File "/mlcube_project/run_testing_phase.py", line 38, in run_test
File "/mlcube_project/testing_phase_get_predictions.py", line 117, in run_inference
inputs = sample["image"].cuda() if cuda else sample["image"]
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
RuntimeError: DataLoader worker (pid 88) is killed by signal: Bus error. It is possible that dataloader's workers are out of shared memory. Please try to raise your shared memory limit.
```
I am using the Docker image `nvidia/cuda:11.7.1-cudnn8-runtime-ubuntu22.04` with torch==1.12.1, torchvision==0.13.1 and torchaudio==0.12.1, on CUDA 11.7 with an NVIDIA RTX A5000.
Any help would be appreciated. Thanks!
Created by Anees Hashmi (aneeshashmi)

@aneeshashmi,
Thank you for providing the traceback. Looking through the errors, I see:
```
RuntimeError: DataLoader worker (pid 88) is killed by signal: Bus error. It is possible that dataloader's workers are out of shared memory. Please try to raise your shared memory limit.
```
One possible solution is to give the container more shared memory by adding `gpu_args` to your mlcube.yaml file:
```yaml
image: docker.synapse.org/.../...
gpu_args: --shm-size=2g
```
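To confirm that the new limit actually reaches the container, you could check the size of the `/dev/shm` mount from inside it (for example, at the start of the inference script). Docker's default for `--shm-size` is 64 MB, which is easy for DataLoader workers to exhaust. Below is a minimal sketch that assumes a Linux container where shared memory is mounted at `/dev/shm`; the helper names `shm_size_bytes` and `has_enough_shm` are hypothetical and not part of MLCube or PyTorch.

```python
import os
import shutil

SHM_PATH = "/dev/shm"  # assumption: standard Linux shared-memory mount inside the container


def shm_size_bytes(path: str = SHM_PATH) -> int:
    """Hypothetical helper: total size in bytes of the filesystem mounted at `path`.

    In a Docker container, /dev/shm is a tmpfs whose size is set by --shm-size.
    """
    return shutil.disk_usage(path).total


def has_enough_shm(required_bytes: int, path: str = SHM_PATH) -> bool:
    """Hypothetical helper: True if the mount at `path` is at least `required_bytes`."""
    return shm_size_bytes(path) >= required_bytes


if __name__ == "__main__":
    if os.path.isdir(SHM_PATH):
        total = shm_size_bytes()
        print(f"{SHM_PATH} size: {total / 2**30:.2f} GiB")
        # With gpu_args: --shm-size=2g this should report about 2 GiB.
        print("at least 2 GiB:", has_enough_shm(2 * 2**30))
    else:
        print(f"No {SHM_PATH} mount found on this system.")
```

If the reported size is still around 64 MB after editing mlcube.yaml, the `--shm-size` flag is most likely not being passed to `docker run`.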
Hope this helps!
EDIT: add more details
MLcube Compatibility test - out of shared memory