Hi,
I have just had a submitted MLCube fail due to "OSError: [Errno 28] No space left on device", and the apparent resources being utilised are "disk: 1.0 G, and memory: 100.0 M".
I was wondering what the resource constraints are regarding the size of the Docker image and MLCube, disk space (for intermediate steps and the output folder), and memory for inference. Is there a way to specify the requirements we need?
Additionally, is it recommended to package the model weights as an asset for the MLCube or within the Docker image?
Many thanks!
Created by Tim Mulvany (@timmulvany)

Hi @timmulvany @mccleve,

Quick update: the MedPerf devs have confirmed that `gpu_args` is how you pass extra runtime flags to your Docker containers, e.g. `--memory`, `--shm-size`, etc.

Hi @timmulvany @mccleve,

The cloud VMs we will be using to run your MLCubes are _not_ the same as the one currently accepting your MLCube config tarballs, so you will definitely have more resources than "disk: 1.0 G, and memory: 100.0 M".

That said, let me forward your questions to the MedPerf team so they can best answer how to specify resources for your Docker containers.

Hello @timmulvany,

In my case the error was referring to system RAM rather than VRAM, so your suggestion unfortunately did not solve it. I agree that guidelines on specific requirements would be extremely useful, though. I also have preprocessing steps in my MLCube, so the required disk space will scale with the number of test cases provided.

Hi,
I encountered similar issues regarding RAM; they were fixed by adding `gpu_args: --shm-size=8g` under the `docker:` section of mlcube.yaml. The default shared memory allowed is only 64 MB.
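For reference, a minimal sketch of what that change looks like in mlcube.yaml. The image name and the commented `--memory` flag are illustrative, not from this thread; only the `gpu_args: --shm-size=8g` line is the fix described above:

```yaml
docker:
  # Illustrative image name; use your own submission's image.
  image: mlcommons/my-model:0.0.1
  # Extra flags forwarded to `docker run`. This raises the 64 MB
  # /dev/shm default that causes the shared-memory errors above.
  gpu_args: --shm-size=8g
  # You could also cap/raise container RAM here, e.g.:
  # gpu_args: --shm-size=8g --memory=16g
```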
It would be helpful to have the resource allowances and some guidance on specifying our specific requirements for inference. We have some intermediate processing steps that with enough test cases will take up more than the 1GB that appears to be allocated.
Many thanks!

I am also curious about the memory constraints of the testing machine and whether they need to be configured somehow in the mlcube.yaml file. While testing on my own machine, I encountered this error https://github.com/MIC-DKFZ/nnUNet/blob/master/nnunetv2/inference/data_iterators.py#L111 inside the MLCube, implying that more RAM needs to be allocated. This has never been an issue outside of the MLCube, and the machine I am using for testing should have more than enough RAM for this task. I do not have much previous experience with MLCube, but https://docs.mlcommons.org/mlcube/getting-started/mlcube-configuration/ seems to imply that these resources can be configured. Does anyone else have experience with allocating appropriate resources to MLCubes, and what are appropriate values for the challenge?
Any help would be very much appreciated.
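As a debugging aid (my own suggestion, not something confirmed by the organizers): since the nnU-Net error above is typically triggered by Docker's 64 MB `/dev/shm` default, a small Python check run inside the container can confirm how much shared memory and disk it actually has:

```python
import os
import shutil

def report(path: str) -> None:
    """Print total and free space for the filesystem containing `path`."""
    # shutil.disk_usage returns (total, used, free) in bytes.
    total, _, free = shutil.disk_usage(path)
    gib = 1024 ** 3
    print(f"{path}: total={total / gib:.2f} GiB, free={free / gib:.2f} GiB")

# /dev/shm is the shared-memory mount; guard in case it is absent
# (e.g. when testing this snippet outside Linux).
for path in ("/", "/tmp", "/dev/shm"):
    if os.path.exists(path):
        report(path)
```

Running this as the first step of the MLCube's entry point would show at a glance whether `--shm-size` took effect before inference starts.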