## The available space of /modelState directory

### In training submission

According to the notice, a training submission sees the following layout:

* `/trainingData/*.dcm`
* `/metadata/images_crosswalk.tsv`
* `/metadata/exams_metadata.tsv`
* `/preprocessedData` (read-only; present only if preprocessing is specified)
* __`/modelState` (writable; partition of 1 GB, effective size is 976 MB)__
* `/scratch` (writable; partition of 200 GB)

`/scratch` is a 200 GB volume that is empty at the start of training and is cleared (not saved) when training completes.

### Our Situation

Given this layout, I think I should write temporary files, including models, to `/scratch` and write the final model to `/modelState`. But our team lacks storage space, because our model is about **4.4 GB** raw and **3.9 GB** compressed with 7z (see the sketch after this post).

### Question

Q. Do you plan to expand that space, or do I have another option for saving the model?
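A minimal Torch sketch of the plan above, assuming the standard `torch.save` API and a `7z` binary in the image; the network and training step are stand-ins, and only the `/scratch` and `/modelState` paths come from the challenge notice:

```lua
require 'nn'
require 'torch'

local model = nn.Linear(10, 2)    -- stand-in for the real network
local function trainEpoch(m) end  -- stand-in for one training epoch

for epoch = 1, 3 do
   trainEpoch(model)
   -- intermediate checkpoints go to the large temporary /scratch volume,
   -- which is discarded when training ends
   torch.save(('/scratch/model_%d.t7'):format(epoch), model)
end

-- only the final model is written to the 1 GB persistent volume; 7z
-- compression helps (4.4 GB -> 3.9 GB above) but not enough to fit 1 GB
torch.save('/scratch/model_final.t7', model)
os.execute('7z a /modelState/model.7z /scratch/model_final.t7')
```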

Created by MinHwan Yu (minhwan90)
I'm not familiar with Torch. Can you be more specific regarding the content of `model_*.t7` and `optimState_*.t7`? Why do you have the multipliers x3 and x2? By any chance, isn't `optimState` the only file that contains the state/parameters of your trained model? Can you share a reference from the literature where such large models are used? Thanks!
@tschaffter

- Does your 4.4 GB include only your model state or additional checkpoints?

The `/modelState` folder is programmed to store only one checkpoint. (The intermediate models created for the validation set are stored in `/scratch`.)

- Can you give a reference to an existing deep learning framework and trained model of such size?

Our team used the Torch framework and created a wide residual network. The resulting model includes:

* `model_*.t7` (1.1 GB) × 3
* `optimState_*.t7` (557 MB) × 2
* **Total: 4.4 GB**

Do you have any advice or improvements on this? Thanks :)
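For reference, a minimal sketch of two common ways to shrink Torch checkpoints like these, assuming the standard `nn` API; the network, `optimState` table, and file names are illustrative. `clearState()` drops the per-layer output/gradient buffers that can dominate the serialized size of a deep network, and `getParameters()` allows saving only the flattened weights:

```lua
require 'nn'
require 'torch'

local model = nn.Sequential():add(nn.Linear(10, 2))  -- stand-in network
local optimState = { learningRate = 0.1 }            -- stand-in optim table

-- drop the intermediate output/gradInput buffers kept by every layer;
-- in deep networks these can account for much of the .t7 file size
model:clearState()

-- a full checkpoint (model + optimState, needed to resume training)
-- belongs on /scratch rather than on the 1 GB /modelState volume
torch.save('/scratch/checkpoint.t7', { model = model, optimState = optimState })

-- for the final submission, saving only the flattened parameters as
-- floats is far smaller than serializing the whole module graph
local params, _ = model:getParameters()
torch.save('/modelState/weights.t7', params:float())
```

Loading `weights.t7` later requires rebuilding the same architecture in code and copying the saved tensor back through `getParameters()`.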
Your model is much larger than the expected model size suggested by our Dry Run teams (200-500 MB, though we allow you to retrieve 1 GB).

- Does your 4.4 GB include only your model state or additional checkpoints?
- Can you give a reference to an existing deep learning framework and trained model of such size?

Thanks!
