According to the notice:
### In a training submission
* /trainingData/*.dcm
* /metadata/images_crosswalk.tsv
* /metadata/exams_metadata.tsv
* /preprocessedData (read-only, present only if preprocessing is specified)
* __/modelState (writable; partition of 1 GB, effective size is 976 MB)__
* /scratch (writable; partition of 200 GB)

`/scratch` is a 200 GB volume that is empty at the start of training and is cleared (not saved) when training is completed.
### Our Situation
In this situation, I think I should write temporary files, including intermediate models, to `/scratch`, and write only the final model to `/modelState`. However, `/modelState` is too small for our team: our model is about **4.4GB** raw and **3.9GB** when compressed with 7z.
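The workflow described above (work in `/scratch`, then publish a single final file to `/modelState`) can be sketched as follows. This is only a minimal illustration, not part of the challenge tooling: the function name `publish_final_model` is hypothetical, and reading "976 MB" as 976 MiB is an assumption.

```python
import os
import shutil

# Assumption: the "976 MB" effective size is read as 976 MiB.
MODEL_STATE_LIMIT_BYTES = 976 * 1024 ** 2


def publish_final_model(src_path, model_state_dir, limit_bytes=MODEL_STATE_LIMIT_BYTES):
    """Copy one final model file into model_state_dir, but only if it fits.

    src_path is expected to live on the scratch volume; model_state_dir is
    the small persistent volume. Returns the path of the copied file.
    """
    size = os.path.getsize(src_path)
    if size > limit_bytes:
        raise ValueError(
            f"{src_path} is {size} bytes, which exceeds the {limit_bytes}-byte limit"
        )
    os.makedirs(model_state_dir, exist_ok=True)
    # copy2 preserves timestamps; the source stays in scratch until it is cleared.
    return shutil.copy2(src_path, model_state_dir)


# Hypothetical usage at the end of training:
# publish_final_model("/scratch/final_model.t7", "/modelState")
```

Checking the size before copying makes the failure explicit instead of leaving a partially written file on a full partition.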
### Question
Q. Do you plan to expand that space, or is there another way for us to save our model?
Created by MinHwan Yu minhwan90

I'm not familiar with Torch. Can you be more specific about the contents of `model_*.t7` and `optimState_*.t7`? Why do you have the multipliers x3 and x2? By any chance, isn't `optimState` the only file that contains the state/parameters of your trained model? Can you share a reference from the literature where such large models are used?
Thanks! @tschaffter
- Does your 4.4 GB include only your model state or additional checkpoints?
Our code is set up to store only one checkpoint in `/modelState`. (The intermediate models that have to be created for the validation set are stored in `/scratch`.)
- Can you give a reference to an existing deep learning framework and trained model of such size?
Our team used the Torch framework and created a wide residual network. The resulting model consists of:

* `model_*.t7` (1.1G) x 3
* `optimState_*.t7` (557M) x 2
* Total: 4.4G
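
As a quick check of the total above, the arithmetic can be reproduced in a few lines. This is only a sketch: the file sizes are the rounded figures quoted in the post, and treating 1 GB as 1024 MB is an assumption.

```python
MB_PER_GB = 1024  # assumption: binary units; the post does not say which it uses

# (label, size of one file in MB, number of files), using the rounded sizes from the post
files = [
    ("model", 1.1 * MB_PER_GB, 3),
    ("optimState", 557, 2),
]

total_mb = sum(size_mb * count for _, size_mb, count in files)
total_gb = total_mb / MB_PER_GB

# Effective size of /modelState as stated in the notice
model_state_limit_mb = 976
single_model_fits = (1.1 * MB_PER_GB) <= model_state_limit_mb

print(f"total: {total_gb:.1f} GB")
print(f"one model file fits in /modelState: {single_model_fits}")
```

Under these assumptions a single 1.1G model file is already larger than the 976 MB effective size, so dropping the extra copies alone would not be enough.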
Do you have any advice or improvements on this?
Thanks :)

Your model is much larger than the model size our Dry Run teams suggested (200-500 MB, although we allow you to retrieve 1 GB).
- Does your 4.4 GB include only your model state or additional checkpoints?
- Can you give a reference to an existing deep learning framework and trained model of such size?
The available space of /modelState directory