Hi, I have some TensorFlow code that loads a pre-trained model:

    tf_saver = tf.train.Saver()
    tf_saver.restore(session, save_path="some path")

The pre-trained model files are copied into the Docker image:

    COPY model.cpkt.data-00000-of-00001 /model.cpkt.data-00000-of-00001
    COPY model.cpkt.index /model.cpkt.index
    COPY model.cpkt.meta /model.cpkt.meta

Loading the pre-trained model took several seconds when I ran this locally. However, it timed out when I ran it on the Express Training Lane; an example is submission ID 8019899. There is no exception or error in the log file I received. Did I miss anything that is needed to load a model dynamically from the Docker image? Thanks for your help!

Created by Yiqiu Shen ashen
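
Since the logs in the Express Lane showed no error, a cheap check before calling restore can at least rule out missing files. The following is only a sketch, not code from the organizers: the helper name missing_checkpoint_files is hypothetical, and it assumes the checkpoint uses the three-file layout shown in the COPY lines above (.index, .meta, and one or more .data-* shards sharing a common prefix).

```python
import os


def missing_checkpoint_files(prefix):
    """Return the names of expected checkpoint files that are absent.

    `prefix` is the checkpoint prefix (for example "/model.cpkt"),
    not the name of any single file. Hypothetical helper for illustration.
    """
    directory, base = os.path.split(prefix)
    directory = directory or "."
    # An unreadable or absent directory is treated as containing nothing.
    names = set(os.listdir(directory)) if os.path.isdir(directory) else set()

    missing = [base + ext for ext in (".index", ".meta")
               if base + ext not in names]
    # Data shards are named <base>.data-NNNNN-of-NNNNN.
    if not any(name.startswith(base + ".data-") for name in names):
        missing.append(base + ".data-*")
    return missing
```

With the files from the question, missing_checkpoint_files("/model.cpkt") should return an empty list inside the container. The same prefix, rather than one of the three file names, is what would be passed as save_path to tf_saver.restore.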
The problem was solved by changing the TensorFlow version in the Docker image from 11 to 12.
I waited in the normal training lane for several hours and received an exception that I did not see in the Express Lane logs. There seems to be an issue between TensorFlow and the OS-level infrastructure. The exception is:

    STDERR: File "/usr/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1340, in restore
    STDERR: if not file_io.get_matching_files(file_path):
    STDERR: File "/usr/lib/python2.7/site-packages/tensorflow/python/lib/io/file_io.py", line 231, in get_matching_files
    STDERR: compat.as_bytes(filename), status)]
    STDERR: File "/usr/lib64/python2.7/contextlib.py", line 24, in __exit__
    STDERR: self.gen.next()
    STDERR: File "/usr/lib/python2.7/site-packages/tensorflow/python/framework/errors.py", line 463, in raise_exception_on_not_ok_status
    STDERR: pywrap_tensorflow.TF_GetCode(status))
    STDERR: tensorflow.python.framework.errors.NotFoundError: /sys/dev/block/7:0/subsystem/dm-809

Hi @tschaffter, @brucehoff, I think I might be missing some Docker- or server-specific code for restoring a session in TensorFlow. Do the organizers have any example code that restores a model from /modelState or from the Docker image? Thanks!
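
No organizer example appears in this thread, so the following is only a sketch of one way to choose between the two locations mentioned above. The helper name find_checkpoint_prefix is hypothetical, and the order of the candidate directories is an assumption; it simply returns the first directory that holds the checkpoint's .index file.

```python
import os


def find_checkpoint_prefix(candidate_dirs, base_name):
    """Return the first `<dir>/<base_name>` whose `.index` file exists.

    Returns None when no candidate directory contains the checkpoint.
    Hypothetical helper for illustration only.
    """
    for directory in candidate_dirs:
        prefix = os.path.join(directory, base_name)
        if os.path.isfile(prefix + ".index"):
            return prefix
    return None
```

For example, find_checkpoint_prefix(["/modelState", "/"], "model.cpkt") would prefer a checkpoint in /modelState and fall back to the copy baked into the image at /model.cpkt. A non-None result is what would be passed as save_path to tf_saver.restore; a None result can be reported explicitly instead of waiting for a timeout.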

TensorFlow Code For Loading Pre-Trained Model Timed out in Express Lane